Text Properties

Searching for or reading text on your system under test (SUT) using optical character recognition (OCR) requires use of a text property list. These property lists have a wide variety of properties you can include, depending on the text you are working with and the purpose of your search.

See Working with OCR to learn how to work with the most commonly used text properties in more detail.

This page covers the available OCR properties as they can be addressed in-line with SenseTalk scripting. You can also adjust your OCR properties from several locations in Eggplant Functional.

What is a Text Property List?

A text property list is a description of text on the SUT. Every text property list must contain the Text property, which defines the actual text string you are looking for, and any number of the additional properties described below. Any property that is not included in the property list defaults to your Text Preferences settings.

The following text properties can be used with any text property list:

  • Text: Text string. The text string that you want to find on the SUT. (Required.)
  • TextStyle: Text-style name. A group of predefined text properties. (For more information, see The Find Text Panel.)
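
Text Property List Example

As a minimal sketch of how these pieces fit together, the line below passes a text property list to a Click command. The button label "Submit" and the image names "TLImage" and "BRImage" are placeholders chosen for illustration, not names from any particular suite.

```sensetalk
-- Text property list sketch (hypothetical label and image names)
-- Text is the required property; CaseSensitive and SearchRectangle are optional additions.
Click (Text:"Submit", CaseSensitive:Yes, SearchRectangle:("TLImage","BRImage"))
```

Any property left out of the list, such as Language here, falls back to the Text Preferences settings.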

OCR Text Properties Reference Table

Use the table below as a reference when working with OCR. It is a complete list of all supported SenseTalk properties for use with OCR. Some properties are available for reading and searching, while some are specific to one or the other. The type of value passed to the property is also included. Click the name of the property for its full description (a full description list is included on this page below the table). For more in-depth information on how to use the most common OCR properties, see Succeeding with OCR.

note

A boolean value is like a toggle switch with two possible values; Yes/No values are accepted as well as On/Off.

| Property | Searching | Reading | Value |
| --- | --- | --- | --- |
| AggressiveTextExtraction | Yes | Yes | Boolean |
| CaseSensitive | Yes | No | Boolean |
| Contrast | Yes | Yes | Boolean |
| ContrastColor | Yes | Yes | A color (see Color Values in SenseTalk) |
| ContrastTolerance | Yes | Yes | 0-100 (default 45) |
| DPI | Yes | Yes | Integer |
| EnhanceLocalContrast | Yes | Yes | Boolean |
| ExtraWords | Yes | Yes | A string or list of words |
| IgnoreNewlines | Yes | No | Boolean |
| IgnoreSpaces | Yes | No | Boolean |
| IgnoreUnderscores | Yes | No | Boolean |
| InvertImage | Yes | Yes | Boolean |
| Language | Yes | Yes | A language name as specified in OCR Language Support (case sensitive) |
| LowResolutionMode | Yes | Yes | Boolean |
| MultiLine | No | Yes | Boolean |
| PreferDictionaryWords | Yes | Yes | Boolean |
| PreferredPattern | Yes | Yes | Regular expression string (see Using Patterns in SenseTalk) |
| PreferredWords | Yes | Yes | A string or list of words |
| ProhibitedWords | Yes | Yes | A string or list of words |
| SearchRectangle | Yes | No* | A pair of coordinates or captured images defining a rectangle |
| TextDifference | Yes | No | Integer |
| TextRotation | Yes | Yes | One of four predefined values |
| Trim | No | Yes | Boolean |
| TrimBorder | No | Yes | Integer |
| TrimColor | No | Yes | A color (see Color Values in SenseTalk) |
| TrimTolerance | No | Yes | Integer |
| TrimWhitespace | No | Yes | Boolean |
| ValidCharacters | Yes | Yes | Text string |
| ValidPattern | Yes | Yes | Regular expression string (see Using Patterns in SenseTalk) |
| ValidWords | Yes | Yes | A string or list of words |

* You do not need to set a SearchRectangle property with ReadText because ReadText already takes a rectangle by default. This property can also be set with standard image searches; for more on this, see Image References.

OCR Property Definitions

AggressiveTextExtraction

Type: Boolean. Enable this property if you want OCR to extract as much text from the image as possible.

AggressiveTextExtraction Example

-- AggressiveTextExtraction example
Log ReadText(("TLImage","BRImage"), AggressiveTextExtraction:on)

CaseSensitive

Type: Boolean, Default: off. Whether or not Eggplant Functional considers case in text searches. Enable this property to force text searches to respect case and only find text that matches your text string’s capitalization exactly. This property is for searching for text, not reading text.

CaseSensitive Example

-- caseSensitive example
Put "COUPON13995a" into Coupon
MoveTo (Text: Coupon, CaseSensitive:Yes)

Contrast

Type: Boolean. Whether or not the SUT display is converted to a high-contrast, two-color image before it is sent to OCR for analysis. If Contrast is on, the "contrast color" (which can be set using the ContrastColor property) is considered the primary color of the SUT display, and all other colors are treated as the secondary color. Text can be found in either color. The Contrast property is available for both searching for (finding) text and reading text.

Contrast Example

-- Contrast example
Log ReadText(("TLImage","BRImage"), contrast:on, contrastColor:"ffffff", validCharacters:"abcdABCD12345")

note

If Contrast is on, but the ContrastColor is not defined, the top left pixel of the area being searched is treated as the contrast color.

ContrastColor

A color. Default: The top left corner of the search rectangle is used as the contrast color if Contrast is on and no other color is defined. If Contrast is on, the contrast color is considered the primary color of the SUT display, and all other colors are treated as the secondary color. For instructions on finding the background color, see Determining the Background Color. A number of color value formats are recognized by SenseTalk. For the full list of formats, see Color Values in SenseTalk.

ContrastColor Example

-- ContrastColor example
Log ReadText(("TLImage","BRImage"), contrast:on, contrastColor:"ffffff", validCharacters:"abcdABCD12345")

ContrastTolerance

Type: Integer, Default: 45. When Contrast is on, contrastTolerance sets the maximum per-channel color difference that is allowed for a pixel to be seen as the contrast color.

ContrastTolerance Example

-- ContrastTolerance example
Click (Text:"Andrew Young", Contrast:On, ContrastTolerance: 65)

DPI

Integer. Default: 72. The DPI property refers to the DPI (dots per inch) of the SUT display. If you are having problems finding text on the SUT, check the SUT's DPI setting, and adjust the DPI property accordingly. Typical DPI settings include: 72, 144, 300, and 2540.

DPI Example

-- DPI example
Click (text: "Continue", DPI: 2540, Language: English)

EnhanceLocalContrast

Type: Boolean. Default: off. Enable this property if you want OCR to automatically increase the local contrast of the text image being sent to the OCR engine. This property may aid recognition when some or all of the text being read has relatively low contrast, such as blue text on a dark background. When Contrast is turned on, this property has no effect, so it is only useful when Contrast is turned off.

EnhanceLocalContrast Example

-- EnhanceLocalContrast example
Log ReadText(("TLImage","BRImage"), enhanceLocalContrast: On)

ExtraWords

A word or list of words. Set this property to a list of words to supplement the built-in dictionary for the current language. These words will be given preference the same way as other dictionary words. ExtraWords is mutually exclusive with PreferredWords, ValidPattern, PreferredPattern, and ValidWords.

ExtraWords Example

-- ExtraWords example
Log ReadText(("TLImage","BRImage"), Language: English, ExtraWords: "Elizabeth, Andrew, Steven, Katherine, Jacob, Brenda")

IgnoreNewlines

Type: Boolean. When enabled, ignoreNewlines causes OCR text searches to ignore line breaks, so a search will match a string even if it's broken over several lines. This property is only available for text searches (not available with ReadText).

IgnoreNewlines Example

-- IgnoreNewLines example
-- In the case of a long name like this, it's possible that it could wrap to a second line in the interface of an application under test, but the OCR could still read it with IgnoreNewlines enabled.
Click (Text:"Constantine Papadopoulos", IgnoreNewlines:On)

IgnoreSpaces

Type: Boolean. The ignoreSpaces property causes OCR text searches to disregard spaces in your text string. For example, the string "My Computer" would match "MyComputer" or "M y C o m p u t e r". The ignoreSpaces property is on by default. This is because the OCR sometimes reads spaces that are not intended, especially in strings that are not discrete words, and in text with unusual letter-spacing.

IgnoreSpaces Example

-- IgnoreSpaces example
-- A tab called "My Account" is part of the UI of the software you're testing, but can appear with an underscore ("My_Account") or without the space ("MyAccount") in different contexts or on different devices.
Click (Text:"My Account", ignoreSpaces:On, IgnoreUnderscores:On) -- is able to find the Account tab whether it has an underscore, space, or no space.

IgnoreUnderscores

Type: Boolean. The ignoreUnderscores property causes OCR text searches to treat underscores as spaces during searches. For example, the string "My_Computer" would match "My_Computer" or "My Computer". The ignoreUnderscores property is on by default, because the OCR sometimes fails to recognize underscores.

IgnoreUnderscores Example

-- IgnoreUnderscores example
Click (Text:"Account Overview", IgnoreUnderscores:On) -- Will click "Account Overview" in a case where the OCR is mistaking an underlined link as text with an underscore in the space ("Account_Overview").

InvertImage

Type: Boolean, Default: Off. Enable this property for OCR to invert the colors of the text image (like a photo negative) before sending it to the OCR engine for processing.
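
InvertImage Example

The call below is a sketch of how InvertImage might be applied, for instance when reading light text on a dark background. That scenario is an assumption, and the image names are placeholders borrowed from the other examples on this page.

```sensetalk
-- InvertImage example (hypothetical image names)
-- Inverts the colors of the captured area before it is sent to the OCR engine.
Log ReadText(("TLImage","BRImage"), InvertImage:On)
```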

Language

Language name (case sensitive). The natural language of the text you are searching for. (For a list of supported languages, see OCR Language Support.) OCR uses this as a guide, giving preference to words specified in the dictionary it is using. More than one language can be specified. Eggplant Functional comes with numerous languages by default, and additional languages are available for purchase. If no language is specified, OCR will still read text; it just won't have a dictionary to compare its findings to. You can also create a Custom OCR Dictionary.

note

The language names are case-sensitive as defined by the OCR dictionary.

Language Example

-- Language example
-- Clicks the "close" ("Bezárás") button in an application using Hungarian.
Click (Text:"Bezárás", Language:"Hungarian", SearchRectangle:("TLImage","BRImage"))

LowResolutionMode

Type: Boolean, Default: Off. A mode of processing used by the OCR engine to treat the image it receives from Eggplant Functional as low resolution (the image is not actually converted to a lower resolution). This might help OCR recognize smaller characters.
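
LowResolutionMode Example

As a rough illustration, the search below turns on LowResolutionMode for a small-print link. The link text "Terms and Conditions" is a made-up value, and whether this mode helps in a given case would need to be confirmed against the actual SUT.

```sensetalk
-- LowResolutionMode example (hypothetical link text)
-- Asks the OCR engine to treat the captured image as low resolution, which might help with small characters.
Click (Text:"Terms and Conditions", LowResolutionMode:On)
```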

MultiLine

Type: Boolean, Default value: Off. This property only applies when reading text near a point, as opposed to reading text within a rectangle. When MultiLine is on, the ReadText function returns the line of text associated with your point, and any lines of text above and below that point if they appear to belong to the same block of text. When MultiLine is off, the ReadText function only returns the line of text associated with the point.

Multiline Example

-- Multiline example
Log ReadText("ShortTextBlockHeaderImage", MultiLine:On, Contrast:On, ContrastColor:BkgdColor)

PreferDictionaryWords

Type: Boolean. OCR always prefers words found in any dictionary provided through the Language property. PreferDictionaryWords takes this a step further and requires OCR to return a dictionary word if possible. It only returns a non-dictionary word, using its best interpretation of each character, if no possible variants are found. This property modifies the OCR dictionary. For more information, see Customize the OCR Dictionary. Available for both reading and searching for text.

PreferDictionaryWords Example

-- PreferDictionaryWords example
--"Cattywampus" or some other unlikely word is the text shown within the given searchRectangle, and PreferDictionaryWords forces the correct word to be returned.
Log ReadText(("TLImage","BRImage"), Language:English, PreferDictionaryWords:On)

PreferredPattern

Regular expression string (as defined in SenseTalk Patterns for Use with OCR Properties). When this property is enabled and given a regular expression string, OCR gives preference to text that matches the provided pattern. For information on regular expression characters that can be used with SenseTalk, see Using Patterns in SenseTalk. If you want the OCR to require a pattern match, use ValidPattern. PreferredPattern is mutually exclusive with PreferredWords, ValidPattern, ValidWords, and ExtraWords.
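
PreferredPattern Example

The following sketch mirrors the time pattern used in the ValidPattern example on this page, but passes it to PreferredPattern instead. The image names are placeholders, and the time-shaped text is an assumed scenario.

```sensetalk
-- PreferredPattern example (hypothetical image names)
-- Gives preference to text shaped like a time such as 09:45, without requiring a match.
Log ReadText(("TLImage","BRImage"), PreferredPattern:"[0-9][0-9]:[0-9][0-9]")
```

Because the pattern is only preferred, text that does not fit it can still be returned; use ValidPattern when a match must be enforced.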

PreferredWords

A word or list of words. Set this property to a list of words to supplement the built-in dictionary for the current language. PreferredWords can be used for either reading or searching for text. This property modifies the OCR dictionary. For more information, see Customize the OCR Dictionary. PreferredWords is mutually exclusive with ValidWords, ValidPattern, PreferredPattern, and ExtraWords.

PreferredWords Example

-- PreferredWords example
-- A list of customer names is passed in as the value for the PreferredWords property, adding all of those names to the OCR dictionary.
Log ReadText(("TLImage","BRImage"), Language: PortugueseBrazilian, PreferredWords:CustomerNameList)

note

In the example above, "PortugueseBrazilian" is not quoted. This unpopulated variable will resolve to its name, so the value passed to the Language property is "PortugueseBrazilian", the same as if this text were in quotes.

ProhibitedWords

A word or list of words. Provide words OCR can recognize that are not what you are looking for to help steer it in the right direction. ProhibitedWords can be used for both reading and searching for text. This property modifies the OCR dictionary. For more information see Customize the OCR Dictionary.

ProhibitedWords Example

-- ProhibitedWords example 
-- Using the ProhibitedWords property to eliminate possible misspellings that the OCR could mistake as being correct.
Click(Text:"Annita",ProhibitedWords:"Amita")

SearchRectangle

Rectangle defined by a coordinate pair (top left corner, bottom right corner). With (0,0) being the top-left corner of the screen, the SearchRectangle property takes a pair of coordinates that define a rectangular area of the SUT screen. Eggplant Functional only searches for the text within this defined rectangle. The SearchRectangle property is for use with searching for text, not reading text. Setting a SearchRectangle with the ReadText() function does not require a special property, as ReadText() takes a rectangle by default. The SearchRectangle property can also be set when searching for Images; for more on this see Image References.

SearchRectangle Example

-- SearchRectangle example
-- Searching for a text string using the SearchRectangle property:
Click (Text:"CharlieBrown", SearchRectangle:("TLImage","BRImage"),contrast:On)
-- Reading text by passing a rectangle directly to the ReadText() function:
Log ReadText(("TLImage","BRImage"),contrast:On)

TextDifference

Type: Integer, Default: 0. This property causes text searches to find text that differs from your search by a given number of characters. Only available with OCR searches.

TextDifference Example

-- TextDifference example
-- Would find text written as "armadolli" or any other variation that differs from "armadillo" by one or two characters.
moveTo text:"armadillo", searchRectangle:(305,241,372,274),TextDifference:2

TextRotation

One of four predefined values: Clockwise, Counter-clockwise, Upside-down, or None. When this property is set, OCR identifies words at the degree of rotation specified by one of the predefined values: Clockwise rotates 90 degrees to the right; Counter-clockwise rotates 90 degrees to the left; Upside-down rotates 180 degrees; None does not rotate the text. Can be used for both reading and searching for text.

TextRotation Example

-- TextRotation example
Log ReadText(("TLImage","BRImage"),TextRotation:"Clockwise")
Click (Text:"Charlie Brown",TextRotation: "Upside-down")

Trim

Type: Boolean, Default: Off. When Trim is on, the OCR engine reduces the size of the rectangle provided to the ReadText() function until a non-background pixel is encountered (usually the edge of the text that you want it to read). The background color is taken from the top left pixel of the rectangle, or from the TrimColor property.

TrimBorder

Type: Integer, Default: 0. When Trim is on, TrimBorder is the pixel width of background that is not trimmed from the ReadText() function rectangle. TrimBorder can be set to a negative number to trim non-background edges from the rectangle.

TrimColor

A color. When Trim is on, TrimColor is the color that is considered the background of the ReadText() function rectangle. If you do not set the TrimColor property, the background color is taken from the top left pixel of the rectangle. SenseTalk recognizes a number of color value formats. For the full list of formats, see Color Values in SenseTalk.

TrimTolerance

Type: Integer, Default: 0. When Trim is on, TrimTolerance is a measure of how much a pixel can differ from the RGB value of the TrimColor and still be considered background.
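
Trim Example

The Trim, TrimBorder, TrimColor, and TrimTolerance properties described above are designed to be combined. The call below is a sketch under assumed values: a white background, a tolerance of 10, and a 2-pixel border are illustrative numbers only, and the image names are placeholders.

```sensetalk
-- Trim example (hypothetical image names and values)
-- Shrinks the rectangle toward the text, treating near-white pixels as background
-- and leaving a 2-pixel border of background around the text.
Log ReadText(("TLImage","BRImage"), Trim:On, TrimColor:"ffffff", TrimTolerance:10, TrimBorder:2)
```

If TrimColor is omitted, the background color is taken from the top left pixel of the rectangle.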

TrimWhitespace

Type: Boolean, Default: On. When TrimWhitespace is on, all whitespace characters are removed from the beginning and end of returned text. When TrimWhitespace is off, the ReadText function can return text that starts or ends with whitespace characters. Only for use with reading text, not searching for pre-defined strings.
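
TrimWhitespace Example

As a small sketch, the lines below turn TrimWhitespace off so that any leading or trailing whitespace read by OCR is kept. The variable name RawText and the image names are placeholders.

```sensetalk
-- TrimWhitespace example (hypothetical variable and image names)
-- Keeps leading and trailing whitespace in the returned text.
Put ReadText(("TLImage","BRImage"), TrimWhitespace:Off) into RawText
Log RawText
```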

ValidCharacters

Text string. The validCharacters property limits the characters that may be found by the OCR text engine. ValidCharacters can be limited to the characters in the string you are searching for by setting the string to "*". This can be useful if you are trying to "force" a text match from characters that are not being recognized. If OCR determines that characters are present in the defined area but they do not match characters provided in the validCharacters string, it will return "^".

ValidCharacters Example

-- ValidCharacters examples
-- Setting the validCharacters manually:
-- reads a numeric value including currency symbols
Log ReadText (("TLImage","BRImage"), ValidCharacters:"$£€.,0123456789")
--Setting the ValidCharacters to the text being searched for, using an asterisk:
Click (Text:"CoDe13v9065", ValidCharacters:"*", SearchRectangle:("UpperLeftImage","LowerRightImage"))

ValidPattern

Regular expression string (as defined in SenseTalk Patterns for Use with OCR Properties). This property takes a regular expression value and returns only characters or words that match the pattern specified. For information on regular expression characters that can be used with SenseTalk, see Using Patterns in SenseTalk. If you want OCR to prefer a pattern but not require it, see PreferredPattern. ValidPattern is mutually exclusive with PreferredWords, ValidWords, PreferredPattern, and ExtraWords.

ValidPattern Example

-- ValidPattern example
-- Reads the time off of the SUT screen.
Log ReadText(("RT1","RT2"), validPattern:"[0-9][0-9]:[0-9][0-9]")
-- Formats today's date according to the pattern provided to formattedTime()
put formattedTime("[m]/[d]/[year]") into today
-- Clicks the date where found on the SUT screen, opening up a date and time panel. The date format read would be 1/4/2020 with the pattern passed to validPattern in this example.
Click (Text:Today, SearchRectangle:("TL_Date","BR_Date"), validPattern:"[0-9]/[0-9]/[0-9][0-9][0-9][0-9]")

ValidWords

A single word or a string or list of words. Default: Empty (the OCR engine uses the specified Language). Limiting the words that OCR can consider a match allows you to steer the OCR engine toward a successful match, or force the engine to recognize your text string correctly. You can use the asterisk (*) as a wildcard so that the OCR engine looks only for the words in your original text string. This property limits the words that may be found by the OCR text engine; for more, see Customize the OCR Engine Dictionary. The validWords property overrides the Language property. This override means that words that are not part of the validWords property are not returned. ValidWords is mutually exclusive with PreferredWords, ValidPattern, PreferredPattern, and ExtraWords.

ValidWords Example

-- ValidWords examples
-- Using ValidWords with a variable for the search text:
Put "Charlie Brown" into mytext
-- Setting validWords to "*" to have it match the words being searched for with the Text property.
Click (text: mytext, searchRectangle:("TLImage","BRImage"), validwords:"*")
--Using ValidWords to confirm a language setting in the application under test:
Log ReadText(("TLImage","BRImage"), validWords:"Japanese, English, Spanish, Portuguese, French")

Regular Expression Patterns for Use with OCR Properties

These regular expression patterns are for use with Optical Character Recognition (OCR). For more information about using patterns, please see Using Patterns in SenseTalk.

note

These SenseTalk Pattern signs are for use with validPattern and preferredPattern, and should not be confused with the full SenseTalk Pattern Language. For more information on using SenseTalk patterns outside of OCR, see SenseTalk Pattern Language Basics.

Each entry below gives the item name, the conventional regular expression sign in parentheses, and usage examples with explanations.

  • Any character (.):
    c.t - denotes words such as “cat” and “cot”
  • Character from a character range ([]):
    [b-d]ell - denotes words such as “bell”, “cell”, “dell”
    [ty]ell - denotes words “tell” and “yell”
    [A-Z] - denotes any uppercase alpha character
    [a-z] - denotes any lowercase alpha character
    [A-Я] - denotes any uppercase Cyrillic character
    [а-я] - denotes any lowercase Cyrillic character
    [0-9] - denotes any numeric character
    [0-9a-zA-Z] - denotes any single alphanumeric character
  • Character out of a character range ([^]):
    [^y]ell - denotes words such as “dell”, “cell”, or “tell”, but not “yell”
  • Or (|):
    c(a|u)t - denotes words “cat” and “cut”
  • 0 or more occurrences in a row (*):
    10* - denotes numbers 1, 10, 100, 1000, etc.
  • 1 or more occurrences in a row (+):
    10+ - allows numbers 10, 100, 1000, etc. but not 1
    [0-9a-zA-Z]+ - allows any word made up of letters and digits

Notes:

  • Some characters used in regular expressions are used for system purposes. As seen in the table above, these characters include square brackets, periods, etc.
  • If you wish to enter an auxiliary character as a normal one, put a backslash (\) before it. Example: [t-v]x+ denotes words such as "tx", "txx", "txxx", etc., and "ux", "uxx", etc., but \[t-v\]x+ denotes words such as "[t-v]x", "[t-v]xx", "[t-v]xxx", etc.
  • If you need to group certain regular expression elements, use parentheses. For example, (a|b)+|c denotes "c" and any combinations such as "abbbaaabbb", "ababab", etc. (a word of any non-zero length in which there can be any number of a's and b's in any order), while a|b+|c denotes "a", "c", and "b", "bb", "bbb", etc.
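
Pattern Notes Example

The escaping and grouping rules in the notes above can be sketched with ValidPattern as follows. The image names and the text being read (a decimal amount and an On/Off label) are assumptions made for illustration. The first line also assumes that the backslash in a quoted SenseTalk string is passed to the OCR engine unchanged.

```sensetalk
-- Pattern notes example (hypothetical image names and on-screen text)
-- Escapes the period so it is read as a literal decimal point, as in 12.50.
Log ReadText(("TLImage","BRImage"), ValidPattern:"[0-9]+\.[0-9][0-9]")
-- Groups alternatives with parentheses so only "On" or "Off" is accepted.
Log ReadText(("TLImage","BRImage"), ValidPattern:"(On|Off)")
```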