Text Properties
Searching for or reading text on your system under test (SUT) using optical character recognition (OCR) requires use of a text property list. These property lists have a wide variety of properties you can include, depending on the text you are working with and the purpose of your search.
See Working with OCR to learn how to work with the most commonly used text properties in more detail.
This page covers the available OCR properties as they can be addressed inline in SenseTalk scripts. You can also adjust your OCR properties from several locations in Eggplant Functional.
What is a Text Property List?
A text property list is a description of text on the SUT. Every text property list must contain the Text property, which defines the actual text string you are looking for, and can contain any number of the additional properties described below. Any property that is not included in the property list defaults to your Text Preferences settings.
The following text properties can be used with any text property list:
- Text: Text string. The text string that you want to find on the SUT. (Required.)
- TextStyle: Text-style name. A group of predefined text properties. (For more information, see The Find Text Panel.)
OCR Text Properties Reference Table
Use the table below as a reference when working with OCR. It lists all SenseTalk properties supported for use with OCR. Some properties are available for both reading and searching, while others are specific to one or the other. The table also shows the type of value each property takes, and full descriptions of each property follow the table. For more in-depth information on how to use the most common OCR properties, see Succeeding with OCR.
A Boolean value is like a toggle switch with two possible values; On/Off and Yes/No values are both accepted.
Property | Searching | Reading | Value |
---|---|---|---|
AggressiveTextExtraction | Yes | Yes | Boolean |
CaseSensitive | Yes | No | Boolean |
Contrast | Yes | Yes | Boolean |
ContrastColor | Yes | Yes | A color (see Color Values in SenseTalk) |
ContrastTolerance | Yes | Yes | 0-100 / Default 45 |
DPI | Yes | Yes | Integer |
EnhanceLocalContrast | Yes | Yes | Boolean |
ExtraWords | Yes | Yes | A string or list of words |
IgnoreNewLines | Yes | No | Boolean |
IgnoreSpaces | Yes | No | Boolean |
IgnoreUnderscores | Yes | No | Boolean |
InvertImage | Yes | Yes | Boolean |
Language | Yes | Yes | A language name as specified in OCR Language Support (case sensitive). |
LowResolutionMode | Yes | Yes | Boolean |
Multiline | No | Yes | Boolean |
PreferDictionaryWords | Yes | Yes | Boolean |
PreferredPattern | Yes | Yes | Regular expression string (see Using Patterns in SenseTalk) |
PreferredWords | Yes | Yes | A string or list of words |
ProhibitedWords | Yes | Yes | A string or list of words |
SearchRectangle | Yes | No* | A pair of coordinates or captured images defining a rectangle |
TextDifference | Yes | No | Integer |
TextRotation | Yes | Yes | One of four predefined values |
Trim | No | Yes | Boolean |
TrimBorder | No | Yes | Integer |
TrimColor | No | Yes | A color (see Color Values in SenseTalk) |
TrimTolerance | No | Yes | Integer |
TrimWhitespace | No | Yes | Boolean |
ValidCharacters | Yes | Yes | A string of characters |
ValidPattern | Yes | Yes | Regular expression string (see Using Patterns in SenseTalk) |
ValidWords | Yes | Yes | A string or list of words |
* You do not need to set the SearchRectangle property with ReadText, because you can pass a rectangle directly to ReadText. This property can also be set with standard image searches; for more on this, see Image References.
OCR Property Definitions
AggressiveTextExtraction
Type: Boolean. Enable this property if you want OCR to extract as much text from the image as possible.
AggressiveTextExtraction Example
-- AggressiveTextExtraction example
Log ReadText(("TLImage","BRImage"), AggressiveTextExtraction:on)
CaseSensitive
Type: Boolean, Default: Off. Determines whether Eggplant Functional considers case in text searches. Enable this property to make text searches respect case, so that they only find text whose capitalization exactly matches your text string. This property applies to searching for text, not reading text.
CaseSensitive Example
-- caseSensitive example
Put "COUPON13995a" into Coupon
MoveTo (Text: Coupon, CaseSensitive:Yes)
Contrast
Type: Boolean. Determines whether the SUT display is converted to a high-contrast, two-color image before it is sent to OCR for analysis. If Contrast is on, the contrast color (which you can set with the ContrastColor property) is treated as the primary color of the SUT display, and all other colors are treated as the secondary color. Text can be found in either color. The Contrast property is available for both searching for (finding) text and reading text.
Contrast Example
-- Contrast example
log ReadText(contrast:on, contrastColor:"ffffff", validCharacters: "abcdABCD12345", searchRectangle:("TLImage","BRImage"))
If Contrast is on but ContrastColor is not defined, the top-left pixel of the area being searched is treated as the contrast color.
ContrastColor
Type: A color. Default: If Contrast is on and no other color is defined, the top-left pixel of the search rectangle is used as the contrast color. If Contrast is on, the contrast color is considered the primary color of the SUT display, and all other colors are treated as the secondary color. For instructions on finding the background color, see Determining the Background Color. SenseTalk recognizes a number of color value formats; for the full list, see Color Values in SenseTalk.
ContrastColor Example
-- ContrastColor example
log ReadText(contrast:on, contrastColor:"ffffff", validCharacters: "abcdABCD12345")
ContrastTolerance
Type: Integer (0-100), Default: 45. When Contrast is on, ContrastTolerance sets the maximum per-channel color difference allowed for a pixel to be treated as the contrast color.
ContrastTolerance Example
-- ContrastTolerance example
Click (Text:"Andrew Young", Contrast:On, ContrastTolerance: 65)
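To make the interaction of Contrast, ContrastColor, and ContrastTolerance concrete, the following Python sketch models the two-color conversion described above. It is a conceptual illustration, not Eggplant Functional's implementation. It assumes pixels are (R, G, B) tuples with 0-255 channels and compares the tolerance directly against those channel differences, although this page does not say how the 0-100 ContrastTolerance range maps onto channel values. The function names `is_contrast_color` and `to_two_color` are hypothetical.

```python
from typing import List, Optional, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B); 0-255 channels assumed


def is_contrast_color(pixel: Pixel, contrast_color: Pixel, tolerance: int) -> bool:
    """True if every channel differs from the contrast color by at most `tolerance`."""
    return all(abs(p - c) <= tolerance for p, c in zip(pixel, contrast_color))


def to_two_color(
    area: List[List[Pixel]],
    contrast_color: Optional[Pixel] = None,
    tolerance: int = 45,
) -> List[List[int]]:
    """Map each pixel to 1 (contrast/primary color) or 0 (secondary color).

    With no contrast color, the top-left pixel of the area is used,
    mirroring the documented ContrastColor default.
    """
    if contrast_color is None:
        contrast_color = area[0][0]
    return [
        [1 if is_contrast_color(px, contrast_color, tolerance) else 0 for px in row]
        for row in area
    ]


if __name__ == "__main__":
    white, near_white, dark_blue = (255, 255, 255), (230, 240, 250), (20, 30, 120)
    area = [[white, near_white], [dark_blue, white]]
    print(to_two_color(area))                # [[1, 1], [0, 1]]
    print(to_two_color(area, tolerance=10))  # [[1, 0], [0, 1]]
```

Raising the tolerance pulls near-matching shades (such as anti-aliased text edges) into the contrast color; lowering it splits them off into the secondary color.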
DPI
Type: Integer, Default: 72. The DPI property specifies the DPI (dots per inch) of the SUT display. If you are having problems finding text on the SUT, check the SUT's DPI setting and adjust the DPI property to match. Typical DPI settings include 72, 144, 300, and 2540.
DPI Example
-- DPI example
Click (text: "Continue", DPI: 2540, Language: English)
EnhanceLocalContrast
Type: Boolean, Default: Off. Enable this property if you want OCR to automatically increase the local contrast of the text image before it is sent to the OCR engine. This can aid recognition when some or all of the text has relatively low contrast, such as blue text on a dark background. This property has no effect when Contrast is on, so it is only useful when Contrast is off.
EnhanceLocalContrast Example
-- EnhanceLocalContrast example
Log ReadText(("TLImage","BRImage"), enhanceLocalContrast: On)
ExtraWords
Type: A word or list of words. Set this property to a list of words to supplement the built-in dictionary for the current language. These words are given preference in the same way as other dictionary words. ExtraWords is mutually exclusive with PreferredWords, ValidPattern, PreferredPattern, and ValidWords.
ExtraWords Example
-- ExtraWords example
Log Readtext(("TLImage","BRImage"), Language: English, ExtraWords: "Elizabeth, Andrew, Steven, Katherine, Jacob, Brenda")
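ExtraWords, PreferredWords, PreferredPattern, ValidPattern, and ValidWords are each documented on this page as mutually exclusive with the other four. If you generate property lists programmatically, for example from a data file, a check like the following Python sketch can catch conflicting combinations before a script runs. The helper `conflicting_properties` is hypothetical. It compares property names case-insensitively, on the assumption that SenseTalk property names are case-insensitive, as the mixed capitalization in this page's examples suggests.

```python
from typing import Dict, List

# The five dictionary/pattern properties that this page lists as
# mutually exclusive with one another (lowercased for comparison).
EXCLUSIVE_GROUP = {"extrawords", "preferredwords", "preferredpattern",
                   "validpattern", "validwords"}


def conflicting_properties(props: Dict[str, object]) -> List[str]:
    """Return the mutually exclusive properties present in `props`, sorted.

    Zero or one entry means no conflict; two or more means a conflict.
    """
    return sorted(name for name in props if name.lower() in EXCLUSIVE_GROUP)


if __name__ == "__main__":
    ok = {"Language": "English", "ExtraWords": "Elizabeth, Andrew"}
    bad = {"PreferredWords": ["Annita"], "validPattern": "[0-9]+"}
    for props in (ok, bad):
        found = conflicting_properties(props)
        print("conflict" if len(found) > 1 else "ok", found)
```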
IgnoreNewlines
Type: Boolean. When enabled, IgnoreNewlines causes OCR text searches to ignore line breaks, so a search matches a string even if it is broken over several lines. This property is available only for text searches; it is not available with ReadText.
IgnoreNewlines Example
-- IgnoreNewLines example
-- A long name like this one could wrap to a second line in the interface of the application under test, but with IgnoreNewlines enabled, the OCR search can still find it.
Click (Text:"Constantine Papadopoulos", IgnoreNewlines:On)
IgnoreSpaces
Type: Boolean, Default: On. The IgnoreSpaces property causes OCR text searches to disregard spaces. For example, the string "My Computer" would match "MyComputer" or "M y C o m p u t e r". IgnoreSpaces is on by default because OCR sometimes reads spaces that are not intended, especially in strings that are not discrete words and in text with unusual letter-spacing.
IgnoreSpaces Example
-- IgnoreSpaces example
-- A tab called "My Account" is part of the UI of the software you're testing, but it can appear with an underscore ("My_Account") or without the space ("MyAccount") in different contexts or on different devices.
Click (Text:"My Account", ignoreSpaces:On, IgnoreUnderscores:On) -- finds the My Account tab whether it appears with an underscore, a space, or no space.
IgnoreUnderscores
Type: Boolean, Default: On. The IgnoreUnderscores property causes OCR text searches to treat underscores as spaces. For example, the string "My_Computer" would match either "My_Computer" or "My Computer". IgnoreUnderscores is on by default because OCR sometimes fails to recognize underscores.
IgnoreUnderscores Example
-- IgnoreUnderscores example
Click (Text:"Account Overview", IgnoreUnderscores:On) -- Clicks "Account Overview" even when the OCR reads the underlined link as text with an underscore in place of the space ("Account_Overview").
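The following Python sketch models how CaseSensitive, IgnoreNewlines, IgnoreSpaces, and IgnoreUnderscores, as described above, change whether a search string is found in recognized text. It is a conceptual model only. It assumes the OCR result is already available as a plain string and checks whether the normalized search string occurs in it, whereas the real engine works on text recognized on the screen. The defaults for CaseSensitive (off), IgnoreSpaces (on), and IgnoreUnderscores (on) come from the descriptions above. The IgnoreNewlines default (off) is an assumption, and the function names `normalize` and `found_in` are hypothetical.

```python
import re


def normalize(text: str, case_sensitive: bool = False, ignore_newlines: bool = False,
              ignore_spaces: bool = True, ignore_underscores: bool = True) -> str:
    """Apply the search options to a string before comparison."""
    if ignore_underscores:
        text = text.replace("_", " ")           # underscores count as spaces
    if ignore_newlines:
        text = re.sub(r"\s*\n\s*", " ", text)  # a line break counts as one space
    if ignore_spaces:
        text = text.replace(" ", "")            # spaces are disregarded
    if not case_sensitive:
        text = text.casefold()                  # case is disregarded
    return text


def found_in(search: str, screen_text: str, **options: bool) -> bool:
    """True if `search` occurs in `screen_text` under the given options."""
    return normalize(search, **options) in normalize(screen_text, **options)


if __name__ == "__main__":
    print(found_in("My Account", "Home | My_Account | Help"))  # True
    print(found_in("Constantine Papadopoulos", "Constantine\nPapadopoulos"))  # False
    print(found_in("Constantine Papadopoulos", "Constantine\nPapadopoulos",
                   ignore_newlines=True))  # True
```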
InvertImage
Type: Boolean, Default: Off. Enable this property for OCR to invert the colors of the text image (like a photo negative) before sending it to the OCR engine for processing.
Language
Type: Language name (case sensitive). The natural language of the text you are working with. (For a list of supported languages, see OCR Language Support.) OCR uses this as a guide, giving preference to words in the dictionary for that language. More than one language can be specified. Eggplant Functional includes numerous languages by default, and additional languages are available for purchase. If no language is specified, OCR still reads text, but it has no dictionary to compare its findings against. You can also create a Custom OCR Dictionary. Language names are case sensitive, as defined by the OCR dictionary.
Language Example
-- Language example
-- Clicks the "close" ("Bezárás") button in an application using Hungarian.
Click (Text:"Bezárás", Language:"Hungarian", SearchRectangle:("TLImage","BRImage"))
LowResolutionMode
Type: Boolean, Default: Off. When this property is on, the OCR engine processes the image it receives from Eggplant Functional as a low-resolution image (the image is not actually converted to a lower resolution). This might help OCR recognize smaller characters.
MultiLine
Type: Boolean, Default: Off. This property applies only when reading text near a point, as opposed to reading text within a rectangle. When MultiLine is on, the ReadText function returns the line of text at your point, plus any lines of text above and below it that appear to belong to the same block of text. When MultiLine is off, ReadText returns only the line of text at the point.
Multiline Example
-- Multiline example
-- BkgdColor is a variable assumed to already hold the background color of the text block.
Log ReadText("ShortTextBlockHeaderImage", MultiLine:On, Contrast:On, ContrastColor:BkgdColor)
PreferDictionaryWords
Type: Boolean. While OCR always prefers words from any dictionary provided by the Language property, PreferDictionaryWords takes this a step further and requires OCR to return a dictionary word if possible. OCR returns a non-dictionary word (using its best interpretation of each character) only if no possible dictionary variants are found. This property modifies the OCR dictionary; for more information, see Customize the OCR Dictionary. It is available for both reading and searching for text.
PreferDictionaryWords Example
-- PreferDictionaryWords example
-- "Cattywampus" or some other unlikely word is the text shown within the given rectangle, and PreferDictionaryWords forces the correct word to be returned.
Log ReadText(("TLImage","BRImage"), Language:English, PreferDictionaryWords:On)
PreferredPattern
Type: Regular expression string (see Regular Expression Patterns for Use with OCR Properties, below). When this property is set to a regular expression string, OCR gives preference to text that matches the pattern. For information on regular expression characters that can be used with SenseTalk, see Using Patterns in SenseTalk. If you want OCR to require a pattern match, use ValidPattern. PreferredPattern is mutually exclusive with PreferredWords, ValidPattern, ValidWords, and ExtraWords.
PreferredWords
Type: A word or list of words. Set this property to a list of words to supplement the built-in dictionary for the current language. PreferredWords can be used for either reading or searching for text. This property modifies the OCR dictionary; for more information, see Customize the OCR Dictionary. PreferredWords is mutually exclusive with ValidWords, ValidPattern, PreferredPattern, and ExtraWords.
PreferredWords Example
-- PreferredWords example
-- A list of customer names is passed in as the value for the PreferredWords property, adding all of those names to the OCR dictionary.
Log ReadText(("TLImage","BRImage"), Language: PortugueseBrazilian, PreferredWords:CustomerNameList)
In the example above, PortugueseBrazilian is not quoted. Because it is an unassigned variable, it resolves to its own name, so the value passed to the Language property is "PortugueseBrazilian", just as if the text were in quotes.
ProhibitedWords
Type: A word or list of words. Use this property to list words that OCR might recognize but that are not what you are looking for, to help steer it in the right direction. ProhibitedWords can be used for both reading and searching for text. This property modifies the OCR dictionary; for more information, see Customize the OCR Dictionary.
ProhibitedWords Example
-- ProhibitedWords example
-- Using the ProhibitedWords property to eliminate possible misspellings that the OCR could mistake as being correct.
Click(Text:"Annita",ProhibitedWords:"Amita")
SearchRectangle
Type: A rectangle, defined by a pair of points (top-left corner, bottom-right corner) given as coordinates or captured images. With (0,0) as the top-left corner of the screen, the SearchRectangle property takes a pair of points that define a rectangular area of the SUT screen, and Eggplant Functional searches for the text only within that rectangle. SearchRectangle is for searching for text, not reading text: to read text within a rectangle, pass the rectangle directly to the ReadText() function, which does not require a special property. The SearchRectangle property can also be set when searching for images; for more on this, see Image References.
SearchRectangle Example
-- SearchRectangle example
-- Searching for a text string using the SearchRectangle property:
Click (Text:"CharlieBrown", SearchRectangle:("TLImage","BRImage"),contrast:On)
-- Reading text by passing a rectangle directly to the ReadText() function:
Log ReadText(("TLImage","BRImage"),contrast:On)
TextDifference
Type: Integer, Default: 0. This property lets text searches find text that differs from your search string by up to the given number of characters. Available only for text searches, not for reading text.
TextDifference Example
-- TextDifference example
-- Would find text written as "armadolli" or any other variation that differs from "armadillo" by one or two characters.
moveTo text:"armadillo", searchRectangle:(305,241,372,274),TextDifference:2
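This page does not say exactly how TextDifference counts differing characters. Assuming it behaves like an edit distance (the number of single-character insertions, deletions, or substitutions needed to turn one string into the other), the following Python sketch shows why "armadolli" falls within a TextDifference of 2 from "armadillo" but not within 1. The function names are hypothetical, and edit distance is a stand-in, not a documented description of the OCR engine.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def within_text_difference(search: str, found: str, text_difference: int) -> bool:
    """Model of a TextDifference check, assuming it is an edit distance."""
    return edit_distance(search, found) <= text_difference


if __name__ == "__main__":
    print(edit_distance("armadillo", "armadolli"))              # 2
    print(within_text_difference("armadillo", "armadolli", 2))  # True
    print(within_text_difference("armadillo", "armadolli", 1))  # False
```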
TextRotation
Type: One of four predefined values: Clockwise, Counter-clockwise, Upside-down, or None. When this property is set, OCR identifies words at the rotation specified: Clockwise rotates 90 degrees to the right; Counter-clockwise rotates 90 degrees to the left; Upside-down rotates 180 degrees; None does not rotate the text. TextRotation can be used for both reading and searching for text.
TextRotation Example
-- TextRotation example
Log ReadText(("TLImage","BRImage"),TextRotation:"Clockwise")
Click (Text:"Charlie Brown",TextRotation: "Upside-down")
Trim
Type: Boolean, Default: Off. When Trim is on, the OCR engine shrinks the rectangle provided to the ReadText() function until it encounters a non-background pixel (usually the edge of the text that you want to read). The background color is taken from the TrimColor property if it is set; otherwise, it is taken from the top-left pixel of the rectangle.
TrimBorder
Type: Integer, Default: 0. When Trim is on, TrimBorder is the width, in pixels, of background that is left untrimmed in the ReadText() rectangle. TrimBorder can be set to a negative number to trim non-background edges from the rectangle.
TrimColor
Type: A color. When Trim is on, TrimColor is the color that is considered the background of the ReadText() rectangle. If you do not set the TrimColor property, the background color is taken from the top-left pixel of the rectangle. SenseTalk recognizes a number of color value formats; for the full list, see Color Values in SenseTalk.
TrimTolerance
Type: Integer, Default: 0. When Trim is on, TrimTolerance is a measure of how much a pixel can differ from the RGB value of the TrimColor and still be considered background.
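The following Python sketch models the trimming behavior of Trim, TrimBorder, TrimColor, and TrimTolerance as described above. It is a conceptual illustration, not Eggplant Functional's implementation. It assumes that the rectangle is a grid of (R, G, B) pixels, that TrimTolerance is compared against each channel's difference, and that the result is returned as inclusive (left, top, right, bottom) column and row indices. The function name `trim_rectangle` is hypothetical.

```python
from typing import List, Optional, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B); 0-255 channels assumed


def trim_rectangle(
    pixels: List[List[Pixel]],
    trim_color: Optional[Pixel] = None,
    trim_tolerance: int = 0,
    trim_border: int = 0,
) -> Optional[Tuple[int, int, int, int]]:
    """Shrink a rectangle to the non-background content it contains.

    Returns (left, top, right, bottom) as inclusive indices into `pixels`,
    or None if nothing but background remains.
    """
    # Without TrimColor, the top-left pixel is taken as the background.
    background = trim_color if trim_color is not None else pixels[0][0]

    def is_background(px: Pixel) -> bool:
        return all(abs(p - b) <= trim_tolerance for p, b in zip(px, background))

    rows = [y for y, row in enumerate(pixels) if not all(map(is_background, row))]
    cols = [x for x in range(len(pixels[0]))
            if not all(is_background(row[x]) for row in pixels)]
    if not rows:
        return None

    height, width = len(pixels), len(pixels[0])
    # TrimBorder keeps that many pixels of background around the content;
    # a negative value trims into the content. Clamp to the original bounds.
    left = max(0, cols[0] - trim_border)
    top = max(0, rows[0] - trim_border)
    right = min(width - 1, cols[-1] + trim_border)
    bottom = min(height - 1, rows[-1] + trim_border)
    if left > right or top > bottom:
        return None
    return left, top, right, bottom


if __name__ == "__main__":
    W, K, N = (255, 255, 255), (0, 0, 0), (250, 250, 250)
    grid = [[W] * 5 for _ in range(5)]
    grid[1][2] = K  # row 1, column 2
    grid[3][3] = K  # row 3, column 3
    grid[4][0] = N  # a near-white speck
    print(trim_rectangle(grid))                                     # (0, 1, 3, 4)
    print(trim_rectangle(grid, trim_tolerance=10))                  # (2, 1, 3, 3)
    print(trim_rectangle(grid, trim_tolerance=10, trim_border=1))   # (1, 0, 4, 4)
```

The first call shows why a small TrimTolerance can matter: with a tolerance of 0, a barely visible near-white speck counts as content and stops the rectangle from shrinking.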
TrimWhitespace
Type: Boolean, Default: On. When TrimWhitespace is on, all whitespace characters are removed from the beginning and end of the returned text. When TrimWhitespace is off, the ReadText function can return text that starts or ends with whitespace characters. This property applies only to reading text, not to searching for predefined strings.
ValidCharacters
Type: Text string. The ValidCharacters property limits the characters that the OCR engine can find. To limit ValidCharacters to the characters in the string you are searching for, set it to "*". This can be useful if you are trying to force a text match from characters that are not being recognized. If OCR determines that characters are present in the defined area but they do not match any of the characters in the ValidCharacters string, it returns "^".
ValidCharacters Example
-- ValidCharacters examples
-- Setting the validCharacters manually:
-- reads a numeric value including currency symbols
Log ReadText (("TLImage","BRImage"), ValidCharacters:"$£€.,0123456789")
-- Setting ValidCharacters to the text being searched for, using an asterisk:
Click (Text:"CoDe13v9065", ValidCharacters:"*", SearchRectangle:("UpperLeftImage","LowerRightImage"))
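As a conceptual model of the ValidCharacters behavior described above, the following Python sketch shows how "*" expands to the characters of the search string and how characters outside the allowed set come back as "^". It post-processes an already-recognized string and assumes one "^" per disallowed character. Both are simplifications, since the real engine restricts recognition itself. The function names are hypothetical.

```python
from typing import Set


def allowed_characters(valid_characters: str, search_text: str = "") -> Set[str]:
    """Resolve a ValidCharacters value; "*" stands for the search string's characters."""
    source = search_text if valid_characters == "*" else valid_characters
    return set(source)


def apply_valid_characters(recognized: str, allowed: Set[str]) -> str:
    """Replace every character outside the allowed set with "^"."""
    return "".join(ch if ch in allowed else "^" for ch in recognized)


if __name__ == "__main__":
    currency = allowed_characters("$£€.,0123456789")
    print(apply_valid_characters("£1,250.00", currency))  # £1,250.00
    print(apply_valid_characters("USD 12", currency))     # ^^^^12
    code = allowed_characters("*", search_text="CoDe13v9065")
    print(apply_valid_characters("CoDe13v9O65", code))    # CoDe13v9^65
```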
ValidPattern
Type: Regular expression string (see Regular Expression Patterns for Use with OCR Properties, below). This property takes a regular expression and limits OCR to returning only characters or words that match the specified pattern. For information on regular expression characters that can be used with SenseTalk, see Using Patterns in SenseTalk. If you want OCR to prefer a pattern but not require it, see PreferredPattern. ValidPattern is mutually exclusive with PreferredWords, ValidWords, PreferredPattern, and ExtraWords.
ValidPattern Example
-- ValidPattern example
-- Reads a time such as 09:45 from the SUT screen.
Log ReadText(("RT1","RT2"), validPattern:"[0-9][0-9]:[0-9][0-9]")
-- Formats today's date according to the pattern provided to formattedTime()
put formattedTime("[m]/[d]/[year]") into today
-- Clicks the date where it appears on the SUT screen, opening a date and time panel. The validPattern in this example matches dates with a one-digit month and day, such as 1/4/2020; to also match dates such as 12/25/2020, use "[0-9]+/[0-9]+/[0-9][0-9][0-9][0-9]".
Click (Text:Today, SearchRectangle:("TL_Date","BR_Date"), validPattern:"[0-9]/[0-9]/[0-9][0-9][0-9][0-9]")
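You can check a pattern before using it with validPattern by trying it against sample strings. The following Python sketch uses the standard `re` module, whose syntax agrees with the OCR pattern signs for these particular patterns (whether the two dialects agree beyond them is an assumption). It shows that the date pattern in the example above matches only one-digit months and days, and that a variant using "+" (hypothetical, not from the original example) also accepts two-digit values.

```python
import re

TIME_PATTERN = r"[0-9][0-9]:[0-9][0-9]"
DATE_PATTERN = r"[0-9]/[0-9]/[0-9][0-9][0-9][0-9]"        # from the example above
DATE_PATTERN_WIDE = r"[0-9]+/[0-9]+/[0-9][0-9][0-9][0-9]"  # hypothetical variant

samples = ["09:45", "1/4/2020", "12/25/2020"]
for pattern in (TIME_PATTERN, DATE_PATTERN, DATE_PATTERN_WIDE):
    # fullmatch checks whether the whole sample string fits the pattern.
    matched = [s for s in samples if re.fullmatch(pattern, s)]
    print(pattern, "->", matched)
```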
ValidWords
Type: A single word, or a string or list of words. Default: Empty (the OCR engine uses the specified Language). Limiting the words that OCR can consider a match lets you steer the OCR engine toward a successful match, or force it to recognize your text string correctly. Set ValidWords to an asterisk (*) to make the OCR engine look only for the words in your original text string. This property limits the words that may be found by the OCR text engine; for more information, see Customize the OCR Engine Dictionary. The ValidWords property overrides the Language property, so words that are not part of the ValidWords value are not returned. ValidWords is mutually exclusive with PreferredWords, ValidPattern, PreferredPattern, and ExtraWords.
ValidWords Example
-- ValidWords examples
-- Using ValidWords with a variable for the search text:
Put "Charlie Brown" into mytext
-- Setting validWords to "*" makes OCR match only the words being searched for with the Text property.
Click (text: mytext, searchRectangle:("TLImage","BRImage"), validwords:"*")
-- Using ValidWords to confirm a language setting in the application under test:
Log ReadText(("TLImage","BRImage"), validWords:"Japanese, English, Spanish, Portuguese, French")
Regular Expression Patterns for Use with OCR Properties
These regular expression patterns are for use with Optical Character Recognition (OCR). For more information about using patterns, please see Using Patterns in SenseTalk.
These regular expression signs are for use with ValidPattern and PreferredPattern, and should not be confused with the full SenseTalk Pattern Language. For more information on using SenseTalk patterns outside of OCR, see SenseTalk Pattern Language Basics.
Item Name | Conventional Regular Expression Sign | Usage Example / Explanation |
---|---|---|
Any character | . | c.t denotes words such as "cat" and "cot" |
Character from a character range | [] | [b-d]ell denotes words such as "bell", "cell", and "dell"; [ty]ell denotes the words "tell" and "yell"; [A-Z] denotes any uppercase Latin letter; [a-z] denotes any lowercase Latin letter; [А-Я] denotes any uppercase Cyrillic letter; [а-я] denotes any lowercase Cyrillic letter; [0-9] denotes any digit; [0-9a-zA-Z] denotes any single Latin letter or digit |
Character out of a character range | [^] | [^y]ell denotes words such as "dell", "cell", or "tell", but not "yell" |
Or | \| | c(a\|u)t denotes the words "cat" and "cut" |
0 or more occurrences of the preceding item in a row | * | 10* denotes the numbers 1, 10, 100, 1000, etc. |
1 or more occurrences of the preceding item in a row | + | 10+ allows the numbers 10, 100, 1000, etc., but not 1; [0-9a-zA-Z]+ allows any word made of Latin letters and digits |
Notes:
- Some characters have special meanings in regular expressions. As the table above shows, these include square brackets, periods, and others.
- To use a special character as an ordinary character, put a backslash (\) before it. For example, [t-v]x+ denotes words such as "tx", "txx", "txxx", "ux", "uxx", and "vx", but \[t-v\]x+ denotes words such as "[t-v]x", "[t-v]xx", and "[t-v]xxx".
- To group regular expression elements, use parentheses. For example, (a|b)+|c denotes "c" and any non-empty combination of a's and b's in any order, such as "abbbaaabbb" or "ababab", while a|b+|c denotes "a", "c", and "b", "bb", "bbb", and so on.
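The examples in the table and notes above follow conventional regular expression semantics, so you can try them in any standard regex engine before using them with ValidPattern or PreferredPattern. The following Python sketch checks each example with the `re` module. It is a convenience for experimenting, not a model of how the OCR engine applies patterns, and Python's dialect may differ from the OCR engine's for patterns beyond those shown here. The non-matching sample strings are illustrative additions.

```python
import re

# (pattern, strings it should match in full, strings it should not)
EXAMPLES = [
    (r"c.t", ["cat", "cot"], ["ct", "cart"]),
    (r"[b-d]ell", ["bell", "cell", "dell"], ["tell"]),
    (r"[ty]ell", ["tell", "yell"], ["bell"]),
    (r"[^y]ell", ["dell", "cell", "tell"], ["yell"]),
    (r"c(a|u)t", ["cat", "cut"], ["cot"]),
    (r"10*", ["1", "10", "100", "1000"], ["0"]),
    (r"10+", ["10", "100", "1000"], ["1"]),
    (r"[t-v]x+", ["tx", "txx", "ux", "vxxx"], ["[t-v]x"]),
    (r"\[t-v\]x+", ["[t-v]x", "[t-v]xx"], ["tx"]),
    (r"(a|b)+|c", ["c", "abbbaaabbb", "ababab"], ["abc"]),
    (r"a|b+|c", ["a", "c", "b", "bbb"], ["ab", "aa"]),
]

for pattern, good, bad in EXAMPLES:
    ok = all(re.fullmatch(pattern, s) for s in good) and not any(
        re.fullmatch(pattern, s) for s in bad)
    print(f"{pattern:14} {'as documented' if ok else 'MISMATCH'}")
```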