Searching for or reading text on your system under test (SUT) using optical character recognition (OCR) requires use of a text property list. These property lists have a wide variety of properties you can include, depending on the text you are working with and the purpose of your search.
See Working with OCR to learn how to work with the most commonly used text properties in more detail.
This page covers the available OCR properties as they can be addressed in-line with SenseTalk scripting. You can also adjust your OCR properties from several locations in Eggplant Functional.
What is a Text Property List?
A text property list is a description of text on the SUT. Every text property list must contain the `Text` property, which defines the actual text string you are looking for, and can contain any number of the additional properties described below. Any property that is not included in the property list defaults to your Text Preferences settings.
The following text properties can be used with any text property list:
- `Text`: Text string. The text string that you want to find on the SUT. (Required.)
- `TextStyle`: Text-style name. A group of predefined text properties. (For more information, see The Find Text Panel.)
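The fallback behavior described above can be pictured as a dictionary merge. The following Python sketch is only a model and is not Eggplant Functional code: `PREFERENCE_DEFAULTS` and `resolve_text_properties` are hypothetical names, and the default values are the ones stated on this page, which your own Text Preferences settings may override.

```python
# Hypothetical stand-in for the Text Preferences settings. The values are the
# defaults stated on this page, not values read from a real installation.
PREFERENCE_DEFAULTS = {
    "CaseSensitive": False,
    "ContrastTolerance": 45,
    "DPI": 72,
    "IgnoreSpaces": True,
    "IgnoreUnderscores": True,
    "TextDifference": 0,
}


def resolve_text_properties(property_list):
    """Return the effective properties for one search (illustrative only)."""
    if "Text" not in property_list:
        raise ValueError("a text property list must contain the Text property")
    effective = dict(PREFERENCE_DEFAULTS)   # start from the preference values
    effective.update(property_list)         # in-line properties take precedence
    return effective


props = resolve_text_properties({"Text": "Continue", "DPI": 144})
print(props["DPI"], props["ContrastTolerance"])
```

Only `DPI` is overridden in this example; every other property keeps its preference value.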
OCR Text Properties Reference Table
Use the table below as a reference when working with OCR. It is a complete list of all supported SenseTalk properties for use with OCR. Some properties are available for reading and searching, while some are specific to one or the other. The type of value passed to the property is also included. Click the name of the property for its full description (a full description list is included on this page below the table). For more in-depth information on how to use the most common OCR properties, see Succeeding with OCR.
A boolean value is like a toggle switch with two possible values; Yes/No values are accepted as well as On/Off.
|Property||Searching||Reading||Value|
|ContrastColor||Yes||Yes||A color (see Color Values in SenseTalk)|
|ContrastTolerance||Yes||Yes||0-100 / Default 45|
|ExtraWords||Yes||Yes||A string or list of words|
|Language||Yes||Yes||A language name as specified in OCR Language Support (case sensitive).|
|PreferredPattern||Yes||Yes||Regular expression string (see Using Patterns in SenseTalk)|
|PreferredWords||Yes||Yes||A string or list of words|
|ProhibitedWords||Yes||Yes||A string or list of words|
|SearchRectangle||Yes||No*||A pair of coordinates or captured images defining a rectangle|
|TextRotation||Yes||Yes||One of four predefined values|
|TrimColor||No||Yes||A color (see Color Values in SenseTalk)|
|ValidPattern||Yes||Yes||Regular expression string (see Using Patterns in SenseTalk)|
|ValidWords||Yes||Yes||A string or list of words|
* You do not need to set a `SearchRectangle` property with `ReadText`, because `ReadText` already takes a rectangle by default. This property can also be set with standard image searches; for more on this, see Image References.
OCR Property Definitions
EnableAggressiveTextExtraction
Boolean. Enable this property if you want OCR to extract as much text from the image as possible.
Log ReadText(("TLImage","BRImage"), enableaggressivetextextraction:on)
CaseSensitive
Boolean. Default: off. Whether or not Eggplant Functional considers case in text searches. Enable this property to force text searches to respect case and only find text that matches your text string's capitalization exactly. This property is for searching for text, not reading text.
Put "COUPON13995a" into Coupon
MoveTo (Text: Coupon, CaseSensitive:Yes)
Contrast
Boolean. Whether or not the SUT display is converted to a high-contrast two-color image before it is sent to OCR for analysis. If `Contrast` is on, a color referred to as the "contrast color" (which can be set using the `ContrastColor` property) is considered the primary color of the SUT display, and all other colors are treated as the secondary color. Text can be found in either color. The `Contrast` property is available for use with both searching for (finding) text and reading text.
log ReadText(contrast:on, contrastColor:"ffffff", validCharacters: "abcdABCD12345", searchRectangle:("TLImage","BRImage"))
If `Contrast` is on, but the `ContrastColor` is not defined, the top left pixel of the area being searched is treated as the contrast color.
ContrastColor
A color. Default: The top left corner of the search rectangle is used as the contrast color if `Contrast` is on and no other color is defined. If `Contrast` is on, the contrast color is considered the primary color of the SUT display, and all other colors are treated as the secondary color. For instructions on finding the background color, see Determining the Background Color. A number of color value formats are recognized by SenseTalk. For the full list of formats, see Color Values in SenseTalk.
log ReadText(contrast:on, contrastColor:"ffffff", validCharacters: "abcdABCD12345")
ContrastTolerance
Integer. Default: 45. When `Contrast` is on, `ContrastTolerance` sets the maximum per-channel color difference that is allowed for a pixel to be seen as the contrast color.
Click (Text:"Andrew Young", Contrast:On, ContrastTolerance: 65)
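The interaction between `Contrast`, `ContrastColor`, and `ContrastTolerance` can be modeled in a few lines of Python. This is a sketch of the idea only, not the OCR engine's implementation: the function names are hypothetical, and it assumes the tolerance is compared directly against 0-255 channel differences, which this page does not confirm.

```python
def is_contrast_color(pixel, contrast_color, tolerance):
    """True when every channel of `pixel` is within `tolerance` of `contrast_color`."""
    return all(abs(p - c) <= tolerance for p, c in zip(pixel, contrast_color))


def to_two_color(pixels, contrast_color=None, tolerance=45):
    """Map rows of RGB tuples to rows of booleans (True = treated as contrast color)."""
    if contrast_color is None:
        contrast_color = pixels[0][0]  # fall back to the top-left pixel
    return [[is_contrast_color(px, contrast_color, tolerance) for px in row]
            for row in pixels]


white, near_white, black = (255, 255, 255), (230, 240, 250), (0, 0, 0)
image = [[white, near_white],
         [black, white]]

loose = to_two_color(image, contrast_color=white, tolerance=45)
strict = to_two_color(image, contrast_color=white, tolerance=10)
print(loose)
print(strict)
```

Under these assumptions, raising the tolerance lets slightly off-color pixels (such as anti-aliased edges) count as the contrast color, while lowering it treats them as the secondary color.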
DPI
Integer. Default: 72. The `DPI` property refers to the DPI (dots per inch) of the SUT display. If you are having problems finding text on the SUT, check the SUT's DPI setting, and adjust the `DPI` property accordingly. Typical DPI settings include: 72, 144, 300, and 2540.
Click (text: "Continue", DPI: 2540, Language: English)
EnhanceLocalContrast
Boolean. Default: off. Enable this property if you want OCR to automatically increase the local contrast of the text image being sent to the OCR engine. This property may aid recognition when some or all of the text being read has relatively low contrast, such as blue text on a dark background. When `Contrast` is turned on, this property has no effect, so it is only useful when `Contrast` is turned off.
Log ReadText(("TLImage","BRImage"), enhanceLocalContrast: On)
ExtraWords
A word or list of words. Set this property to a list of words to supplement the built-in dictionary for the current language. These words will be given preference the same way as other dictionary words.
`ExtraWords` is mutually exclusive with
Log Readtext(("TLImage","BRImage"), Language: English, ExtraWords: "Elizabeth, Andrew, Steven, Katherine, Jacob, Brenda")
IgnoreNewlines
`IgnoreNewlines` causes OCR text searches to ignore line breaks, so a search will match a string even if it's broken over several lines. This property is only available for text searches (not available with `ReadText`).
Click (Text:"Constantine Papadopoulos", IgnoreNewlines:On) -- In the case of a long name like this, it's possible that it could wrap to a second line in the interface of an application under test, but the OCR could still read it with IgnoreNewlines enabled.
IgnoreSpaces
The `ignoreSpaces` property causes OCR text searches to disregard spaces in your text string. For example, the string "My Computer" would match "MyComputer" or "M y C o m p u t e r". The `ignoreSpaces` property is on by default. This is because the OCR sometimes reads spaces that are not intended, especially in strings that are not discrete words, and in text with unusual letter-spacing.
A tab called "My Account" is part of the UI of the software you're testing, but can appear with an underscore ("My_Account") or without the space ("MyAccount") in different contexts or on different devices.
Click (Text:"My Account", ignoreSpaces:On, IgnoreUnderscores:On) -- is able to find the Account tab whether it has an underscore, space, or no space.
IgnoreUnderscores
The `ignoreUnderscores` property causes OCR text searches to treat underscores as spaces during searches. For example, the string "My_Computer" would match "My_Computer" or "My Computer". The `ignoreUnderscores` property is on by default, because the OCR sometimes fails to recognize underscores.
Click (Text:"Account Overview", IgnoreUnderscores:On) -- Will click "Account Overview" in a case where the OCR is mistaking an underlined link as text with an underscore in the space ("Account_Overview").
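The combined effect of `ignoreSpaces` and `ignoreUnderscores` can be pictured as normalizing both strings before comparing them. The Python sketch below is a simplified model under that assumption; `normalize` and `texts_match` are hypothetical names, and case handling is left out.

```python
def normalize(text, ignore_spaces=True, ignore_underscores=True):
    """Apply the two 'ignore' rules to a string before comparison."""
    if ignore_underscores:
        text = text.replace("_", " ")  # underscores are treated as spaces
    if ignore_spaces:
        text = text.replace(" ", "")   # spaces are disregarded entirely
    return text


def texts_match(search, found, **options):
    """Compare the search string with recognized text after normalization."""
    return normalize(search, **options) == normalize(found, **options)


for candidate in ("My Account", "My_Account", "MyAccount"):
    print(candidate, texts_match("My Account", candidate))
```

With both options on, all three spellings of the tab name compare equal, which is the situation described in the "My Account" example above.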
InvertImage
Boolean. Default: Off. Enable this property for OCR to invert the colors of the text image (like a photo negative) before sending it to the OCR engine for processing.
Language
Language name (case sensitive). The natural language of the text you are searching for. (For a list of supported languages, see OCR Language Support.) OCR uses this as a guide, giving preference to words specified in the dictionary it is using. More than one language can be specified. Eggplant Functional comes with numerous languages by default, and additional languages are available for purchase. If no language is specified, OCR will still read text; it just won't have a dictionary to compare its findings to. You can also create a Custom OCR Dictionary.
The language names are case-sensitive as defined by the OCR dictionary.
Click (Text:"Bezárás", Language:"Hungarian", SearchRectangle:("TLImage","BRImage")) -- Clicks the "close" ("Bezárás") button in an application using Hungarian.
LowResolutionMode
Boolean. Default: Off. A mode of processing used by the OCR engine to treat the image it receives from Eggplant Functional as low resolution (the image is not actually converted to a lower resolution). This might help OCR recognize smaller characters.
MultiLine
Boolean. Default value: Off. This property only applies when reading text near a point, as opposed to reading text within a rectangle. When `MultiLine` is on, the `ReadText` function returns the line of text associated with your point, and any lines of text above and below that point if they appear to belong to the same block of text. When `MultiLine` is off, the `ReadText` function only returns the line of text associated with the point.
Log ReadText("ShortTextBlockHeaderImage", MultiLine:On, Contrast:On, ContrastColor:BkgdColor)
PreferDictionaryWords
Boolean. While OCR always prefers words in any dictionary it is provided, `PreferDictionaryWords` takes this a step further and requires OCR to return a dictionary word if possible. It only returns a non-dictionary word, using its best interpretation of each character, if no possible variants are found. This property modifies the OCR dictionary. For more information, see Customize the OCR Dictionary. Available for both reading and searching for text.
Log ReadText(("TLImage","BRImage"), Language:English, PreferDictionaryWords:On) --"Cattywampus" or some other unlikely word is the text shown within the given searchRectangle, and PreferDictionaryWords forces the correct word to be returned.
PreferredPattern
Regular expression string (as defined in SenseTalk Patterns for Use with OCR Properties). When this property is enabled and given a regular expression string, OCR gives preference to text that matches the provided pattern. For information on regular expression characters that can be used with SenseTalk, see Using Patterns in SenseTalk. If you want the OCR to require a pattern match, use `ValidPattern`.
`PreferredPattern` is mutually exclusive with
PreferredWords
A word or list of words. Set this property to a list of words to supplement the built-in dictionary for the current language. `PreferredWords` can be used for either reading or searching for text. This property modifies the OCR dictionary. For more information, see Customize the OCR Dictionary.
`PreferredWords` is mutually exclusive with
Log ReadText(("TLImage","BRImage"), Language: PortugueseBrazilian, PreferredWords:CustomerNameList) -- A list of customer names is passed in as the value for the PreferredWords property, adding all of those names to the OCR dictionary.
In the example above, "PortugueseBrazilian" is not quoted. This unpopulated variable will resolve to its name, so the value passed to the `Language` property is "PortugueseBrazilian", the same as if this text were in quotes.
ProhibitedWords
A word or list of words. List words that OCR might recognize but that are not the text you are looking for, to help steer it toward the correct result. `ProhibitedWords` can be used for both reading and searching for text. This property modifies the OCR dictionary. For more information, see Customize the OCR Dictionary.
Click(Text:"Annita",ProhibitedWords:"Amita") -- Using the ProhibitedWords property to eliminate possible misspellings that the OCR could mistake as being correct.
SearchRectangle
Rectangle defined by a coordinate pair (top left corner, bottom right corner). With (0,0) being the top-left corner of the screen, the `SearchRectangle` property takes a pair of coordinates that define a rectangular area of the SUT screen. Eggplant Functional only searches for the text within this defined rectangle. The `SearchRectangle` property is for use with searching for text, not reading text. Setting a rectangle for the `ReadText()` function does not require a special property, as `ReadText()` takes a rectangle by default. The `SearchRectangle` property can also be set when searching for images; for more on this, see Image References.
Searching for a text string using the SearchRectangle property:
Click (Text:"CharlieBrown", SearchRectangle:("TLImage","BRImage"),contrast:On)
Reading text by passing a rectangle directly to the ReadText() function:
TextDifference
Integer. Default: 0. This property causes text searches to find text that differs from your search by a given number of characters. Only available with OCR searches.
moveTo text:"armadillo", searchRectangle:(305,241,372,274),TextDifference:2 -- Would find text written as "armadolli" or any other variation that differs from "armadillo" by one or two characters.
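One way to reason about `TextDifference` is as an edit-distance threshold. The sketch below uses the Levenshtein distance (single-character insertions, deletions, and substitutions); this is an assumption made for illustration, since this page does not state which difference measure the OCR search actually uses. The function names are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete ca
                               current[j - 1] + 1,       # insert cb
                               previous[j - 1] + cost))  # substitute ca with cb
        previous = current
    return previous[-1]


def within_text_difference(search, found, text_difference=0):
    """True when `found` differs from `search` by at most `text_difference` edits."""
    return edit_distance(search, found) <= text_difference


print(edit_distance("armadillo", "armadolli"))
print(within_text_difference("armadillo", "armadolli", text_difference=2))
```

Under this model, "armadolli" is two substitutions away from "armadillo", so it is accepted with a threshold of 2 and rejected with a threshold of 1.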
TextRotation
One of four predefined values: Clockwise, Counter-clockwise, Upside-down, or None. When this property is set, OCR identifies words at the degree of rotation specified by one of the predefined values:
- `Clockwise` rotates 90 degrees to the right;
- `Counter-clockwise` rotates 90 degrees to the left;
- `Upside-down` rotates 180 degrees;
- `None` does not rotate the text.
Can be used for both reading and searching for text.
Click (Text:"Charlie Brown",TextRotation: "Upside-down")
Trim
Boolean. Default value: Off. When `Trim` is on, the OCR engine reduces the size of the rectangle provided to the `ReadText()` function until a non-background pixel is encountered (usually the edge of the text that you want it to read). The background color is taken from the top left pixel of the rectangle, or from the `TrimColor` property if it is set.
TrimBorder
Integer. Default value: 0. When `Trim` is on, `TrimBorder` is the pixel width of background that is not trimmed from the rectangle. `TrimBorder` can be set to a negative number to trim non-background edges from the rectangle.
TrimColor
`TrimColor` is the color that is considered the background of the `ReadText()` function rectangle. If you do not set the `TrimColor` property, the background color is taken from the top left pixel of the rectangle. SenseTalk recognizes a number of color value formats. For the full list of formats, see Color Values in SenseTalk.
TrimTolerance
Integer. Default value: 0. When `Trim` is on, `TrimTolerance` is a measure of how much a pixel can differ from the RGB value of the `TrimColor` and still be considered background.
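The trimming properties work together: a background color is chosen, pixels within a tolerance of it count as background, the rectangle shrinks to the remaining content, and a border of background can be kept. The Python sketch below models that sequence on a small grid of RGB tuples. It is an illustration under stated assumptions, not the engine's algorithm: the function names are hypothetical, the tolerance is assumed to be a per-channel difference, and negative border values are not modeled.

```python
def is_background(pixel, background, tolerance):
    """True when every channel is within `tolerance` of the background color."""
    return all(abs(p - b) <= tolerance for p, b in zip(pixel, background))


def trim_rectangle(pixels, trim_color=None, trim_tolerance=0, trim_border=0):
    """Return inclusive (left, top, right, bottom) of the trimmed area, or None."""
    if trim_border < 0:
        raise ValueError("negative borders are not modeled in this sketch")
    background = pixels[0][0] if trim_color is None else trim_color
    hits = [(x, y)
            for y, row in enumerate(pixels)
            for x, px in enumerate(row)
            if not is_background(px, background, trim_tolerance)]
    if not hits:
        return None  # nothing but background
    xs = [x for x, _ in hits]
    ys = [y for _, y in hits]
    width, height = len(pixels[0]), len(pixels)
    # Keep `trim_border` pixels of background around the content, clamped to the image.
    return (max(min(xs) - trim_border, 0),
            max(min(ys) - trim_border, 0),
            min(max(xs) + trim_border, width - 1),
            min(max(ys) + trim_border, height - 1))


W, K = (255, 255, 255), (0, 0, 0)
image = [[W] * 5 for _ in range(5)]
image[2][1] = K  # row 2, column 1
image[2][3] = K  # row 2, column 3

tight = trim_rectangle(image)
padded = trim_rectangle(image, trim_border=1)
print(tight, padded)
```

In this model the background defaults to the top-left pixel, matching the behavior described for a missing `TrimColor`.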
TrimWhitespace
Boolean. Default: On. When `TrimWhitespace` is on, all whitespace characters are removed from the beginning and end of returned text. When `TrimWhitespace` is off, the `ReadText` function can return text that starts or ends with whitespace characters. Only for use with reading text, not searching for pre-defined strings.
ValidCharacters
Text string. The `validCharacters` property limits the characters that may be found by the OCR text engine. `ValidCharacters` can be limited to the characters in the string you are searching for by setting the string to "*". This can be useful if you are trying to "force" a text match from characters that are not being recognized. If OCR determines that characters are present in the defined area but they do not match characters provided in the `validCharacters` string, it will return "^".
Setting the validCharacters manually:
Log ReadText(("TLImage","BRImage"), ValidCharacters:"$£€.,0123456789") -- reads a numeric value including currency symbols
Setting the ValidCharacters to the text being searched for, using an asterisk:
Click (Text:"CoDe13v9065", ValidCharacters:"*", SearchRectangle:("UpperLeftImage","LowerRightImage"))
ValidPattern
Regular expression string (as defined in SenseTalk Patterns for Use with OCR Properties). This property takes a regular expression value and returns only characters or words that match the pattern specified. For information on regular expression characters that can be used with SenseTalk, see Using Patterns in SenseTalk. If you want OCR to prefer a pattern but not require it, see `PreferredPattern`.
`ValidPattern` is mutually exclusive to
Log ReadText(("RT1","RT2"), validPattern:"[0-9][0-9]:[0-9][0-9]") -- Reads the time off of the SUT screen.
put formattedTime("[m]/[d]/[year]") into today -- Formats today's date according to the pattern provided to formattedTime()
Click (Text:Today, SearchRectangle:("TL_Date","BR_Date"), validPattern:"[0-9]/[0-9]/[0-9][0-9][0-9][0-9]") -- Clicks the date where found on the SUT screen, opening up a date and time panel. The date format read would be 1/4/2020 with the pattern passed to validPattern in this example.
ValidWords
A single word or a string or list of words. Default: Empty (the OCR engine uses the specified `Language`). Limiting the words that OCR can consider a match allows you to steer the OCR engine toward a successful match, or force the engine to recognize your text string correctly. You can use the asterisk (*) as a wildcard so that the OCR engine looks only for the words in your original text string. This property limits the words that may be found by the OCR text engine; for more, see Customize the OCR Engine Dictionary. The `validWords` property overrides the `Language` property. This override means that words that are not part of the `validWords` property are not returned.
`ValidWords` is mutually exclusive to
Using ValidWords with a variable for the search text:
Put "Charlie Brown" into mytext
Click (text: mytext, searchRectangle:("TLImage","BRImage"), validwords:"*") -- Setting ValidWords to "*" to have it match the words being searched for with the Text property.
Using ValidWords to confirm a language setting in the application under test:
Log ReadText(("TLImage","BRImage"), validWords:"Japanese, English, Spanish, Portuguese, French")
Regular Expression Patterns for Use with OCR Properties
These regular expression patterns are for use with Optical Character Recognition (OCR).
These SenseTalk Pattern signs are for use with `validPattern` and `preferredPattern`, and should not be confused with the full SenseTalk Pattern Language. For more information on using SenseTalk patterns outside of OCR, see SenseTalk Pattern Language Basics.
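|Item Name||Conventional Regular Expression Sign||Usage Example/ Explanation|
|Character from a character range||[ ]||[b-d]ell matches "bell", "cell", and "dell"|
|Character out of a character range||[^ ]||[^y]ell matches "dell", "cell", and "tell", but not "yell"|
|0 or more occurrences in a row||*||10* matches "1", "10", "100", "1000", and so on|
|1 or more occurrences in a row||+||10+ matches "10", "100", "1000", and so on, but not "1"|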
- Some characters used in regular expressions are used for system purposes. As seen in the table above, these characters include square brackets, periods, etc.
- If you wish to enter an auxiliary character as a normal one, put a backslash (\) before it. Example: `[t-v]x+` denotes words such as "tx", "txx", "txxx", etc., and "ux", "uxx", etc., but `\[t-v\]x+` denotes words such as "[t-v]x", "[t-v]xx", "[t-v]xxx", etc.
- If you need to group certain regular expression elements, use parentheses. For example, `(a|b)+|c` denotes "c" and any combinations such as "abbbaaabbb", "ababab", etc. (a word of any non-zero length in which there can be any number of a's and b's in any order), while `a|b+|c` denotes "a", "c", and "b", "bb", "bbb", etc.
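The escaping and grouping rules above can be checked quickly with any conventional regular expression engine. The Python sketch below uses the standard `re` module as a stand-in. It assumes the OCR pattern syntax behaves like conventional regular expressions for these constructs, which is an assumption for illustration and not a statement about the OCR engine. `fully_matches` is a hypothetical helper.

```python
import re


def fully_matches(pattern, word):
    """True when the whole word matches the pattern (not just a prefix)."""
    return re.fullmatch(pattern, word) is not None


examples = {
    r"[t-v]x+": ["tx", "txx", "uxx"],          # character range, then one or more x
    r"\[t-v\]x+": ["[t-v]x", "[t-v]xxx"],      # escaped brackets are literal
    r"(a|b)+|c": ["c", "ababab", "abbbaaabbb"],  # grouping applies + to (a|b)
    r"a|b+|c": ["a", "c", "bbb"],              # without grouping, + applies to b only
}

for pattern, words in examples.items():
    for word in words:
        print(pattern, repr(word), fully_matches(pattern, word))
```

The same helper can be used to try out a pattern such as `[0-9][0-9]:[0-9][0-9]` against sample strings before passing it to `validPattern` or `preferredPattern`.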