Searching for or reading text on your system under test (SUT) using optical character recognition (OCR) requires use of a text property list. These property lists have a wide variety of properties you can include, depending on the text you are working with and the purpose of your search.
See Working with OCR to learn how to work with the most commonly used text properties in more detail.
This page covers the available OCR properties as they can be addressed in-line with SenseTalk scripting. You can also adjust your OCR properties from several locations in Eggplant Functional.
A text property list is a description of text on the SUT. Every text property list must contain the Text property, which defines the actual text string you are looking for, and can include any number of the additional properties described below. Any property that is not included in the property list defaults to your Text Preferences settings.
The following text properties can be used with any text property list:
- Text: Text string. The text string that you want to find on the SUT. (Required.)
- TextStyle: Text-style name. A group of predefined text properties. (For more information, see .)
Use the table below as a reference when working with OCR. It is a complete list of all supported SenseTalk properties for use with OCR. Some properties are available for reading and searching, while some are specific to one or the other. The type of value passed to the property is also included. Click the name of the property for its full description (a full description list is included on this page below the table). For more in-depth information on how to use the most common OCR properties, see Succeeding with OCR.
|Property||Searching||Reading||Value|
|ContrastColor||Yes||Yes||A color (see Color Values in SenseTalk)|
|ContrastTolerance||Yes||Yes||0-100 / Default 45|
|ExtraWords||Yes||Yes||A string or list of words|
|Language||Yes||Yes||A language name as specified in OCR Language Support (case sensitive).|
|PreferredPattern||Yes||Yes||Regular expression string (see Using Patterns in SenseTalk)|
|PreferredWords||Yes||Yes||A string or list of words|
|ProhibitedWords||Yes||Yes||A string or list of words|
|SearchRectangle||Yes||No*||A pair of coordinates or captured images defining a rectangle|
|TextRotation||Yes||Yes||One of four predefined values|
|TrimColor||No||Yes||A color (see Color Values in SenseTalk)|
|ValidPattern||Yes||Yes||Regular expression string (see Using Patterns in SenseTalk)|
|ValidWords||Yes||Yes||A string or list of words|
* You do not need to set a SearchRectangle property with ReadText() because ReadText() already takes a rectangle by default. This property can also be set with standard image searches; for more on this, see Image References.
OCR Property Definitions
- AggressiveTextExtraction: Boolean. Enable this property if you want OCR to extract as much text from the image as possible.
Log ReadText(("TLImage","BRImage"), AggressiveTextExtraction:On)
- CaseSensitive: Boolean. Default: off. Whether or not Eggplant Functional considers case in text searches. Enable this property to force text searches to respect case and only find text that matches your text string’s capitalization exactly. This property is for searching for text, not reading text.
Put "COUPON13995a" into Coupon
MoveTo (Text: Coupon, CaseSensitive:Yes)
- Contrast: Boolean. Whether or not the SUT display is converted to a high contrast two-color image before it is sent to OCR for analysis. If contrast is on, a color referred to as the "contrast color" (which can be set using the ContrastColor property) is considered the primary color of the SUT display, and all other colors are treated as the secondary color. Text can be found in either color. The Contrast property is available for use with both searching for (finding) text and reading text.
log ReadText(contrast:on, contrastColor:"ffffff", validCharacters: "abcdABCD12345", searchRectangle:("TLImage","BRImage"))
Note: If Contrast is on, but the ContrastColor is not defined, the top left pixel of the area being searched is treated as the contrast color.
- ContrastColor: A color. Default: The top left corner of the search rectangle is used as the contrast color if Contrast is on and no other color is defined. If Contrast is on, the contrast color is considered the primary color of the SUT display, and all other colors are treated as the secondary color.
For instructions on finding the background color, see Determining the Background Color. A number of color value formats are recognized by SenseTalk. For the full list of formats, see Color Values in SenseTalk.
log ReadText(contrast:on, contrastColor:"ffffff", validCharacters: "abcdABCD12345")
- ContrastTolerance: Integer. Default: 45. When Contrast is on, contrastTolerance sets the maximum per-channel color difference that is allowed for a pixel to be seen as the contrast color.
Click (Text:"Andrew Young", Contrast:On, ContrastTolerance: 65)
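To make "maximum per-channel color difference" concrete, here is a minimal Python sketch of the idea. It is not Eggplant Functional's implementation: the function name `is_contrast_color` is hypothetical, and comparing raw 0-255 channel values directly against the tolerance is an assumption, since the mapping of the tolerance value onto channel values is not stated here.

```python
def is_contrast_color(pixel, contrast_color, tolerance=45):
    """Return True if every RGB channel of pixel is within tolerance of contrast_color.

    Hypothetical helper for illustration only; channels are assumed to be 0-255.
    """
    return all(abs(p - c) <= tolerance for p, c in zip(pixel, contrast_color))


white = (255, 255, 255)
print(is_contrast_color((230, 240, 250), white))      # True: largest channel gap is 25
print(is_contrast_color((200, 255, 255), white))      # False: red channel differs by 55
print(is_contrast_color((200, 255, 255), white, 65))  # True: a higher tolerance accepts it
```

Under this model, raising ContrastTolerance treats more off-color pixels (for example, anti-aliased edges) as the contrast color.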
- DPI: Integer. Default: 72. The DPI property refers to the DPI (dots per inch) of the SUT display. If you are having problems finding text on the SUT, check the SUT's DPI setting, and adjust the DPI property accordingly. Typical DPI settings include: 72, 144, 300, and 2540.
Click (text: "Continue", DPI: 2540, Language: English)
- EnhanceLocalContrast: Boolean. Default: off. Enable this property if you want OCR to automatically increase the local contrast of the text image being sent to the OCR engine. This property may aid recognition when some or all of the text being read has relatively low contrast, such as blue text on a dark background. When Contrast is turned on, this property has no effect, so it is only useful when Contrast is turned off.
Log ReadText(("TLImage","BRImage"), enhanceLocalContrast: On)
- ExtraWords: A word or list of words. Set this property to a list of words to supplement the built-in dictionary for the current language. These words will be given preference the same way as other dictionary words. ExtraWords is mutually exclusive with PreferredWords, ValidPattern, PreferredPattern, and ValidWords.
Log Readtext(("TLImage","BRImage"), Language: English, ExtraWords: "Elizabeth, Andrew, Steven, Katherine, Jacob, Brenda")
- IgnoreNewlines: Boolean. When enabled, ignoreNewlines causes OCR text searches to ignore line breaks, so a search will match a string even if it's broken over several lines. This property is only available for text searches (not available with ReadText()).
Click (Text:"Constantine Papadopoulos", IgnoreNewlines:On) -- A long name like this could wrap to a second line in the interface of an application under test, but OCR can still find it with IgnoreNewlines enabled.
- IgnoreSpaces: Boolean. The ignoreSpaces property causes OCR text searches to disregard spaces in your text string. For example, the string "My Computer" would match "MyComputer" or "M y C o m p u t e r". The ignoreSpaces property is on by default. This is because the OCR sometimes reads spaces that are not intended, especially in strings that are not discrete words, and in text with unusual letter-spacing.
A tab called "My Account" is part of the UI of the software you're testing, but can appear with an underscore ("My_Account") or without the space ("MyAccount") in different contexts or on different devices.
Click (Text:"My Account", ignoreSpaces:On, IgnoreUnderscores:On) -- is able to find the Account tab whether it has an underscore, space, or no space.
- IgnoreUnderscores: Boolean. The ignoreUnderscores property causes OCR text searches to treat underscores as spaces during searches. For example, the string "My_Computer" would match "My_Computer" or "My Computer". The ignoreUnderscores property is on by default, because the OCR sometimes fails to recognize underscores.
Click (Text:"Account Overview", IgnoreUnderscores:On) -- Will click "Account Overview" in a case where the OCR is mistaking an underlined link as text with an underscore in the space ("Account_Overview").
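The combined effect of IgnoreSpaces and IgnoreUnderscores can be pictured as normalizing both the search string and the recognized text before comparing them. The Python sketch below illustrates that idea only; the `normalize` and `matches` helpers are hypothetical and are not how Eggplant Functional is implemented.

```python
def normalize(text, ignore_spaces=True, ignore_underscores=True):
    """Apply the two 'ignore' rules to a string (illustrative model only)."""
    if ignore_underscores:
        text = text.replace("_", " ")  # treat underscores as spaces
    if ignore_spaces:
        text = text.replace(" ", "")   # then disregard spaces entirely
    return text


def matches(search, found, **options):
    """Compare two strings after normalizing both the same way."""
    return normalize(search, **options) == normalize(found, **options)


for candidate in ["My Account", "My_Account", "MyAccount", "M y A c c o u n t"]:
    print(candidate, matches("My Account", candidate))  # each candidate prints True

# With ignoreSpaces off, only the underscore variant still matches.
print(matches("My Account", "My_Account", ignore_spaces=False))  # True
print(matches("My Account", "MyAccount", ignore_spaces=False))   # False
```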
- InvertImage: Boolean. Default: Off. Enable this property for OCR to invert the colors of the text image (like a photo negative) before sending it to the OCR engine for processing.
- Language: Language name (case sensitive). The natural language of the text you are searching for. (For a list of supported languages, see OCR Language Support.) OCR uses this as a guide, giving preference to words specified in the dictionary it is using. More than one language can be specified. Eggplant Functional comes with numerous languages by default, and additional languages are available for purchase. If no language is specified, OCR will still read text; it just won't have a dictionary to compare its findings to. You can also create a Custom OCR Dictionary.
Note: The language names are case-sensitive as defined by the OCR dictionary.
Click (Text:"Bezárás", Language:"Hungarian", SearchRectangle:("TLImage","BRImage")) -- Clicks the "close" ("Bezárás") button in an application using Hungarian.
- LowResolutionMode: Boolean. Default: Off. A mode of processing used by the OCR engine to treat the image it receives from Eggplant Functional as low resolution (the image is not actually converted to a lower resolution). This might help OCR recognize smaller characters.
- MultiLine: Boolean. Default value: Off. This property only applies when reading text near a point, as opposed to reading text within a rectangle. When MultiLine is on, the ReadText() function returns the line of text associated with your point, and any lines of text above and below that point if they appear to belong to the same block of text. When MultiLine is off, the ReadText() function only returns the line of text associated with the point.
Log ReadText("ShortTextBlockHeaderImage", MultiLine:On, Contrast:On, ContrastColor:BkgdColor)
- PreferDictionaryWords: Boolean. While OCR always prefers words in any dictionary it is provided by the Language property, PreferDictionaryWords takes this a step further and requires OCR to return a dictionary word if possible. It will only return a non-dictionary word—using its best interpretation of each character—if no possible variants are found. This property modifies the OCR dictionary. For more information see Customize the OCR Dictionary. Available for both reading and searching for text.
Log ReadText(("TLImage","BRImage"), Language:English, PreferDictionaryWords:On) --"Cattywampus" or some other unlikely word is the text shown within the given searchRectangle, and PreferDictionaryWords forces the correct word to be returned.
- PreferredPattern: Regular expression string (as defined in Using Patterns in SenseTalk). When this property is enabled and given a regular expression string, OCR gives preference to text that matches the provided pattern. For information on regular expression characters that can be used with SenseTalk, see Using Patterns in SenseTalk. If you want the OCR to require a pattern match, use ValidPattern. PreferredPattern is mutually exclusive with PreferredWords, ValidPattern, ValidWords, and ExtraWords.
- PreferredWords: A word or list of words. Set this property to a list of words to supplement the built-in dictionary for the current language. PreferredWords can be used for either reading or searching for text. This property modifies the OCR dictionary. For more information, see Customize the OCR Dictionary. PreferredWords is mutually exclusive with ValidWords, ValidPattern, PreferredPattern, and ExtraWords.
Log ReadText(("TLImage","BRImage"), Language: PortugueseBrazilian, PreferredWords:CustomerNameList) -- A list of customer names is passed in as the value for the PreferredWords property, adding all of those names to the OCR dictionary.
Note: In the example above, "PortugueseBrazilian" is not quoted. This unpopulated variable will resolve to its name, so the value passed to the Language property is "PortugueseBrazilian", the same as if this text were in quotes.
- ProhibitedWords: A word or list of words. Provide words OCR can recognize that are not what you are looking for to help steer it in the right direction. ProhibitedWords can be used for both reading and searching for text. This property modifies the OCR dictionary. For more information see Customize the OCR Dictionary.
Click(Text:"Annita",ProhibitedWords:"Amita") -- Using the ProhibitedWords property to eliminate possible misspellings that the OCR could mistake as being correct.
- SearchRectangle: Rectangle defined by a coordinate pair (top left corner, bottom right corner). With (0,0) being the top-left corner of the screen, the SearchRectangle property takes a pair of coordinates that define a rectangular area of the SUT screen. Eggplant Functional only searches for the text within this defined rectangle. The SearchRectangle property is for use with searching for text, not reading text. Setting a SearchRectangle with the ReadText() function does not require a special property, as ReadText() takes a rectangle by default. The SearchRectangle property can also be set when searching for Images; for more on this see Image References.
Searching for a text string using the SearchRectangle property:
Click (Text:"CharlieBrown", SearchRectangle:("TLImage","BRImage"),contrast:On)
Reading text by passing a rectangle directly to the ReadText() function:
Log ReadText(("TLImage","BRImage"))
- TextDifference: Integer. Default: 0. This property causes text searches to find text that differs from your search by a given number of characters. Only available with OCR searches.
moveTo text:"armadillo", searchRectangle:(305,241,372,274),TextDifference:2 -- Would find text written as "armadolli" or any other variation that differs from "armadillo" by one or two characters.
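One way to reason about "differs by a given number of characters" is edit distance. The Python sketch below assumes the difference is counted as Levenshtein distance (single-character insertions, deletions, and substitutions); that metric is an assumption for illustration, not a statement of how Eggplant Functional measures it, and the helper names are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (assumed metric, for illustration)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete a character from a
                               current[j - 1] + 1,       # insert a character into a
                               previous[j - 1] + cost))  # substitute (or keep) a character
        previous = current
    return previous[-1]


def within_difference(search, found, text_difference=0):
    """True if found is within text_difference edits of search."""
    return edit_distance(search, found) <= text_difference


print(edit_distance("armadillo", "armadolli"))         # 2
print(within_difference("armadillo", "armadolli", 2))  # True
print(within_difference("armadillo", "armadolli", 1))  # False
```

Under this model, a larger TextDifference makes a search more forgiving of OCR misreads but also more likely to match unrelated text, so small values are the safer starting point.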
- TextRotation: One of four predefined values: Clockwise, Counter-clockwise, Upside-down, or None. When this property is set, OCR identifies words at the degree of rotation specified by one of the predefined values: Clockwise rotates 90 degrees to the right; Counter-clockwise rotates 90 degrees to the left; Upside-down rotates 180 degrees; None does not rotate the text. Can be used for both reading and searching for text.
Click (Text:"Charlie Brown",TextRotation: "Upside-down")
- Trim: Boolean. Default value: Off. When Trim is on, the OCR engine reduces the size of the rectangle provided to the ReadText() function until a non-background pixel is encountered (usually the edge of the text that you want it to read). The background color is taken from the top left pixel of the rectangle, or from the TrimColor property.
- TrimBorder: Integer. Default value: 0. When Trim is on, TrimBorder is the pixel-width of background that is not trimmed from the ReadText() function rectangle. TrimBorder can be set to a negative number to trim non-background edges from the rectangle.
- TrimColor: A color. When Trim is on, TrimColor is the color that is considered the background of the ReadText() function rectangle. If you do not set the TrimColor property, the background color is taken from the top left pixel of the rectangle. SenseTalk recognizes a number of color value formats. For the full list of formats, see Color Values in SenseTalk.
- TrimTolerance: Integer. Default value: 0. When Trim is on, TrimTolerance is a measure of how much a pixel can differ from the RGB value of the TrimColor and still be considered background.
- TrimWhitespace: Boolean. Default: On. When TrimWhitespace is on, all whitespace characters are removed from the beginning and end of returned text. When TrimWhitespace is off, the ReadText() function can return text that starts or ends with whitespace characters. Only for use with reading text, not searching for pre-defined strings.
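The way Trim, TrimColor, TrimTolerance, and TrimBorder interact can be sketched on a tiny in-memory image. The Python below is an illustrative model only: the function names are hypothetical, the image is a list of rows of RGB tuples, and treating TrimTolerance as a per-channel limit is an assumption.

```python
def is_background(pixel, trim_color, trim_tolerance=0):
    """A pixel is background if every RGB channel is within trim_tolerance of trim_color."""
    return all(abs(p - c) <= trim_tolerance for p, c in zip(pixel, trim_color))


def trim_rectangle(image, trim_color=None, trim_tolerance=0, trim_border=0):
    """Return the trimmed rectangle as inclusive (left, top, right, bottom) indexes,
    or None if every pixel is background."""
    height, width = len(image), len(image[0])
    if trim_color is None:
        trim_color = image[0][0]  # default background: the top left pixel

    def has_text(pixels):
        return any(not is_background(p, trim_color, trim_tolerance) for p in pixels)

    rows = [y for y in range(height) if has_text(image[y])]
    cols = [x for x in range(width) if has_text(image[y][x] for y in range(height))]
    if not rows:
        return None
    # Keep trim_border pixels of background on each side, clamped to the original
    # rectangle. A negative trim_border moves the edges inward instead.
    left = max(0, cols[0] - trim_border)
    top = max(0, rows[0] - trim_border)
    right = min(width - 1, cols[-1] + trim_border)
    bottom = min(height - 1, rows[-1] + trim_border)
    return (left, top, right, bottom)


W, B = (255, 255, 255), (0, 0, 0)
image = [
    [W, W, W, W, W, W],
    [W, W, W, W, W, W],
    [W, W, B, B, W, W],
    [W, W, B, B, W, W],
    [W, W, W, W, W, W],
]
print(trim_rectangle(image))                 # (2, 2, 3, 3)
print(trim_rectangle(image, trim_border=1))  # (1, 1, 4, 4)
```

In this model, a nonzero TrimTolerance keeps slightly off-color background pixels (such as compression noise) from stopping the trim early.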
- ValidCharacters: Text string. The validCharacters property limits the characters that may be found by the OCR text engine. ValidCharacters can be limited to the characters in the string you are searching for by setting the string to "*". This can be useful if you are trying to "force" a text match from characters that are not being recognized. If OCR determines that characters are present in the defined area but they do not match characters provided in the validCharacters string, it will return "^".
Setting the validCharacters manually:
Log ReadText(searchRectangle:("TLImage","BRImage"), ValidCharacters:"$£€.,0123456789") -- reads a numeric value including currency symbols
Setting the ValidCharacters to the text being searched for, using an asterisk:
Click (Text:"CoDe13v9065", ValidCharacters:"*", SearchRectangle:("UpperLeftImage","LowerRightImage"))
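As a simplified mental model of ValidCharacters, the Python sketch below post-filters an already recognized string, replacing each disallowed character with "^". A real OCR engine restricts recognition itself rather than filtering afterward, so the per-character substitution, the pass-through of whitespace, and the function name `apply_valid_characters` are all assumptions made for illustration.

```python
def apply_valid_characters(recognized, valid_characters, search_text=""):
    """Replace characters outside valid_characters with '^' (illustrative model only)."""
    if valid_characters == "*":
        valid_characters = search_text  # "*" limits the set to the search string's characters
    allowed = set(valid_characters)
    # Whitespace is passed through unchanged; this is an assumption of the sketch.
    return "".join(ch if ch in allowed or ch.isspace() else "^" for ch in recognized)


print(apply_valid_characters("Total: £12.50", "$£€.,0123456789"))    # ^^^^^^ £12.50
print(apply_valid_characters("CoDe13v9O65", "*", "CoDe13v9065"))     # CoDe13v9^65
```

The second call shows why "*" can help force a match: an uppercase "O" misread in place of a zero is not among the characters of the search string, so it is rejected.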
- ValidPattern: Regular expression string (as defined in Using Patterns in SenseTalk). This property takes a regular expression value and returns only characters or words that match the pattern specified. For information on regular expression characters that can be used with SenseTalk, see Using Patterns in SenseTalk. If you want OCR to prefer a pattern but not require it, see PreferredPattern. ValidPattern is mutually exclusive with PreferredWords, ValidWords, PreferredPattern, and ExtraWords.
Log ReadText(("RT1","RT2"), validPattern:"[0-9][0-9]:[0-9][0-9]") -- Reads the time off of the SUT screen.
put formattedTime("[m]/[d]/[year]") into today -- Formats today's date according to the pattern provided to formattedTime()
Click (Text:Today, SearchRectangle:("TL_Date","BR_Date"), validPattern:"[0-9]/[0-9]/[0-9][0-9][0-9][0-9]") -- Clicks the date where found on the SUT screen, opening up a date and time panel. The date format read would be 1/4/2020 with the pattern passed to validPattern in this example.
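Before passing a pattern to ValidPattern, it can help to check which strings the pattern accepts. The Python sketch below uses the standard `re` module as a stand-in; the OCR engine's regular expression dialect may differ from Python's, and the `is_valid` helper is hypothetical.

```python
import re

TIME_PATTERN = "[0-9][0-9]:[0-9][0-9]"
DATE_PATTERN = "[0-9]/[0-9]/[0-9][0-9][0-9][0-9]"
# Variant using "+" (one or more) so multi-digit months and days are accepted too.
FLEXIBLE_DATE_PATTERN = "[0-9]+/[0-9]+/[0-9][0-9][0-9][0-9]"


def is_valid(text, pattern):
    """True if the whole string matches the pattern."""
    return re.fullmatch(pattern, text) is not None


print(is_valid("09:45", TIME_PATTERN))                 # True
print(is_valid("1/4/2020", DATE_PATTERN))              # True
print(is_valid("12/25/2020", DATE_PATTERN))            # False: month and day have two digits
print(is_valid("12/25/2020", FLEXIBLE_DATE_PATTERN))   # True
```

Note that the single-digit date pattern from the example only accepts dates such as 1/4/2020; a date with a two-digit month or day needs a pattern along the lines of the flexible variant.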
- ValidWords: A single word or a string or list of words. Default: Empty (the OCR engine uses the specified Language). Limiting the words that OCR can consider a match allows you to steer the OCR engine toward a successful match, or force the engine to recognize your text string correctly. You can use the asterisk (*) as a wildcard so that the OCR engine looks only for the words in your original text string. This property limits the words that may be found by the OCR text engine; for more see Customize the OCR Engine Dictionary. The validWords property overrides the Language property. This override means that words that are not part of the validWords property are not returned. ValidWords is mutually exclusive with PreferredWords, ValidPattern, PreferredPattern, and ExtraWords.
Using ValidWords with a variable for the search text:
Put "Charlie Brown" into mytext
Click (text: mytext, searchRectangle:("TLImage","BRImage"), validwords:"*") -- Setting validWords to "*" to have it match the words being searched for with the Text property.
Using ValidWords to confirm a language setting in the application under test:
Log ReadText(("TLImage","BRImage"), validWords:"Japanese, English, Spanish, Portuguese, French")
The following table outlines the regular expression characters available for use with Optical Character Recognition (OCR):
|Item Name||Conventional Regular Expression Sign||Usage Example/ Explanation|
|Any Character||.||c.t - denotes words such as “cat” and “cot”|
|Character from a character range||[]||[b-d]ell - denotes words such as “bell”, “cell”, “dell”
[ty]ell - denotes words “tell” and “yell”
[A-Z] - denotes any uppercase alpha character
[a-z] - denotes any lowercase alpha character
[А-Я] - denotes any uppercase Cyrillic character
[а-я] - denotes any lowercase Cyrillic character
[0-9] - denotes any numeric character
[0-9a-zA-Z] - denotes any single alphanumeric character|
|Character out of a character range||[^]||[^y]ell - denotes words such as “dell”, “cell”, or “tell”, but not “yell”|
|Or|||||c(a|u)t - denotes words “cat” and “cut”|
|0 or more occurrences in a row||*||10* - denotes numbers 1, 10, 100, 1000, etc.|
|1 or more occurrences in a row||+||10+ - allows numbers 10, 100, 1000, etc. but not 1
[0-9a-zA-Z]+ - allows any word|
|Space||[\s]||[0-9][\s][0-9] - denotes two digits separated by any space character|
- Some characters used in regular expressions are used for system purposes. As seen in the table above, these characters include square brackets, periods, etc.
- If you wish to enter an auxiliary character as a normal one, put a backslash (\) before it. Example: [t-v]x+ denotes words such as "tx", "txx", "txxx", etc., and "ux", "uxx", etc., but \[t-v\]x+ denotes words such as "[t-v]x", "[t-v]xx", "[t-v]xxx", etc.
- If you need to group certain regular expression elements, use parentheses. For example, (a|b)+|c denotes "c" and any combinations such as "abbbaaabbb", "ababab", etc. (a word of any non-zero length in which there can be any number of a's and b's in any order), while a|b+|c denotes "a", "c", and "b", "bb", "bbb", etc.
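The table entries and notes above can be tried out with any regular expression engine that supports the same basic operators. The Python sketch below checks several of them with the standard `re` module; this is only a stand-in for experimenting, since the OCR engine's dialect may differ in details, and the `denotes` helper is hypothetical.

```python
import re


def denotes(pattern, word):
    """True if the whole word matches the pattern."""
    return re.fullmatch(pattern, word) is not None


examples = [
    ("c.t", "cot"),
    ("[b-d]ell", "cell"),
    ("[b-d]ell", "tell"),
    ("[^y]ell", "yell"),
    ("c(a|u)t", "cut"),
    ("10*", "1"),
    ("10+", "1"),
    (r"[0-9][\s][0-9]", "1 2"),
    (r"[t-v]x+", "uxx"),
    (r"\[t-v\]x+", "[t-v]xx"),   # escaped brackets are matched literally
    ("(a|b)+|c", "ababab"),
    ("a|b+|c", "ab"),            # without grouping, "ab" is not accepted
]
for pattern, word in examples:
    print(f"{pattern!r:22} {word!r:12} {denotes(pattern, word)}")
```

Running this prints True for every pair except ("[b-d]ell", "tell"), ("[^y]ell", "yell"), ("10+", "1"), and ("a|b+|c", "ab"), which agrees with the descriptions in the table and notes.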