Using Patterns in SenseTalk
The SenseTalk pattern language lets you use SenseTalk's natural-language syntax to search for patterns within a source text. For a general description of the pattern language, see SenseTalk Pattern Language Basics. For complete information about how to define patterns, see Elements of the SenseTalk Pattern Language.
This section describes how to use patterns in SenseTalk scripts.
Operators, Commands, and Functions for Pattern Matching
Patterns are supported by all SenseTalk operators that look for the presence or location of some specified text within a larger string, as well as commands and functions that work with text:
set ssn to <3 digits then dash then 2 digits then dash then 4 digits>
put every offset of ssn in textblock into ssnList // Finds the offset (first character position) of every occurrence of the ssn pattern in textblock and stores those positions in the ssnList list
You can use any of the following operators, commands, and functions with the pattern language:
contains operator
is in operator
begins with operator
ends with operator
offset, range, every offset, every range functions
replace command
delete command
split command, split by function
number of occurrences of function
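As a brief sketch of how a pattern can stand in for literal text with some of these operators and functions, consider the lines below. The phone pattern and the sample text are made up for illustration and are not taken from the SenseTalk reference.

```sensetalk
set textblock to "Call 555-1234 or 555-9876 for support."
set phone to <3 digits, "-", 4 digits> // hypothetical pattern, for illustration only
put textblock contains phone // tests whether the pattern occurs anywhere in textblock
put "555-1234 is the main line" begins with phone // tests whether the text starts with a match
put the number of occurrences of phone in textblock // counts the matches of the pattern
```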
In addition, some operators and functions are intended specifically for use with patterns.
Matches Operator
Use the matches operator to test whether a variable or expression is an exact match for a pattern. One of the operands must be a pattern, and the other is treated as a string value. This operator returns true if the pattern fully matches the entire string. If the pattern matches only part of the string or doesn't match at all, the result of the matches operator is false.
Example:
put 83 matches <digit,digit> --> True
put <"x", 3 chars, "y"> matches "xyzzy" --> True
Note that the matches operator returns true if the pattern can potentially match the full value, even if the usual match of the pattern might not return the full string (because lazy quantifiers are used):
put <"$", digits> matches "$895" --> True
put the occurrence of <"$", digits> in "$895" --> "$8"
For information about lazy and greedy matches as well as quantifiers, see Elements of the SenseTalk Pattern Language.
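Because matches succeeds only when the pattern accounts for the entire string, it lends itself to validating input. The following is a minimal sketch of that idea; the zipPattern and input names and the sample value are hypothetical.

```sensetalk
set zipPattern to <5 digits> // hypothetical pattern: exactly five digits
set input to "80301" // sample value, for illustration only
if input matches zipPattern then
	put "Valid ZIP code"
else
	put "Invalid ZIP code"
end if
```

A value with extra characters, such as "80301-1234", would be expected to take the else branch, since the pattern would match only part of the string.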
Match and Every Match Functions
Use the match and every match functions to locate a pattern within text. The match function finds the first occurrence of the specified pattern, and every match finds every occurrence of the pattern in the source.
When the match function finds a match for a pattern, it returns a match property list containing a number of properties, depending on the pattern. For a basic pattern, the property list includes text, which is the matched text, and text_range, which is the range where that text was found.
The every match function returns a list of property lists, where each property list includes the same information as the match function.
Example:
put the match of <3 digits> in "123456789" --> {text:"123", text_range:"1" to "3"}
put every match of <3 digits> in "123456789" --> [{text:"123", text_range:"1" to "3"},{text:"456", text_range:"4" to "6"},{text:"789", text_range:"7" to "9"}]
For detailed information about these functions, see Pattern Matching Functions.
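The properties of a match can then be read like those of any other property list. The sketch below assumes that dot notation works on the returned match property list; the variable names sourceText, firstMatch, and allMatches are hypothetical.

```sensetalk
set sourceText to "123456789"
set firstMatch to the match of <3 digits> in sourceText
put firstMatch.text // the matched text
put firstMatch.text_range // the range where that text was found

set allMatches to every match of <3 digits> in sourceText
repeat with each item of allMatches
	put it.text // the matched text of each occurrence in turn
end repeat
```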
Occurrence and Every Occurrence Functions
Use the occurrence and every occurrence functions to return the matched text for a defined pattern. The occurrence function returns the first match found, and every occurrence returns a list of every match found in the source text.
put the occurrence of <"(",chars,")"> in "sqrt(42)" --> "(42)"
put every occurrence of <3 digits> in "123456" --> [123,456]
For detailed information about these functions, see Pattern Matching Functions.
Using Variables and Expressions in a Pattern
When you define a pattern, you can use any of the elements described in Pattern Elements. However, sometimes a pattern must include elements that vary from one use to the next.
For example, consider a simple date pattern that matches dates in the form 15-Aug-2018. The following pattern finds such dates:
set datePattern to <1 or 2 digits, "-", 3 letters, "-", 4 digits>
But suppose you don't want to find every date, but only those in a particular month, which will be determined elsewhere in the script. In this case, you can use a variable in place of that element in the pattern. The pattern is constructed using the variable's value at runtime:
set month to "Oct"
set datePattern to <1 or 2 digits, "-", month, "-", 4 digits>
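As a sketch of how such a pattern might then be used, the lines below apply it to some sample text. The dateText variable and its contents are invented for illustration; the expectation is that only the dates in the chosen month are returned.

```sensetalk
set month to "Oct"
set datePattern to <1 or 2 digits, "-", month, "-", 4 digits>
set dateText to "Invoices are due 15-Aug-2018, 3-Oct-2018, and 21-Oct-2019." // hypothetical sample text
put every occurrence of datePattern in dateText // should list only the October dates
```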