Elements of the SenseTalk Pattern Language
The SenseTalk pattern language lets you define patterns, in an easy-to-read syntax, that you can use to match strings in text. As explained in SenseTalk Pattern Language Basics, its pattern-matching capabilities are built on top of regular expressions (regex).
You can define simple patterns, such as the occurrence of any three digits. You can also define complex patterns that have optional or alternative portions and that can vary in length. Every pattern, however simple or complex, is built from a number of basic pattern elements.
Pattern Language Syntax
Pattern definitions in the SenseTalk pattern language consist of the pattern description enclosed in angle brackets (< ... >).
Syntax:
{pattern} < patternLanguageExpression >
Note: The word pattern is optional and is typically omitted.
A pattern definition (represented in the syntax above by patternLanguageExpression) can be a single element, such as 7 digits, which finds any occurrence of seven digits in a row. However, most patterns include a sequence of elements, or subpatterns. To specify a sequence, list the subpatterns one after another, separated by commas, by the word then, by line breaks, or by some combination of these options.
Therefore, the following examples are all equivalent ways to define a pattern for a Social Security number:
set ssn to <3 digits then "-", 2 digits then "-", 4 digits>
set ssn to <3 digits, "-", 2 digits, "-", 4 digits>
set ssn to <3 digits then "-"
2 digits, "-",
then 4 digits>
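Because the pattern language is built on regular expressions, it can help to compare a pattern with a rough regex counterpart. The Python sketch below is only an illustration: the regex is an assumed equivalent of the ssn pattern, not the expression SenseTalk actually generates, and the function name and sample text are invented for the example.

```python
import re

# Assumed regex counterpart of <3 digits, "-", 2 digits, "-", 4 digits>.
# [0-9] is used rather than \d because this page describes digits as 0 to 9.
SSN_REGEX = re.compile(r"[0-9]{3}-[0-9]{2}-[0-9]{4}")

def find_ssns(text):
    """Return every substring shaped like 3 digits, "-", 2 digits, "-", 4 digits."""
    return SSN_REGEX.findall(text)

# Hypothetical sample text.
print(find_ssns("Sample IDs: 123-45-6789 and 987-65-4321."))
# prints ['123-45-6789', '987-65-4321']
```

The comparison shows why the pattern language reads more easily: each regex token corresponds to a phrase such as 3 digits.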
You can use the word or to specify alternative choices in a subpattern. For example,
<"cat" or "cow">
matches either cat or cow.
You can use parentheses to group elements when necessary. For example,
<"cat" or "cow" then 2 digits>
matches text like cat24 or cow17. But
<"cat" or ("cow" then 2 digits)>
matches either cat (with no digits needed) or something like cow97.
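The grouping behavior described above can be sketched with Python regexes. The two regexes are assumed counterparts chosen to reproduce the matches listed in the text, and the helper name is hypothetical. Note that regex alternation (|) binds more loosely than sequencing, so the ungrouped SenseTalk pattern needs an explicit group in regex form, while the parenthesized SenseTalk pattern does not.

```python
import re

# Assumed counterpart of <"cat" or "cow" then 2 digits>:
# the digits follow whichever word matched.
EITHER_THEN_DIGITS = re.compile(r"(?:cat|cow)[0-9]{2}")

# Assumed counterpart of <"cat" or ("cow" then 2 digits)>:
# the digits belong only to "cow".
CAT_OR_COW_DIGITS = re.compile(r"cat|cow[0-9]{2}")

def fully_matches(regex, text):
    """True if the entire text fits the regex."""
    return regex.fullmatch(text) is not None

print(fully_matches(EITHER_THEN_DIGITS, "cat24"))  # True
print(fully_matches(EITHER_THEN_DIGITS, "cat"))    # False
print(fully_matches(CAT_OR_COW_DIGITS, "cat"))     # True
print(fully_matches(CAT_OR_COW_DIGITS, "cow97"))   # True
```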
Pattern Elements
A pattern definition is made up of individual elements, or subpatterns. An element can be the characters included in a quoted string, such as "cat". It can also be a SenseTalk variable, an expression in parentheses, or one of several pattern elements described in the Pattern Definition Elements table below.
Note the following about how elements can be defined:
- Most pattern elements can be singular or plural.
- Singular elements match exactly one of the specified value.
- Plural elements match a sequence of one or more of the indicated value.
- Quantifiers can be used to explicitly control the number of characters or values you want an element to match.
Pattern Definition Elements
| Elements | Definition |
|---|---|
| "quoted string" | Matches an exact string of characters |
| variable | Use any valid element or combination of elements stored in a variable |
| expression | Use any SenseTalk expression, in parentheses, that yields a string |
| character, characters | Matches any characters |
| letter, letters | Matches letters of any alphabet |
| nonletter, nonletters | Matches characters other than letters of an alphabet |
| lowercase letter, lowercase letters | Matches lowercase letters in any language |
| nonlowercase letter, nonlowercase letters | Matches letters other than lowercase letters in any language |
| uppercase letter, capital letter, uppercase letters, capital letters | Matches uppercase letters in any language |
| nonuppercase letter, noncapital letter, nonuppercase letters, noncapital letters | Matches letters other than uppercase letters in any language |
| digit, digits | Matches digits from 0 to 9 |
| nondigit, nondigits | Matches characters other than digits |
| letterOrDigit, alphanumeric, lettersOrDigits, alphanumerics | Matches either letters or digits |
| nonLetterOrDigit, nonAlphanumeric, nonLettersOrDigits, nonAlphanumerics | Matches characters other than letters or digits |
| whitespace character, whitespace characters | Matches white space characters (space, tab, line separator, etc.) |
| nonwhitespace character, nonwhitespace characters | Matches characters other than white space characters |
| word character, word characters | Matches word characters (letters or digits) |
| nonword character, nonword characters | Matches characters other than word characters |
| punctuation character, punctuation characters | Matches punctuation characters |
| nonpunctuation character, nonpunctuation characters | Matches nonpunctuation characters |
| character [of \| in \| from] characterSet, characters [of \| in \| from] characterSet | Matches characters that are in the characterSet (string, range, character class identifier, or list of these items) |
| character not [of \| in \| from] characterSet, characters not [of \| in \| from] characterSet | Matches characters that are not in the characterSet |
You can abbreviate character as char, and characters as chars, in all cases.
Quantifiers and Elements
When you specify a singular pattern element, such as letter or digit, the pattern matches exactly one element of that type. A plural form, such as letters or digits, matches one or more of that element.
In addition, there are a number of quantifiers that you can use to specify how many times an element should appear to create a pattern match.
The following descriptions use the term character for simplicity, but any element term can be used.
Terms that Mean Exactly One Character
character
a character
one character
exactly one character
Example:
set myPattern to < "(" then a character then ")" >
Matches sequences like:
(w)
( )
(7)
(.)
())
Doesn't match:
()
(42)
(salamander)
otherStuff
Terms that Mean an Exact Number of Characters
2 characters
exactly 2 characters
The number 2 is shown in these examples, but any positive integer can be used. A variable whose value is an integer can also be used, but the word exactly must be used in this case.
Example:
set myPattern to < "(" then 2 characters then ")" >
Matches sequences like:
(42)
(CO)
(())
Doesn't match:
()
(w)
( )
(7)
(.)
())
(salamander)
otherStuff
Terms that Mean Zero or One Character
maybe character
maybe a character
maybe one character
zero or one character
zero or maybe one character
Example:
set myPattern to < "(" then maybe a character then ")" >
Matches sequences like:
()
(w)
( )
(7)
(.)
Doesn't match:
(42)
(salamander)
otherStuff
This pattern can also match ()), but because the quantifier is lazy, it prefers to match just () unless the surrounding context requires it to match all three characters.
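The lazy preference described above can be illustrated with Python regexes. This is a sketch under the assumption that maybe a character behaves like the lazy regex quantifier ??; the greedy ? form is included only for contrast and is not something the text attributes to SenseTalk.

```python
import re

# Assumed counterpart of < "(" then maybe a character then ")" > (lazy).
LAZY_OPTIONAL = re.compile(r"\(.??\)")
# Greedy variant, shown only for comparison.
GREEDY_OPTIONAL = re.compile(r"\(.?\)")

text = "())"

# A lazy quantifier tries zero characters first, so a search stops at "()".
print(LAZY_OPTIONAL.search(text).group())    # ()
# A greedy quantifier takes the optional character when it can.
print(GREEDY_OPTIONAL.search(text).group())  # ())
# When the context forces the whole string to match, the lazy form still can.
print(LAZY_OPTIONAL.fullmatch(text) is not None)  # True
```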
Terms that Mean One or More Characters
characters
some characters
one or more characters
Example:
set myPattern to < "(" then characters then ")" >
Matches sequences like:
(w)
( )
(7)
(.)
())
(42)
(salamander)
Doesn't match:
()
otherStuff
In the sentence "Amy (a woman) and Flossie (her cat) lie down to take a nap.", this pattern will match two strings:
(a woman)
and
(her cat)
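The two-match result for the sample sentence is what a lazy one-or-more quantifier produces. The Python sketch below assumes that characters corresponds to the lazy regex quantifier +?; the greedy + form is shown only to make the difference visible.

```python
import re

SENTENCE = "Amy (a woman) and Flossie (her cat) lie down to take a nap."

# Assumed counterpart of < "(" then characters then ")" > (lazy).
lazy_matches = re.findall(r"\(.+?\)", SENTENCE)
# Greedy variant for contrast: it runs to the last ")" in the sentence.
greedy_matches = re.findall(r"\(.+\)", SENTENCE)

print(lazy_matches)    # ['(a woman)', '(her cat)']
print(greedy_matches)  # ['(a woman) and Flossie (her cat)']
```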
Terms that Mean Zero or More Characters
maybe characters
maybe some characters
zero or more characters
Example:
set myPattern to < "(" then zero or more characters then ")" >
Matches sequences like:
()
(w)
( )
(7)
(.)
(42)
(salamander)
Doesn't match:
otherStuff
This pattern can also match ()), but because the quantifier is lazy, it prefers to match just () unless the surrounding context requires it to match all three characters.
In the sentence "Amy (a woman) and Flossie (her cat) lie down to take a nap.", this pattern will match two strings:
(a woman)
and
(her cat)
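As a recap, the five quantifier families can be lined up against rough regex counterparts and checked with a few of the sample strings used in this section. This is a sketch only: the regexes are assumptions chosen to reproduce the documented matches, the dictionary and helper names are hypothetical, and the regex "." (which does not match a newline by default in Python) stands in for character.

```python
import re

# Assumed regex counterparts for < "(" then QUANTIFIER then ")" >.
QUANTIFIER_REGEX = {
    "a character": r"\(.\)",
    "2 characters": r"\(.{2}\)",
    "maybe a character": r"\(.??\)",
    "characters": r"\(.+?\)",
    "zero or more characters": r"\(.*?\)",
}

def matches_whole(quantifier, text):
    """True if the whole text fits the assumed regex for this quantifier."""
    return re.fullmatch(QUANTIFIER_REGEX[quantifier], text) is not None

SAMPLES = ["()", "(w)", "(42)", "(salamander)", "otherStuff"]

# Show which sample strings each quantifier family accepts.
for name in QUANTIFIER_REGEX:
    accepted = [s for s in SAMPLES if matches_whole(name, s)]
    print(name, "->", accepted)
```

Only the zero-or-one and zero-or-more forms accept the empty parentheses, and none of the forms accepts otherStuff, which agrees with the lists above.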