Anchors with SenseTalk's Pattern Language

You can use SenseTalk's pattern language to match patterns within text, as explained in SenseTalk Pattern Language Basics. Sometimes you might want a match to depend on a position within the text rather than on specific text or a specific pattern. An anchor lets you tie a pattern or sub-pattern to such a position.

SenseTalk provides several ways of creating anchors within a pattern, each described below:

Defining Basic Anchors

Elements or sub-patterns within a pattern can be anchored at the beginning or end of the text, or at the beginning or end of a line or word within the text. For example, the following pattern matches any whole word that begins with c and ends with t, such as cat or coast, but doesn't match the same sequence inside a longer word (such as the cat in concatenation):

set wordPattern to <word starting with "c", some characters, "t" at the end of the word>

You can use any of the following syntax forms to anchor a sub-pattern at the indicated location.

Word Anchor Syntax:

{a | the} word [starting | beginning] {with} pattern -- Beginning of word
{a | the} word {that} [starts | begins] {with} pattern -- Beginning of word
pattern at {the} [start | beginning] of {a | the} word -- Beginning of word
{a | the} word ending {with} pattern -- End of word
{a | the} word {that} ends {with} pattern -- End of word
pattern at {the} [end | ending] of {a | the} word -- End of word
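
As a minimal sketch of the word anchors above, the following combines two of the syntax forms with the replace command used elsewhere on this page. The variable name sampleText and its contents are illustrative assumptions, not part of the original documentation.

```sensetalk
-- Illustrative sketch: sampleText is a made-up sample string
set sampleText to "cat concatenation"
-- "c" is anchored to the start of a word and "t" to the end of a word,
-- so the "cat" inside "concatenation" should not qualify
replace every <word starting with "c", "a", "t" at the end of the word> in sampleText with "dog"
put sampleText
```

Under these assumptions, only the standalone word cat should be replaced, and concatenation should be left untouched.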

Line Anchor Syntax:

{a | the} line [starting | beginning] {with} pattern -- Beginning of line
{a | the} line {that} [starts | begins] {with} pattern -- Beginning of line
pattern at {the} [start | beginning] of {a | the} line -- Beginning of line
{a | the} line ending {with} pattern -- End of line
{a | the} line {that} ends {with} pattern -- End of line
pattern at {the} [end | ending] of {a | the} line -- End of line
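
The line anchors can be sketched the same way. The example below assumes a hypothetical two-line string named report, built with the return constant, and uses only one of the syntax forms listed above.

```sensetalk
-- Illustrative sketch: report is a made-up two-line string
set report to "error: disk full" & return & "no error: ok"
-- Only an "error" that sits at the very start of a line should qualify,
-- so the "error" in the middle of the second line is expected to stay as-is
replace every <line starting with "error"> in report with "ERROR"
put report
```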

Whole Text Anchor Syntax:

text [starting | beginning] {with} pattern -- Beginning of the text
text {that} [starts | begins] {with} pattern -- Beginning of text
pattern at {the} [start | beginning] of {the} text -- Beginning of text
text ending {with} pattern -- End of the text
text {that} ends {with} pattern -- End of text
pattern at {the} [end | ending] of {the} text -- End of text
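
A whole-text anchor restricts a match to the very beginning or end of the entire string, regardless of line or word boundaries inside it. The following sketch assumes a hypothetical variable named title; the word draft appears twice, but only the first occurrence sits at the beginning of the text.

```sensetalk
-- Illustrative sketch: title is a made-up sample string
set title to "draft: notes on the draft"
-- The anchor limits the match to a "draft" located at the start of the whole text
replace every <"draft" at the beginning of the text> in title with "final"
put title
```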

Standalone Anchors

You can specify an anchor independently as its own subpattern. Anchors specified like this match at the indicated location, but the match doesn't include any characters. That is, that portion of the match is an empty string.

Standalone Anchors Syntax:

{a | the} word break -- Location at beginning or end of a word
{a | the} [beginning | start | starting] of {a | the} word -- Location at beginning of a word
{a | the} word [beginning | start | starting] -- Location at beginning of a word
{a | an | the} end of {a | the} word -- Location at end of a word
{a | the} word end -- Location at end of a word
{a | the} [beginning | start | starting] of {a | the} line -- Beginning of line
{a | an | the} end of {a | the} line -- End of line
{a | the} line start -- Location at beginning of a line
{a | the} line end -- Location at end of a line
{a | the} [beginning | start | starting] of {a | the} text -- Beginning of text
{a | an | the} end of {a | the} text -- End of text
{a | the} text start -- Location at beginning of the text
{a | the} text end -- Location at end of the text

These anchors each represent a pure location within the text that doesn't include any characters. The following example locates all of the transitions between word and nonword characters and inserts a "|" character at those locations:

set sentence to "To all who come to this happy place, welcome."
replace every <word break> in sentence with "|"

The resulting text string would look like this:

|To| |all| |who| |come| |to| |this| |happy| |place|, |welcome|.
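
Because a standalone anchor matches an empty location, replacing it inserts text instead of substituting for existing characters. As a further sketch along the same lines, the following uses the line start anchor to prefix each line of a hypothetical two-line string named message with a quote marker.

```sensetalk
-- Illustrative sketch: message is a made-up two-line string
set message to "first line" & return & "second line"
-- <line start> matches an empty location, so "> " is inserted
-- at the beginning of each line and no characters are removed
replace every <line start> in message with "> "
put message
```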

Locations in Context

Patterns can be anchored to locations by using other pattern elements that are not treated as part of the matched range. This anchor type lets you identify a particular match location, then specify additional pattern elements relative to its location.

Such anchors are typically referred to as lookahead and lookbehind, because they check for additional pattern elements after or before the match location; collectively, they are called lookaround. For example, let's start with a pattern to specify one or more characters enclosed in parentheses:

< "(" then some characters then ")" >

This pattern matches each parenthesized occurrence in the text, including the surrounding parentheses. To obtain only the enclosed contents without the enclosing parentheses themselves, use the preceded by and followed by specifiers:

<preceded by "(" then some characters then followed by ")">

In this pattern, some characters is the only part that is included in the match. The preceded by "(" element is a lookbehind anchor: an opening parenthesis must be present immediately before the characters for the pattern to match, but the parenthesis itself is not included in the match.

Similarly, followed by ")" is a lookahead anchor that ensures the pattern only matches a group of characters that are followed by a closing parenthesis, without including that character in the match.

The same pattern can also be written more conversationally like this:

<some characters preceded by "(" followed by ")">

Lookaround Anchors Syntax:

preceded by locatingPattern -- Location that has locatingPattern immediately before it
not preceded by locatingPattern -- Location that does not have locatingPattern immediately before it
followed by locatingPattern -- Location that has locatingPattern immediately after it
not followed by locatingPattern -- Location that does not have locatingPattern immediately after it
pattern preceded by locatingPattern -- Pattern at a location that has locatingPattern before it
pattern not preceded by locatingPattern -- Pattern at a location that does not have locatingPattern before it
pattern followed by locatingPattern -- Pattern at a location that has locatingPattern after it
pattern not followed by locatingPattern -- Pattern at a location that does not have locatingPattern after it
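
The positive and negative forms can be compared side by side. The sketch below assumes two hypothetical strings, backups and originals, with identical contents; it applies followed by to one and not followed by to the other. In both cases the locating pattern ".bak" is only checked and is not part of the text that gets replaced.

```sensetalk
-- Illustrative sketch: both variables are made-up sample strings
set backups to "report.txt report.bak"
set originals to "report.txt report.bak"

-- Positive lookahead: only a "report" that is immediately followed by ".bak" should qualify
replace every <"report" followed by ".bak"> in backups with "archive"

-- Negative lookahead: only a "report" that is NOT immediately followed by ".bak" should qualify
replace every <"report" not followed by ".bak"> in originals with "summary"

put backups
put originals
```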

Important: The locatingPattern used with preceded by must be of a constrained length. It cannot include patterns that have an indeterminate, unconstrained length, such as preceded by some characters or preceded by one or more digits. A variable but constrained length is allowed, such as preceded by 3 to 5 digits. Patterns used with followed by do not have this limitation.
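
As a minimal sketch of a constrained-length lookbehind, the following uses the 3 to 5 digits form mentioned above. The variable name codes and its contents are illustrative assumptions.

```sensetalk
-- Illustrative sketch: codes is a made-up sample string
set codes to "AB123-X 7-Y"
-- The locating pattern has a bounded length (3 to 5 digits), so it is allowed with preceded by.
-- The hyphen after "123" should qualify; the hyphen after the single digit "7" should not.
replace every <"-" preceded by 3 to 5 digits> in codes with ":"
put codes
```

An unbounded form such as preceded by one or more digits would not be accepted here, per the restriction described above.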

Remember, with lookaround anchors, the locatingPattern doesn't consume characters in the source text, and it is not part of the matched range.