Chunk Types
Chunk expressions let you work with all of these chunk types:
Type | Definition |
---|---|
characters | individual characters within text |
words | words separated by any amount of white space (spaces, tabs, returns) within text |
lines | paragraphs separated by any of several standard line endings (CR, LF, CRLF, etc.) |
text items | portions of text separated by commas |
list items | the individual items in a list |
bytes | the bytes within binary data |
occurrences | the text matches of a defined pattern |
matches | the text matches and text range of a defined pattern and its capture groups |
In addition, you can specify custom delimiters for identifying text items, lines, and words. Each of these three text chunk types uses a distinct kind of delimiter: text items are delimited by a single text string, lines are delimited by any of a list of text strings, and words are delimited by any number and combination of characters from a set of characters.
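The three delimiter styles can be sketched as follows. This is a minimal illustration, not a definitive reference: it assumes the local properties the itemDelimiter and the lineDelimiter (names that do not appear in this section) alongside the wordDelimiter. The sample strings are made up, and the results shown follow from the delimiter rules described above.

```sensetalk
-- text items: delimited by one text string (assumes the itemDelimiter property)
set the itemDelimiter to "::"
put text item 2 of "red::green::blue" --> green

-- lines: delimited by any string in a list (assumes the lineDelimiter property)
set the lineDelimiter to ("|", ";")
put line 2 of "first|second;third" --> second

-- words: delimited by any run of characters drawn from a set
set the wordDelimiter to "-_"
put word 2 of "alpha--beta_gamma" --> beta
```

Because these are local properties, each setting would apply only within the handler that makes it.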
Characters
The simplest type of chunk is the character chunk. A character is simply one character of text, including both visible and invisible characters (invisible characters include control characters such as tab, carriage return, and linefeed characters). The word character may be abbreviated as char.
put "The quick brown fox" into animal
put character 1 of animal --> T
put the last char of animal --> x
put chars 3 to 7 of animal --> e qui
Words
A single word is defined as a sequence of characters not containing any whitespace characters, or a sequence of characters contained in quotation marks. A range of words includes all characters from the first word specified through the last word specified, including all intervening words and whitespace. Whitespace characters are spaces, tabs, and returns (newlines).
put "Sometimes you feel like a nut; sometimes you don’t." into slogan
put the second word of slogan --> you
put word 6 of slogan --> nut;
put words 1 to 3 of slogan --> Sometimes you feel
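A range of words keeps the original whitespace between its words, which is easiest to see with irregular spacing. A small sketch with a made-up string (the variable name spaced is only for illustration):

```sensetalk
-- three spaces separate the first two words
put "one   two three" into spaced
put word 2 of spaced --> two
put words 1 to 2 of spaced --> one   two
```

The range runs from the first character of word 1 through the last character of word 2, so the three spaces are preserved and the space before "three" is not included.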
Note that quoted phrases are ordinarily treated as a single word, including the quotation marks:
put <<Mary said "Good day" to John.>> into sentence
put the third word of sentence --> "Good day"
Related Local and Global Properties
SenseTalk includes local and global properties you can use to govern aspects of working with words in chunks. The set of characters that are used to identify words can be changed to something other than Space, Tab, and Return by setting the wordDelimiter local property or the defaultWordDelimiter global property. The quote characters used to identify a quoted word (or whether word quoting should be disabled completely) can be specified with the wordQuotes local property or the defaultWordQuotes global property.
These local and global properties are described in Local and Global Properties for Chunk Expressions:
the wordDelimiter, the defaultWordDelimiter
the wordQuotes, the defaultWordQuotes