Chunk Types
Chunk expressions let you work with all of these chunk types:
Type | Definition |
---|---|
characters | individual characters within text |
words | words separated by any amount of white space (spaces, tabs, returns) within text |
lines | paragraphs separated by any of several standard line endings (CR, LF, CRLF, etc.) |
text items | portions of text separated by commas |
list items | the individual items in a list |
bytes | the bytes within binary data |
occurrences | the text matches of a defined pattern |
matches | the text matches and text range of a defined pattern and its capture groups |
In addition, you can specify custom delimiters for identifying text items, lines, and words. Each of these three text chunk types uses a different kind of delimiter: text items are delimited by a single text string, lines are delimited by any of a list of text strings, and words are delimited by any number and combination of characters from a set of characters.
Characters
The simplest type of chunk is the character chunk. A character is simply one character of text, including both visible and invisible characters (invisible characters include control characters such as tab, carriage return, and linefeed characters). The word character may be abbreviated as char.
put "The quick brown fox" into animal
put character 1 of animal --> T
put the last char of animal --> x
put chars 3 to 7 of animal --> e qui
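A few more character chunks, as a sketch: it assumes that the number of characters in counts every character (spaces included) and that negative chunk numbers count backward from the end of the value. Neither feature is described in this section, so treat the results shown after --> as expected rather than guaranteed.

```sensetalk
put "The quick brown fox" into animal
-- every character counts, including the three spaces
put the number of characters in animal --> 19
-- assumed: negative chunk numbers count back from the end
put char -1 of animal --> x
put chars -3 to -1 of animal --> fox
```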
Words
A single word is defined as a sequence of characters not containing any whitespace characters, or a sequence of characters contained in quotation marks. A range of words includes all characters from the first word specified through the last word specified, including all intervening words and whitespace. Whitespace characters are spaces, tabs, and returns (newlines).
put "Sometimes you feel like a nut; sometimes you don’t." into slogan
put the second word of slogan --> you
put word 6 of slogan --> nut;
put words 1 to 3 of slogan --> Sometimes you feel
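Because any run of whitespace acts as a single separator, extra spaces or a tab between words do not produce empty words. The sketch below reuses the slogan example; the variable name spaced is illustrative, and counting with the number of words in is assumed rather than covered here.

```sensetalk
put "Sometimes you feel like a nut; sometimes you don’t." into slogan
put the number of words in slogan --> 9
-- a word range keeps the whitespace between the words it spans
put words 4 to 6 of slogan --> like a nut;
-- three spaces and a tab each count as one separator
put "one   two" & tab & "three" into spaced
put the number of words in spaced --> 3
put word 2 of spaced --> two
```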
Note that quoted phrases are ordinarily treated as a single word, including the quotation marks:
put <<Mary said "Good day" to John.>> into sentence
put the third word of sentence --> "Good day"
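Counting the words in the same sentence shows that the quoted phrase occupies a single word position, so the words after it are numbered accordingly. A brief sketch, again assuming the number of words in for counting:

```sensetalk
put <<Mary said "Good day" to John.>> into sentence
-- "Good day" is word 3, so there are 5 words rather than 6
put the number of words in sentence --> 5
put word 4 of sentence --> to
put the last word of sentence --> John.
```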
Related Local and Global Properties
SenseTalk includes local and global properties that control how words are identified in chunk expressions. The set of characters used to identify words can be changed from Space, Tab, and Return to something else by setting the wordDelimiter local property or the defaultWordDelimiter global property. The quote characters used to identify a quoted word (or whether word quoting is disabled completely) can be specified with the wordQuotes local property or the defaultWordQuotes global property.
These properties are described in Local and Global Properties for Chunk Expressions:
the wordDelimiter, the defaultWordDelimiter
the wordQuotes, the defaultWordQuotes
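As a sketch of a custom word delimiter, the example below treats commas as well as spaces as word separators. It assumes that the wordDelimiter is set to a string whose individual characters each act as delimiters, consistent with the description of word delimiters as a set of characters; the value in tags is made up for illustration.

```sensetalk
put "alpha,,beta, gamma" into tags
-- commas and spaces both separate words; a run of them counts as one break
set the wordDelimiter to ", "
put the number of words in tags --> 3
put word 2 of tags --> beta
```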
Lines
A line chunk expression lets you specify one or more lines or paragraphs of text within the subject text, where, by default, a line is the text between any of the standard line endings.
put "line 1" & return & "line 2" & return & "line 3" into text
put the second line of text --> line 2
put line 6 of text --> ""
put lines 2 to 3 of text
--> line 2
--> line 3
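Counting lines and mixing line endings can be sketched as follows. This assumes the number of lines in for counting and the crlf constant for a carriage return plus linefeed pair; mixed is an illustrative name.

```sensetalk
put "line 1" & return & "line 2" & return & "line 3" into text
put the number of lines in text --> 3
put the last line of text --> line 3
-- different standard line endings can appear in the same text
put "a" & crlf & "b" & return & "c" into mixed
put the number of lines in mixed --> 3
put line 2 of mixed --> b
```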
Related Local and Global Properties
SenseTalk includes two properties that control how lines are identified in chunk expressions. The set of line endings (delimiter strings) that define a line can be changed from the default by setting the lineDelimiter local property. Setting the lineDelimiter to empty returns it to the default list.
The defaultLineDelimiter global property defines the default set of line delimiters. This property is initially set to: CRLF, Return, CarriageReturn, LineSeparator, ParagraphSeparator.
These properties are described in Local and Global Properties for Chunk Expressions:
the lineDelimiter, the defaultLineDelimiter
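A sketch of custom line delimiters, assuming the lineDelimiter accepts a list of strings written with the same bracketed list syntax used under List Items, and that setting it to empty restores the default endings as described above. The separator strings are arbitrary choices for illustration.

```sensetalk
put "red|green//blue" into colorText
-- either separator string marks the end of a line
set the lineDelimiter to ["|", "//"]
put the number of lines in colorText --> 3
put line 3 of colorText --> blue
-- empty restores the default list of line endings
set the lineDelimiter to empty
put the number of lines in colorText --> 1
```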
Text Items
An item within text is usually defined as the portion of text between commas:
put "A man, a plan, a canal. Panama!" into palindrome
put item 2 of palindrome --> " a plan"
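Counting the items in the palindrome shows how the commas split it. Note that the period is not a delimiter. A short sketch, assuming the number of items in for counting:

```sensetalk
put "A man, a plan, a canal. Panama!" into palindrome
put the number of items in palindrome --> 3
-- the space after each comma stays at the start of the next item
put item 3 of palindrome --> " a canal. Panama!"
```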
The delimiter can be something other than a comma: set the itemDelimiter local property to the string you want to use. The default value of the itemDelimiter is determined by the defaultItemDelimiter global property. These two properties are described in Local and Global Properties for Chunk Expressions:
the itemDelimiter, the defaultItemDelimiter
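Changing the itemDelimiter can be sketched like this. The field values in entry are invented for illustration, and the multi-character example assumes the delimiter may be any single text string, as the overview of delimiter types describes.

```sensetalk
put "Smith;John;42" into entry
set the itemDelimiter to ";"
put item 2 of entry --> John
put the number of items in entry --> 3
-- the delimiter is one string, which may be longer than one character
put "a::b::c" into triple
set the itemDelimiter to "::"
put item 3 of triple --> c
```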
List Items
The word items can also refer to the elements in a list.
put ["red", "green", "blue"] into colors
put item 2 of colors --> green
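Because the items of a list are its elements, a comma inside an element does not split it the way it would in text. The sketch below also extends a list by putting a value into the next item position, which relies on item referring to list items when the value is a list. The names colors and names are illustrative, and counting with the number of items in is assumed.

```sensetalk
put ["red", "green", "blue"] into colors
put the number of items in colors --> 3
put the last item of colors --> blue
-- colors is a list, so item 4 is a new list item
put "yellow" into item 4 of colors
put the number of items in colors --> 4
-- commas inside an element are not item delimiters
put ["Smith, John", "Doe, Jane"] into names
put the number of items in names --> 2
put item 1 of names --> Smith, John
```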
SenseTalk decides whether item refers to text items or list items depending on whether the value is a list. When referring to items within a value that is a list, SenseTalk automatically assumes the reference is to list items, not text items. In addition, if the itemDelimiter is set to "" (empty), items refers to list items rather than text items. You can explicitly refer to list items or text items instead of the more generic items when you need to control how items are treated. This is especially important when you create a list by putting values into individual items, like this:
put 1 into myText -- 1
put 2 into item 2 of myText
put myText --> "1,2"
The code above generates a text string whose middle character is the itemDelimiter (unless the itemDelimiter has been set to empty). To generate a list instead of text, specify list item:
put 1 into myList -- 1
put 2 into list item 2 of myList
put myList --> [1,2]
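The same approach builds a longer list in a loop. This sketch assumes [] creates an empty list; because every assignment targets a list item, squares stays a list throughout instead of becoming comma-separated text. The name squares is illustrative.

```sensetalk
put [] into squares
repeat with n = 1 to 4
    -- list item n is a new element on each pass
    put n * n into list item n of squares
end repeat
put the number of items in squares --> 4
put item 3 of squares --> 9
```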
See Lists and Property Lists for more information on working with lists.