Text and Data Manipulation
SenseTalk has strong text handling capabilities. Its chunk expressions, described in Chunk Expressions, provide a powerful and intuitive means of accessing and manipulating specific portions of a text string. In addition, there are a number of commands and functions for obtaining information about text, converting between text and other data formats, and manipulating text at a high level. The commands and functions for performing these actions are described in detail below.
Capitalized Function
Behavior: The capitalized function returns text with the first letter of each word capitalized.
Syntax:
{the} capitalized of stringFactor
capitalized( stringExpr )
Example:
put capitalized of "now and then" --> "Now And Then"
CharToNum Function
Behavior: Returns the numeric code (in Unicode) representing the first character of its parameter.
Syntax:
{the} charToNum of textFactor
charToNum( textExpr )
Example:
put charToNum("a") --> 97
Delete Command
Behavior: The delete command deletes a chunk of text or one or more occurrences of a target text string within a container. In its simplest form, it deletes every occurrence of the target text, regardless of case. Other forms let you specify a chunk by its location, specify how many occurrences of a target string to delete (or indicate one particular occurrence), and require exact case matching.
Delete Chunk
This form of the delete command deletes any chunk of text (characters, words, lines, or text items) from within a value. The chunk can be specified with any chunk expression describing the part of the container that should be deleted. See Chunk Expressions for a full description.
Syntax:
delete chunk [of | in] container
Example:
set sentence to "In a hole in the ground there lived a hobbit."
delete words 2 to 4 of sentence
put sentence --> "In the ground there lived a hobbit."
Example:
delete the first 2 characters of line 15 of output
Example:
delete the first line of file manual
Delete Text or Pattern
This form of the delete command deletes a target text value or pattern (see Pattern Language) from within a string. It includes a number of variations that let you specify how many occurrences of the target to delete (or indicate one particular occurrence), and whether to require exact case matching.
Exactly one of the options defining the targetTextOrPattern must be supplied, as well as the container where the deletions will occur. The case options are optional. The options used may be specified in any order, although only one option of each type can be given.
Syntax:
delete {Options}
Options:
[in | within | from] container
{all {occurrences of} | every {occurrence of} } targetTextOrPattern
{the} [first | last] howMany {occurrences of} targetTextOrPattern
{the} ordinalTerm {occurrence of} targetTextOrPattern
occurrence ordinalNumber of targetTextOrPattern
[with | considering] case
[without | ignoring] case
Example:
delete "#" from text -- this will delete every occurrence of "#"
Example:
delete the last "s" in word 3 of sentence
Example:
delete all <punctuation or whitespace> from phoneNumber
Example:
delete every occurrence of "ugly" in manual
Example:
delete the third "i" within phrase considering case
You must include the in container, within container, or from container option in such a delete command. The container can be any container, including a variable, a portion of a variable (using a chunk expression), or a text file. If container is a variable, its contents may be of any sort, including a list or property list. The delete command will delete the indicated text, searching through all values nested to any depth within such containers.
You must include one of the targetTextOrPattern options in a delete command, to specify what will be deleted. Simply providing a targetTextOrPattern expression will cause the command to delete every occurrence of the value of that expression within container, so use this option cautiously. You may optionally precede targetTextOrPattern by all {occurrences of} or every {occurrence of} in this case if you like, which makes the impact clearer.
If the first or last options are used, howMany will specify the number of occurrences of targetTextOrPattern that should be deleted, starting at either the beginning or end of container respectively.
If ordinalTerm or ordinalNumber is given, only a single occurrence of targetTextOrPattern will be deleted. The ordinalTerm should be an ordinal number, such as first, second, third, and so forth, or one of the terms middle (or mid), penultimate, last, or any. The ordinalNumber should be an expression which evaluates to a number. If it is negative, the delete command will count backward from the end of container to determine which occurrence to delete.
If considering case is specified, only occurrences of targetTextOrPattern within container that match exactly will be considered for deletion. The default is to delete any occurrence of targetTextOrPattern regardless of case.
The delete command sets a result (as returned by the result function) that indicates the number of occurrences that were deleted.
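As an illustration of the counting and case options described above, the following sketch applies three variations to short strings. The variable names are chosen for this example only, and the results shown in the comments follow from the rules as stated (case-insensitive matching by default) rather than from a recorded run.

```sensetalk
put "Mississippi" into river
delete the first 2 occurrences of "s" in river
put the result into howManyDeleted -- number of occurrences deleted, per the description above
put river --> "Miissippi"

put "Mississippi" into river
delete occurrence -1 of "s" in river -- a negative number counts back from the end
put river --> "Missisippi"

put "Banana bread" into snack
delete all occurrences of "b" in snack considering case -- only the lowercase "b" matches
put snack --> "Banana read"
```

Capturing the result immediately after the delete command avoids any chance that a later command replaces it.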
Delete Values
In addition to deleting a value by position (such as delete item 3 of myList), you can delete specific values from a list. This is done using an each expression. For example, to delete all occurrences of the word "at" from some text you would say delete each word of myText which is equal to "at". The values to delete are selected using the where (or which or whose) clause of the each expression, which provides a great deal of flexibility. So in addition to deleting chunks that are equal to a given value, you can use almost any criteria to select the items to be deleted.
Syntax:
delete each chunkType [of | in] sourceValue {where conditionWithEach | which operator value | whose propertyCondition}
Example:
delete each item of addressList whose zip code is 80202
Example:
delete each line of testCases where the last item of each is "failed"
Example:
delete each item of scores which is equal to zero
Example:
delete each line of file "entries" which is empty
Example:
delete each item of subscriberList whose expirationDate is earlier than yesterday
ExcludeItems Function
Behavior: The excludeItems function returns a list containing all of the values from list1 that are not also present in list2. By default, this function follows the current setting of the caseSensitive property.
Syntax:
the items of list1 excluding those in list2
excludeItems( list1, list2 )
Example:
put ["carrots","bananas","pistachios","lettuce","wasabi","aspirin","tissues"] into GroceryList
put ["cat food","tissues","aspirin","soda","socks"] into TargetList
put the items of GroceryList excluding those in TargetList --> [carrots,bananas,pistachios,lettuce,wasabi]
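The effect of the caseSensitive property can be sketched as follows. This assumes the property starts at its default value (false), and the list contents are invented for illustration; the results in the comments are what the behavior described above implies, not output from a recorded run.

```sensetalk
put ["Apple","banana","Cherry"] into fruits
put excludeItems(fruits, ["apple","cherry"]) --> [banana]

set the caseSensitive to true
put excludeItems(fruits, ["apple","cherry"]) --> [Apple,banana,Cherry]
set the caseSensitive to false -- restore the assumed default
```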
Format Function
Behavior: Returns a formatted text representation of any number of values, as defined by a template string. The template consists of text intermixed with special formatting codes to specify such things as numbers formatted with a defined number of decimal places, text values formatted with extra spaces to fill a defined minimum length, and more.
Syntax:
format( template, value1, value2, ... )
Example:
set interestRate to 5.457
put format("The interest rate is %3.2f %%", interestRate) --> The interest rate is 5.46 %
Example:
format(reportTemplate, day(date), month(date), description, amount)
Example:
format("%x", maskValue) -- converts maskValue to hexadecimal
The template string can include any format codes supported by the standard Unix printf command, as summarized below. In addition, certain “escape sequences” beginning with a backslash character are translated as follows:
- \e: escape character
- \a: bell character
- \b: backspace character
- \f: formfeed character
- \n: newline character
- \r: carriage return character
- \t: tab character
- \v: vertical tab character
- \': single quote character
- \\: backslash character
- \num: character whose ASCII value is the 1-, 2-, or 3-digit octal number num
A format code begins with a percent sign (%) followed by optional modifiers to indicate the length and number of decimal places for that value, and ends with a letter (d, i, u, o, x, X, f, e, E, g, G, b, c, s, a, A, or @) that specifies the type of formatting to be done (see the table below). Two percent signs in a row (%%) can be used to produce a percent sign in the output string.
Following the percent sign, and before the letter code, a format may include a number indicating the output length for this value. The length may be followed by a decimal point (.) and another number indicating the “precision”: the number of decimal places to display for a number, or the maximum number of characters to display from a string. Either the length or the precision may be replaced by an asterisk (*) to indicate that the value should be read from an additional parameter.
Before the length, the format code may also include any of the following modifier codes as needed:
- A minus sign (-) indicates the value should be left-aligned within the given length
- A plus sign (+) indicates that signed number formats should display a plus sign for positive numbers
- A space ( ) indicates that signed number formats should include an extra space for positive numbers
- A zero (0) indicates that leading zeros (rather than spaces) should be used to fill the specified length
- A pound sign (#) affects specific numeric formats in different ways as described below
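As a sketch of how the length, precision, and modifier codes combine, the following calls build fixed-width output. The values are invented for illustration, and the results in the comments follow standard printf behavior, which the text above says the format codes share.

```sensetalk
put format("%-10s|%8.2f|", "Total", 42.5) --> "Total     |   42.50|"
put format("%*d", 6, 27) --> "    27" (the length 6 is read from the extra parameter)
put format("%05d", 42) --> "00042"
put format("%+d", 42) --> "+42"
```

Left-aligning labels with a minus sign while leaving numbers right-aligned keeps the columns of a plain-text report lined up.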
The following table lists the format codes that are recognized, their meaning, and examples:
| Code | Meaning | Examples |
| --- | --- | --- |
| d or i | signed (positive or negative) decimal integer (# has no effect) | format("%4d", 27) --> "  27"; format("%+-4d", 27) --> "+27 "; format("%04i", 27) --> "0027" |
| u | unsigned (must be positive) decimal integer (# has no effect) | format("%u", 27) --> "27" |
| o | unsigned octal integer (# increases precision to force a leading zero) | format("%#o", 27) --> "033" |
| x or X | unsigned hexadecimal integer (# prepends '0x' or '0X' before a non-zero value) | format("%x", 27) --> "1b"; format("%#X", 27) --> "0X1B" |
| f | signed fixed-precision number (# forces decimal point to appear, even when there are no digits to the right of the decimal point) | format("%f", 27) --> "27.000000" (default precision is 6 decimal places); format("%7.3f", 27.6349) --> " 27.635"; format("%+*.*f", 7, 2, 27.6349) --> " +27.63"; format("%#-5.0f", 27) --> "27.  " |
| e or E | signed number in exponential notation with 'e' or 'E' before the exponent (# forces decimal point to appear, even when there are no digits to the right of the decimal point) | format("%e", 27) --> "2.700000e+01"; format("%9.2E", 0.04567) --> " 4.57E-02" |
| g or G | signed number in fixed (the same as 'f') or exponential (the same as 'e' or 'E') notation, whichever gives full precision in less space (# forces decimal point to appear, even when there are no digits to the right of the decimal point; trailing zeros are not removed) | format("%g", 27) --> "27"; format("%+g", 0.04567) --> "+0.04567" |
| c | single character | format("%-2c", "hello") --> "h " |
| s | text string | format("%6s", "hello") --> " hello"; format("%-3.2s", "hello") --> "he " |
| b | text string with backslash escape sequences expanded | format("%b", "\tHello\\") --> a tab character followed by Hello\ |
| a or A | signed number printed in scientific notation with a leading 0x (or 0X) and one hexadecimal digit before the decimal point using a lowercase p (or uppercase P) to introduce the exponent (not available on Windows) | |
| @ | any value, displayed in its usual text format | |
GloballyUniqueString Function
Behavior: The globallyUniqueString function generates a unique string each time it is called. Call globallyUniqueString() to obtain a unique string to be used as an identifier, key, or file name. The value is unique across systems, application instances, and individual calls to the function. The returned value is a universally unique identifier string (UUID) conforming to the RFC 4122 standard.
Syntax:
the globallyUniqueString
globallyUniqueString( { formatNumber } )
GloballyUniqueString is usually called with no parameters, or with a parameter value of 0. If called with a formatNumber whose value is ‘1’, it returns a unique string in a form that is backward-compatible with older versions of SenseTalk (V2.13 and earlier) but does not conform to the RFC 4122 standard.
Example:
put globallyUniqueString() --> 95400D7C-35E8-47D0-9B72-F7854978D7A0
put globallyUniqueString(0) --> C9CF7512-07E8-4E14-8520-C251A08C3988
Example:
set part's code to the globallyUniqueString // give part a unique identifier
Hash Function
Behavior: The hash function returns a hash value for the given data. SenseTalk supports many hash types (industry-standard algorithms), as well as multiple output formats. This function can also produce a hash-based message authentication code (HMAC) based on the chosen hash type and supplied secret key.
Syntax:
hash( dataValue )
hash( dataValue, options )
hash( optionsIncludingDataOrFile )
The options or optionsIncludingDataOrFile is a property list that can include these properties:
- data: the data value to be hashed. If you don't specify a dataValue, you must specify either data or file.
- file: the path to a file to be hashed. If you don't specify a dataValue, you must specify either data or file.
- type: the type of hash to compute. The default value is "sha256".
- key: the key string for HMAC. If you include key, SenseTalk uses the HMAC algorithm for the calculation. If you omit key, SenseTalk computes a simple hash value.
- output: the output format for the hash value. The default value is "Hex".
The type can be "sha1", "sha256", "sha384", "sha512", "sha3-224", "sha3-256", "sha3-384", "sha3-512", "md2", "md5", "ripemd128", "ripemd160", "ripemd256", or "ripemd320". If you don't specify a type, SenseTalk uses the default value "sha256".
The output may be any of "Base64", "modBase64", "base64url", "Base32", "Base58", "UU", "QP" (for quoted-printable), "URL" (for url-encoding), "Hex", "Q", "B", "url_oauth", "url_rfc1738", "url_rfc2396", "url_rfc3986", or "fingerprint". If you omit output, SenseTalk uses the default value "Hex". Note that if you specify "hex" (with a lowercase "h"), SenseTalk returns a lowercase hex string rather than uppercase.
Example:
set sourceData to "Eggplant"
put hash(sourceData) --> 84247C8AAF6DD96FB2878483CB0140C23E3C12ABA8CC987306D0A77986286526
put hash(sourceData, type:"MD5") --> B2585BC3E070132D2BF51DFFAE794F64
put hash(data:sourceData, type:"MD5", output:"fingerprint") --> b2:58:5b:c3:e0:70:13:2d:2b:f5:1d:ff:ae:79:4f:64
Example:
set myFile to the temp folder & "fileToHash.txt"
put "Eggplant" into file myFile
put hash(file:myFile, type:"MD5") --> B2585BC3E070132D2BF51DFFAE794F64
Example:
set sourceData to "The quick brown fox jumps over the lazy dog"
put hash(sourceData, key:"1234") --> 214E68BDD7C12D03971AAA929226147AFC786448D239CAEC7ECEB6A39ADC2BCF
put hash(sourceData, key:"eggplant") --> A9F52CABD3FBEC4CD73C2CBEF44D711100A433F8C755AA0D542772E71B98926D
put hash(data:sourceData, type:"sha1", key:"eggplant") --> 29ED9AC3D1EC500E0103C2E3AE9D9D900B7A7637
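One common use of the key option is to check that a piece of text has not been altered between two points in a script or between two systems. The sketch below assumes a shared secret and message text that are purely illustrative; only the hash calls mirror the syntax documented above.

```sensetalk
set secretKey to "eggplant" -- hypothetical shared secret
set messageText to "transfer 100 to account 42" -- hypothetical message
set sentMac to hash(messageText, key:secretKey) -- HMAC using the default sha256 type

-- Later, recompute the HMAC over the text that was received and compare
set receivedText to messageText
if hash(receivedText, key:secretKey) is sentMac then
	put "Message is intact"
else
	put "Message was altered or the key differs"
end if
```

Both sides must use the same type and output settings, since a different hash type or output format produces a different string for the same data.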
Insert Command, Push Command
Behavior: Inserts a value or list of values at the end of a list, or before or after a specific item of the list.
Syntax:
[insert | push] expr {nested | item by item} [before | into | after] container
Example:
insert 5 into myList
Example:
push "hedgehog" into Animalia
Example:
insert newName before item index of nameList
Example:
insert [3,5] nested after pointList -- creates a nested list
Example:
insert "myBuddy" before my helpers
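The difference between the nested and item by item options can be sketched as follows. The list names are illustrative, and the results in the comments are what the syntax above implies rather than output from a recorded run.

```sensetalk
put [1,2] into nestedList
insert [3,5] nested after nestedList -- adds the whole list as a single item
put nestedList --> [1,2,[3,5]]

put [1,2] into flatList
insert [3,5] item by item after flatList -- adds each value separately
put flatList --> [1,2,3,5]

insert 0 before item 1 of flatList -- position given by a chunk expression
put flatList --> [0,1,2,3,5]
```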