Using XML and Tree Structures in SenseTalk Scripts
On this page:
- Trees and Nodes
- Trees and XML
- Tree = List + Property List
- Working with Trees
- Creating XML from a Tree
- Accessing Tree Content
- Accessing Tree Nodes Using XPath Expressions
- Three Special Properties: _tag, _children, _attributes
- Creating an Empty Tree
- Setting XML Attributes of a Tree
- Adding Children to a Tree
- Converting a Tree to Text
- Creating a Tree from a Property List
- Creating a Tree from a List
- Converting a Tree to a Property List
- Tree Comparisons
- Working with Node Types
- Global Properties
- Tree Functions
Trees and Nodes
A tree is a hierarchy that consists of a root node that can have any number of child nodes. Each child node can itself be a tree that can have any number of child nodes and so forth to any depth. Each node (except the root node) has a parent node, which is the tree (or subtree) that contains that node as a child.
In addition to having a parent node and zero or more child nodes, each node can also have a number of properties or attributes, including a tag name.
The basic structure of a tree is like this:
Because every node in a tree can have its own child nodes, each node (together with its children and later descendants) is itself a tree.
Trees and XML
While a tree can be useful for storing various types of hierarchical data, the tree structure in SenseTalk is specifically designed for working with XML documents or XML-based data. eXtensible Markup Language (XML) is a rich, flexible, and complex language that is used as the underlying foundation for a huge variety of data formats in use today.
SenseTalk's tree structure simplifies working with XML documents and data structures, making it easy to access individual values, while providing full access to all parts of a document when needed.
To fully understand trees and their relationship to XML it will be helpful to have at least a basic understanding of the XML structure and some of its terminology. Consider the following example XML document:
<?xml version="1.0"?>
<order id="001">
<customer name="Janet Brown"/>
<product code="prod345" size="6">
<quantity>3</quantity>
<amount>23.45</amount>
</product>
</order>
In the above example, the first line identifies this as a version 1.0 XML document. The second and last lines wrap the rest of the content in order tags. This entire section constitutes either a document node (as shown in the above example) or an element node (if the XML version information is absent) that contains two other elements: customer and product. The customer element has a name attribute but no additional content. The product element has code and size attributes, and also contains two more elements: quantity and amount. Both the quantity and amount elements contain enclosed text, known as text nodes.
We might represent this information in a tree like this:
The information in the parentheses indicates the type of node, and the attributes, if any, that are present in that node. In tree form, we can see that the order node has two children (customer and product) and that the product node in turn also has two children (quantity and amount). The quantity and amount nodes each have one child: a text node holding the corresponding value.
Notice that XML data consists mainly of elements (the document node can be treated as a special type of element). Each element has a tag (order, customer, product, and other tags), might have attributes (id, name, code, and other attributes), and might have children. An element's children might be elements, or might be simple text values (3, 23.45). There are other node types as well, including processing instructions and comments, but they are less common and are not shown in this example.
The order example presented here is typical of many XML formats in use today. By combining elements, attributes, and text in a nested structure, XML allows for a wide variety of different formats, and variations within them.
Tree = List + Property List
SenseTalk's approach to working with trees leverages the capabilities inherent in the language for dealing with lists and property lists, by treating a tree as a hybrid of both container types. The children of a tree can be accessed just like the items in a list, and its attributes can be accessed like the property values in a property list.
Some details are different for trees (as discussed below), but on the whole, if you are familiar with working with lists and property lists, you already know most of what you need to work with trees as well. See Lists and Property Lists for details.
Because trees have these characteristics, they might be useful even in situations that have nothing to do with XML, as a hybrid container type that behaves as both a list and a property list.
Working With Trees
Creating a Tree from XML
To load the contents of a URL containing XML data into an internal tree structure within a script, use the as a tree operator:
put url "http://some.site/data.xml" as a tree into myTree
This statement accesses the indicated URL and reads its contents. The as a tree operator tells SenseTalk to treat that data as an XML document and convert it into a tree structure which is then stored in the variable myTree.
It might be more convenient to start with XML contained directly in a script. The as tree operator works equally well for this:
set XMLSource to {{
<order id="001">
<customer name="Janet Brown"/>
<product code="prod345" size="6">
<quantity>3</quantity>
<amount>23.45</amount>
</product>
</order>
}}
// Store the XML information in XMLSource into variable order in tree format
set order to XMLSource as tree
There are references to this order tree in other examples below.
Creating XML from a Tree
Producing text in XML format from the data in a tree is even easier than creating a tree from XML text. Whenever a tree structure is used as text, it automatically creates an XML representation of the tree's contents. To make an XML file from a tree, for example, use this syntax:
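A minimal sketch of such a statement, assuming the order tree created from XMLSource above. The path /tmp/order.xml is only an illustrative assumption and is not taken from the text above:

```sensetalk
// Assumes "order" is the tree created from XMLSource above.
// The path is illustrative only; substitute any writable location.
put order into file "/tmp/order.xml"
```

Because the tree is used as text when it is written to the file, the file should receive the XML representation of the tree.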
Accessing Tree Content
The children of a tree can be accessed just like the items of a list. The following examples use the order tree created in the XMLSource example above:
put item 1 of order
Running the above command results in the following: <customer name="Janet Brown"></customer>
The attributes of a tree can be accessed just like the properties of a property list:
put order.id
Running the above command results in the following: 001
Combining items and properties provides access to more deeply nested data:
put the code of item 2 of order
Running the above command results in the following: prod345
Accessing Tree Nodes Using XPath Expressions
XPath is a standard mechanism for accessing content in XML documents. It provides a way to describe the node or set of nodes that you are interested in and to extract the desired information. SenseTalk supports this powerful mechanism through node expressions, which let you access content within a tree by tag name:
put node "*/customer" of order
Running the above command results in the following: <customer name="Janet Brown"></customer>
Node expressions can describe a path to a nested node of a tree:
put node "*/product/amount" of order
Running the above command results in the following: <amount>23.45</amount>
A special text property helps to extract just the content of a tree or node:
put the text of node "*/product/amount" of order
Running the above command results in the following: 23.45
Use all nodes or every node to return a list of every node of a tree that matches an XPath expression:
put all nodes "*/product/*" of order
Running the above command results in the following:
(<quantity>3</quantity>,<amount>23.45</amount>)
put every node "*/*/amount" of order
Running the above command results in the following: (<amount>23.45</amount>)
The nodePath function returns an XPath expression for a particular node within a tree:
put nodepath of item 2 of item 2 of order
Running the above command results in the following: (/order[1]/product[1]/amount[1])
A node expression (but not all nodes) can also be used as a container that can be stored into to alter the contents of the tree:
put 7 into node "*/product/quantity" of order
Running the above command results in 7 replacing 3 as the quantity value. To verify that the change worked, run the following command:
put node "*/*/quantity" of order
Look for the following output to verify that the 3 was replaced by 7: <quantity>7</quantity>
XPath expressions include many different options for accessing specific nodes, only a few of which are shown here. For more details about using XPath, see XPath Definition or the full specification at XML Path Language (XPath).
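Since all nodes returns an ordinary list of nodes, the result can be walked with a repeat loop. The following is a minimal sketch rather than a definitive pattern. It assumes the order tree created from XMLSource earlier, and the variable names productDetails and detail are hypothetical:

```sensetalk
// Assumes "order" is the tree created from XMLSource earlier on this page.
put all nodes "*/product/*" of order into productDetails

// Each entry in the returned list is itself a node, so its _tag and
// text properties can be read directly.
repeat with each item detail of productDetails
    put detail's _tag & ": " & the text of detail
end repeat
```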
Deleting Tree Nodes with Node Expressions
Tree nodes can be deleted using a node expression (previously only item expressions worked with the delete command for deleting tree nodes):
delete node "product/description" of order
Three Special Properties: _tag, _children, _attributes
There are a number of property names (all beginning with an underscore character) that have special meaning in a tree. The three most important ones are:
The _tag property refers to the tag name of an element.
The _children property refers to the children of a tree. Its value is a normal list containing all of the child trees.
The _attributes property refers to the attributes of a tree. Its value is a normal property list whose keys and values are the names and values of all of the tree's attributes. Through the _attributes property it is possible to access any attributes of a tree, including those that have the same name as one of the special tree properties.
In addition to the _tag, _children, and _attributes properties, a Document node can also have an _xmlinfo property, described below in Converting a Tree to Text.
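As a short sketch of how these three properties might be read, assuming the order tree created from XMLSource earlier (the variable name product is hypothetical):

```sensetalk
// Assumes "order" is the tree created from XMLSource earlier on this page.
put item 2 of order into product

put product's _tag           -- tag name of the element
put product's _attributes    -- property list of the element's attributes
put product's _children      -- list of the element's child trees
```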
Creating an Empty Tree
To create a tree entirely within a script rather than starting from an existing XML document, start with an empty tree to which content can be added:
put an empty tree into order
The tree produced by this statement is ready to accept children or attributes. It does not have a tag name, so a recommended second step would be to set its _tag property:
set order's _tag to "order"
Setting XML Attributes of a Tree
A tree's properties correspond to the "attributes" of an XML element. They are containers, and can be set just like the properties of a property list are set:
set order's id to "001"
The only limitations on setting a tree's properties are as follows:
- Values are always converted to text when they are set.
- Property names must conform to the rules for standard XML identifiers (which are the same as for identifiers in SenseTalk):
  - They must begin with a letter or underscore.
  - They must contain only letters, underscores, and digits.
Adding Children to a Tree
The children of a tree are accessed like items in a list. To add a new child, use the insert command:
insert << <customer name="Jane Doe"/> >> into order
To verify that the change worked, run the following command:
put every node "*/customer" of order
Look for the following result: (<customer name="Janet Brown"></customer>,<customer name="Jane Doe"></customer>)
Children must be trees, or values that can be converted into a tree. Values are converted automatically when they are added to a tree, using the same rules as the tree function, described later in this section. Only nodes that have a nodeType of Document, Element, or DTD can have children. Other types of nodes do not have children and do not behave like lists.
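The steps from the last three sections can be combined to build a small tree entirely in a script. This is a sketch that uses only the statements shown above; the element names and attribute values are reused from the earlier order example for illustration:

```sensetalk
// Start with an empty tree and give it a tag name.
put an empty tree into order
set order's _tag to "order"

// Attributes are set like properties of a property list.
set order's id to "001"

// Children are added like items of a list; the XML text is
// converted to a tree automatically when it is inserted.
insert << <customer name="Janet Brown"/> >> into order
insert << <product code="prod345" size="6"/> >> into order

put order  -- displays the tree as XML text
```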
Converting a Tree to Text
When a tree is accessed as text (such as when it is displayed by a put command), SenseTalk converts it automatically to a text representation in XML format. By setting the treeFormat's prettyPrint to true or false you can control whether or not the XML will be formatted on multiple lines with indentation for easier reading by a person. By default that property is set to true. The standardFormat() function can also be used to format a tree as text.
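A brief sketch of switching the prettyPrint setting, assuming the order tree created from XMLSource earlier. The exact layout of the output is not shown here because it depends on the tree's contents:

```sensetalk
// Assumes "order" is the tree created from XMLSource earlier on this page.
set the treeFormat's prettyPrint to true
put order   -- XML formatted on multiple lines with indentation

set the treeFormat's prettyPrint to false
put order   -- XML without the extra line breaks and indentation
```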
If the tree has document-level information (as defined by the _xmlinfo property) it will be used in generating the text representation of the tree. The _xmlinfo property can only be set at the top level of a tree (not a sub-tree), so inserting a tree as a sub-tree of another will discard its document-level information. The _xmlinfo property is a property list that can include the following document-level properties:
- CharacterEncoding – if set, this should be the name of a valid XML encoding (see Character Sets (IANA) for a list of valid encoding names – these are not the same as SenseTalk's string encoding types).
- DocumentContentType – must be one of XML, XHTML, HTML, or Text. This controls some aspects of the text representation that will be generated for that tree.
- MIMEType – should be set to a valid MIME type (see Media Types (IANA)).
- URI – the Uniform Resource Identifier (usually a URL) associated with that document.
- Version – should be either 1.0 or 1.1 to indicate the XML version.
In addition, the _xmlinfo can also include two lists of tree nodes representing comments or processing instructions which precede or follow the root element of the document:
- Head – a list of comments and processing instructions that precede the root element
- Tail – a list of comments and processing instructions that follow the root element
Creating a Tree from a Property List
It might be convenient to represent information in a script in the form of a property list, then convert it to a tree in order to produce XML output. SenseTalk's tree function (or asTree() or as a tree operator) supports property lists in several formats to make this convenient.
In the full standard format, the property list can include these special properties (and values):
- _tag or _element – tag name of an element
- _attributes – property list of attributes of an element node
- _children – list of child nodes
- _text – contents of a text node
- _comment or -- – contents of a comment node
- _processingInstruction or _pi – contents of a processing instruction node
- ? followed by processing instruction name – body of a processing instruction node
- _XMLinfo – property list of special XML document attributes
Here is a simple example using this approach:
put (_tag:book,_children:"The Rose") as tree
-- <book>The Rose</book>
For situations where XML attributes are not needed, a simplified format can be used:
put {book:"The Rose"} as tree
-- <book>The Rose</book>
Some XML formats use attributes but no content, which can be done like this:
put tree(_tag:"pg",_attributes:{id:43}) -- <pg id="43"></pg>
A simplified format can also be used in this case:
put tree(_tag:"pg", id:43) -- <pg id="43"></pg>
The rules for converting a property list to a tree can be summed up in this way: If there is only a single property, and it is not one of the special properties, that property name is taken to be the name of an element, and its value represents that element's children. If a property list has a _tag or _element property, it will produce an element node. In this case, if there is no _attributes property then other properties that do not have special meaning are assumed to be attributes.
Creating a Tree from a List
It is also possible to convert a list to a tree using the tree function (or asTree() or as a tree operator). When converted in this way, a list becomes an unnamed tree (with an empty tag). This also applies to nested lists or lists within property lists that are being converted.
Converting a Tree to a Property List
A tree can also be converted to a property list, by using the as operator. For example:
put "<zip>80521</zip>" as tree as object -- {zip:(80521)}
SenseTalk will use a simplified form for the property list if it can. To produce a standard format in all cases, set the treeFormat's useStandardFormat property to true:
set the treeFormat's useStandardFormat to true
put "<zip>80521</zip>" as tree as object
-- (_children:({_text:"80521"}), _tag:"zip")
Tree Comparisons
When two values are compared for equality (using the is or = operator), they are ordinarily compared as text. Only when both values are trees (in tree format, not a property list or XML text representation of a tree) are they compared as trees. You can force comparison as trees by specifying as tree for any non-tree value.
When one tree is compared to another, the two trees will be regarded as equal if they have identical contents, including identical children and properties. However, if two trees are nearly identical such that the only difference between them is that one tree has a version or characterEncoding property with the default value and the other tree lacks such a property, then the two trees will be treated as equal.
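The comparison rules can be sketched as follows. The variable names zipA and zipB are hypothetical, and results are left unstated because a text comparison depends on how the tree is rendered as text:

```sensetalk
put "<zip>80521</zip>" as tree into zipA
put "<zip>80521</zip>" as tree into zipB

// Both values are trees, so they are compared as trees.
put zipA is zipB

// The right-hand value is plain text, so this is compared as text.
put zipA is "<zip>80521</zip>"

// Converting the text with "as tree" forces a tree comparison.
put zipA is ("<zip>80521</zip>" as tree)
```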
Working with Node Types
Each node within a tree has a node type. The nodeType property of a node returns a node's type:
put order's nodeType -- Document
put the nodeType of item 1 of order -- Element
The types of nodes that can be present in a tree include Document, Element, Text, DTD, ProcessingInstruction, and Comment. A node's type cannot be changed. To test whether a node is a particular type, the is a operator can also be used:
put order is a Document -- true
Only Document, Element, and DTD nodes can have children. Attempting to add a child node to any other type of node will result in an error.
Global Properties
In addition to the functions described below, two SenseTalk global properties govern certain aspects of tree behavior. They are defined on Global Properties for XML and Trees.
Tree Functions
Tree, AsTree Functions
Behavior: The tree or asTree function (called by the as a tree operator) returns the value of its parameter converted to a tree.
Syntax:
{the} tree of factor
tree(expr)
{the} asTree of factor
asTree(expr)
Syntax definitions for language elements follow these formatting guidelines:
- boldface: Indicates words and characters that must be typed exactly
- italic: Indicates expressions or other variable elements
- {} (curly braces): Indicate optional elements.
- [] (square brackets) separated by | (vertical pipes): Indicate alternative options where one or the other can be used, but not both.
Example syntax:
open file fileName {for [reading | writing | readwrite | appending | updating]}
In this example, "open file" is required and must be typed exactly. "fileName" is a variable element; it is the path to and name of the file being opened. The expression that follows is optional and indicates why the file is being opened. If this expression is added, "for" is required and must be typed exactly. Exactly one of the following must then be included, and it must also be typed exactly: "reading", "writing", "readwrite", "appending", or "updating".
When the tree function is called with a parameter that is a property list (object) that has an asTree property, the value of that property is used. If the object has an asTreeExpression property, the value of that property is evaluated as an expression (equivalent to calling the treeFromXML() function) to obtain the tree value. If the object has neither of these properties, an asTree function message is sent exclusively to the object and its helpers to obtain the tree value.
If the parameter is an object, but does not supply a tree representation of itself in any of the above ways, it is taken to be a direct property list representation of a tree structure or a node. The property list can include these special properties (and values): _tag or _element (tag name of an element); _attributes (property list of attributes of an element node); _children (list of child nodes); _text (contents of a text node); _comment or -- (contents of a comment node); _processingInstruction or _pi (contents of a processing instruction node); ? followed by processing instruction name (body of a processing instruction node); _XMLinfo (property list of special XML document attributes). See Creating a Tree from a Property List for more information.
If the parameter is not an object and it is not already a tree, its string value is evaluated as an XML expression (equivalent to calling the treeFromXML() function) to obtain the tree value.
If the parameter includes a version property, the resulting tree object will be a Document type node, otherwise it will be an Element node.
Examples:
put file "configuration.xml" as a tree into config
put asTree("<a>Contents</a>") -- <a>Contents</a>
put { _tag:book, _children:"The Rose" } as tree -- <book>The Rose</book>
put tree(_tag:"page", num:8) -- <page num="8"></page>
See Also: Conversion of Values and the as operator in Expressions.
TreeFromXML, TreeFromHTML Functions
Behavior: The treeFromXML function evaluates a text value as XML and returns a tree. The treeFromHTML function evaluates a text value as HTML and returns a tree representation of that HTML content.
Syntax:
{the} treeFromXML of factor
treeFromXML(expr)
{the} treeFromHTML of factor
treeFromHTML(expr)
The treeFromXML function tries to evaluate its parameter as XML text. If the text is valid XML, it is parsed and the resulting tree returned. The tree returned will be a Document node if document-level information such as the XML version is present in the text, or an Element node otherwise. If the text is not valid XML, the returned tree will represent an XML text node rather than an element or document, and the result will be set to a warning message.
Similarly, the treeFromHTML function tries to evaluate its parameter as HTML text. If the text is valid HTML, including a valid fragment (rather than a full document) it is parsed and the resulting tree returned. If the text is not valid HTML, an exception will be thrown.
Examples:
put treeFromXML(xmlText) into aTree
put treeFromHTML(htmlText) into htmlTree
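Because treeFromXML reports invalid XML through the result rather than by throwing an exception, a script may want to check for a warning. The following is a sketch under assumptions: the variable names are hypothetical, and it assumes that the result still holds the warning when read on the next line:

```sensetalk
// sampleText is a hypothetical value that is deliberately not valid XML.
put "just some words, no markup" into sampleText
put treeFromXML(sampleText) into parsedTree

// Per the description above, invalid XML yields a text node and a
// warning in the result. Reading the result immediately is an assumption.
put the result into parseWarning
if parseWarning is not empty then
    put "Warning from treeFromXML: " & parseWarning
end if

put parsedTree's nodeType
```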
DocumentTreeFromXML, DocumentTreeFromHTML Functions
Behavior: The documentTreeFromXML and documentTreeFromHTML functions evaluate a text value as either XML or HTML respectively and return a tree representation of that content. The returned value will always be a Document node rather than an Element node (assuming there are no errors).
Syntax:
{the} documentTreeFromXML of factor
documentTreeFromXML(expr)
{the} documentTreeFromHTML of factor
documentTreeFromHTML(expr)
The documentTreeFromXML function tries to evaluate its parameter as XML text. If the text is valid XML, it is parsed and the resulting tree returned. The tree returned will be a Document node regardless of whether document-level information such as the XML version is present in the text. If the text is not valid XML, the returned tree will contain the text as a text node, and the result will be set to a warning message.
If the standardNodeExpressions is turned off, use "as document tree" instead of "as tree" to parse XML into a tree and ensure standard behavior of node expressions for that tree.
Similarly, the documentTreeFromHTML function tries to evaluate its parameter as HTML text. If the text is valid HTML, including a valid fragment (rather than a full document), it is parsed and the resulting Document tree is returned. If the text is not valid HTML, an exception will be thrown.
Examples:
put documentTreeFromXML(xmlText) into docTree
put documentTreeFromHTML(htmlText) into htmlDocTree
STTreeVersion Function
Behavior: The STTreeVersion function returns the current version number of the STTreeNode xmodule.
Syntax:
the STTreeVersion
STTreeVersion()
Examples:
put STTreeVersion()