Using XML and Tree Structures in SenseTalk Scripts
SenseTalk's tree structure provides the ability to easily read data in XML format, access and manipulate that data within a tree, and produce XML from this data. A tree is a hierarchical data structure that behaves as both a list and a property list (with some restrictions). As a list, a tree contains items, sometimes called nodes, which are also trees. As a property list, a tree has properties that correspond to attributes of a node in XML terminology.
Trees and Nodes
A tree is a hierarchy that consists of a root node that can have any number of child nodes. Each child node can itself be a tree that can have any number of child nodes and so forth to any depth. Each node (except the root node) has a parent node, which is the tree (or subtree) that contains that node as a child.
In addition to having a parent node and zero or more child nodes, each node can also have a number of properties or attributes, including a tag name.
The basic structure of a tree is like this:
root node
    child node
        child node
        child node (and so on, to any depth)
    child node
Because every node in a tree can have its own child nodes, each node (together with its children and later descendants) is itself a tree.
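As a minimal sketch of this idea, the following script builds a small tree from inline XML and steps down through it with chained item expressions. Each intermediate result is a tree in its own right. The tag names and the variable names sample, branchNode, and leafNode are illustrative only.

```sensetalk
// Hypothetical three-level document: root > branch > leaf
set sample to "<root><branch><leaf>value</leaf></branch></root>" as tree

put item 1 of sample into branchNode    // the branch child is itself a tree
put item 1 of branchNode into leafNode  // and so is its own leaf child
put the text of leafNode                // the text content of the leaf: value
```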
Trees and XML
While a tree can be useful for storing various types of hierarchical data, the tree structure in SenseTalk is specifically designed for working with XML documents or XML-based data. eXtensible Markup Language (XML) is a rich, flexible, and complex language that is used as the underlying foundation for a huge variety of data formats in use today.
SenseTalk's tree structure simplifies working with XML documents and data structures, making it easy to access individual values, while providing full access to all parts of a document when needed.
To fully understand trees and their relationship to XML it will be helpful to have at least a basic understanding of the XML structure and some of its terminology. Consider the following example XML document:
<?xml version="1.0"?>
<order id="001">
<customer name="Janet Brown"/>
<product code="prod345" size="6">
<quantity>3</quantity>
<amount>23.45</amount>
</product>
</order>
In the above example, the first line identifies this as a version 1.0 XML document. The second and last lines wrap the rest of the content in order tags. This entire section constitutes either a document node (when the XML version declaration is present, as in the above example) or an element node (when the declaration is absent), and it contains two other elements: customer and product. The customer element has a name attribute but no additional content. The product element has code and size attributes, and also contains two more elements: quantity and amount. Both the quantity and amount elements contain enclosed text, known as text nodes.
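We might represent this information in a tree like this:
order (element; id="001")
    customer (element; name="Janet Brown")
    product (element; code="prod345", size="6")
        quantity (element)
            3 (text)
        amount (element)
            23.45 (text)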
The information in the parentheses indicates the type of node, and the attributes, if any, that are present in that node. In tree form, we can see that the order node has two children (customer and product) and that the product node in turn also has two children (quantity and amount). The quantity and amount nodes each have one child: a text node holding the corresponding value.
Notice that XML data consists mainly of elements (the document node can be treated as a special type of element). Each element has a tag (order, customer, product, and other tags), might have attributes (id, name, code, and other attributes), and might have children. An element's children might be elements, or might be simple text values (3, 23.45). There are other node types as well, including processing instructions and comments, but they are less common and are not shown in this example.
The order example presented here is typical of many XML formats in use today. By combining elements, attributes, and text in a nested structure, XML allows for a wide variety of different formats, and variations within them.
Tree = List + Property List
SenseTalk's approach to working with trees leverages the capabilities inherent in the language for dealing with lists and property lists, by treating a tree as a hybrid of both container types. The children of a tree can be accessed just like the items in a list, and its attributes can be accessed like the property values in a property list.
Some details are different for trees (as discussed below), but on the whole, if you are familiar with working with lists and property lists, you already know most of what you need to work with trees as well. See Lists and Property Lists for details.
Because trees have these characteristics, they might be useful even in situations that have nothing to do with XML, as a hybrid container type that behaves as both a list and a property list.
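A minimal sketch of this hybrid behavior, assuming the order document from earlier in this section written on a single line (XML permits single-quoted attributes): the same tree answers a list question (how many children it has, and what each one is) and a property-list question (the value of its id attribute).

```sensetalk
// The order document, written on one line with single-quoted attributes
set order to "<order id='001'><customer name='Janet Brown'/><product code='prod345' size='6'><quantity>3</quantity><amount>23.45</amount></product></order>" as tree

// List behavior: the children of the tree are its items
put the number of items in order    // counts the child nodes (customer and product)
repeat with each item of order
    put it                          // each child node displays as XML
end repeat

// Property list behavior: attributes are properties
put order.id                        // 001
```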
Working With Trees
Creating a Tree from XML
To load the contents of a URL containing XML data into an internal tree structure within a script, use the as a tree operator:
put url "http://some.site/data.xml" as a tree into myTree
This statement accesses the indicated URL and reads its contents. The as a tree operator tells SenseTalk to treat that data as an XML document and convert it into a tree structure, which is then stored in the variable myTree.
It might be more convenient to start with XML contained directly in a script. The as tree operator works equally well for this:
set XMLSource to {{
<order id="001">
<customer name="Janet Brown"/>
<product code="prod345" size="6">
<quantity>3</quantity>
<amount>23.45</amount>
</product>
</order>
}}
// Store the XML information in XMLSource into variable order in tree format
set order to XMLSource as tree
There are references to this order tree in other examples below.
Creating XML from a Tree
Producing text in XML format from the data in a tree is even easier than creating a tree from XML text. Whenever a tree structure is used as text, it automatically creates an XML representation of the tree's contents. To make an XML file from a tree, for example, use this syntax:
put myTree into file "/path/to/aFile.xml"
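To illustrate the round trip, this sketch writes a tree to a file, displays the XML text that was written, and then reads the file back as a tree. The path /tmp/order.xml and the variable name reloaded are placeholders; substitute a writable location on your system.

```sensetalk
// The order document, written on one line with single-quoted attributes
set order to "<order id='001'><customer name='Janet Brown'/><product code='prod345' size='6'><quantity>3</quantity><amount>23.45</amount></product></order>" as tree

put order into file "/tmp/order.xml"                // hypothetical path; the tree is written as XML text
put file "/tmp/order.xml"                           // display the raw XML that was written
put (file "/tmp/order.xml") as a tree into reloaded // parse the file contents back into a tree
put the text of node "*/product/amount" of reloaded // 23.45
```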
Accessing Tree Content
The children of a tree can be accessed just like the items of a list. The following examples use the order tree created in the as tree example above:
put item 1 of order
Running the above command results in the following: <customer name="Janet Brown"/>
The attributes of a tree can be accessed just like the properties of a property list:
put order.id
Running the above command results in the following: 001
Combining items and properties provides access to more deeply nested data:
put the code of item 2 of order
Running the above command results in the following: prod345
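Item and property access can be chained to any depth. Starting from the same order document on a single line, this sketch reads the product's code attribute and the text inside each of its two children; productNode is an illustrative variable name.

```sensetalk
// The order document, written on one line with single-quoted attributes
set order to "<order id='001'><customer name='Janet Brown'/><product code='prod345' size='6'><quantity>3</quantity><amount>23.45</amount></product></order>" as tree

put item 2 of order into productNode   // the product element
put productNode.code                   // prod345
put the text of item 1 of productNode  // quantity is product's first child: 3
put the text of item 2 of productNode  // amount is product's second child: 23.45
```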
Accessing Tree Nodes Using XPath Expressions
XPath is a standard mechanism for accessing content in XML documents. It provides a way to describe the node or set of nodes that you are interested in, and extracts the desired information for you. SenseTalk supports this powerful mechanism through node expressions, which let you access content within a tree by tag name:
put node "*/customer" of order
Running the above command results in the following: <customer name="Janet Brown"/>
Node expressions can describe a path to a nested node of a tree:
put node "*/product/amount" of order
Running the above command results in the following: <amount>23.45</amount>
A special text property extracts just the content of a tree or node:
put the text of node "*/product/amount" of order
Running the above command results in the following: 23.45
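Because the text property yields plain values, node text can be used directly in calculations. This sketch multiplies the quantity by the amount from the order document. The variable names qty, unitPrice, and lineTotal are illustrative, and the sketch relies on SenseTalk's automatic conversion of numeric text to numbers.

```sensetalk
// The order document, written on one line with single-quoted attributes
set order to "<order id='001'><customer name='Janet Brown'/><product code='prod345' size='6'><quantity>3</quantity><amount>23.45</amount></product></order>" as tree

put the text of node "*/product/quantity" of order into qty
put the text of node "*/product/amount" of order into unitPrice
put qty * unitPrice into lineTotal  // numeric text is converted to numbers for arithmetic
put lineTotal
```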
Use all nodes or every node to return a list of every node of a tree that matches an XPath expression:
put all nodes "*/product/*" of order
Running the above command results in the following: (<quantity>3</quantity>,<amount>23.45</amount>)
put every node "*/*/amount" of order
Running the above command results in the following: (<amount>23.45</amount>)
The nodePath function returns an XPath expression for a particular node within a tree:
put nodepath of item 2 of item 2 of order
Running the above command results in the following: (/order[1]/product[1]/amount[1])
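Because all nodes returns an ordinary list, its results can be counted and iterated like any other list. This sketch collects the children of the product element, displays the text of each one, and then asks for the path of the quantity element; productChildren is an illustrative variable name.

```sensetalk
// The order document, written on one line with single-quoted attributes
set order to "<order id='001'><customer name='Janet Brown'/><product code='prod345' size='6'><quantity>3</quantity><amount>23.45</amount></product></order>" as tree

put all nodes "*/product/*" of order into productChildren
put the number of items in productChildren  // quantity and amount
repeat with each item of productChildren
    put the text of it                      // 3, then 23.45
end repeat
put nodePath of item 1 of item 2 of order   // XPath to the quantity element
```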
A node expression (but not all nodes) can also be used as a container that can be stored into to alter the contents of the tree:
put 7 into node "*/product/quantity" of order
Running the above command replaces the quantity value 3 with 7. To verify that the change worked, run the following command:
put node "*/*/quantity" of order
Look for the following output to verify that 3 was replaced by 7: <quantity>7</quantity>
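Because a node expression is a container, a script can read a value, compute a new one, and store it back in place. This sketch increments the quantity by one. It reads the current value through the text property, since the node itself is displayed as XML rather than as a number; qty is an illustrative variable name.

```sensetalk
// The order document, written on one line with single-quoted attributes
set order to "<order id='001'><customer name='Janet Brown'/><product code='prod345' size='6'><quantity>3</quantity><amount>23.45</amount></product></order>" as tree

put the text of node "*/product/quantity" of order into qty
put qty + 1 into node "*/product/quantity" of order  // replaces 3 with 4
put the text of node "*/product/quantity" of order   // 4
```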
XPath expressions include many different options for accessing specific nodes, only a few of which are shown here. For more details about using XPath, see XPath Definition or the full specification at XML Path Language (XPath).
Deleting Tree Nodes with Node Expressions
Tree nodes can be deleted using a node expression (previously only item expressions worked with the delete command for deleting tree nodes). For example, this removes the amount element from the order tree:
delete node "*/product/amount" of order
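A short sketch that deletes the customer element from the order document and then checks what remains. After the deletion, product becomes the first and only child of order.

```sensetalk
// The order document, written on one line with single-quoted attributes
set order to "<order id='001'><customer name='Janet Brown'/><product code='prod345' size='6'><quantity>3</quantity><amount>23.45</amount></product></order>" as tree

delete node "*/customer" of order
put the number of items in order  // only product remains
put the code of item 1 of order   // prod345
```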
GatherNamespaces Command and Function
Behavior: The GatherNamespaces command and function scan a tree structure for namespace definitions in all sub-nodes within the tree, and copy all of those namespace definitions to the top level of the tree. Node expressions that specify a node using a name that includes a namespace prefix might fail if that namespace is not declared in the top level (root node) of the tree. Using GatherNamespaces to copy namespace definitions to the root of the tree makes it possible to locate nodes within the tree using namespace-qualified names that might otherwise fail.
Command Usage
When you call GatherNamespaces as a command, you must pass the tree by reference to allow the command to make changes to the tree.
Command Syntax:
GatherNamespaces treeReference
Example:
gatherNamespaces myTree by reference // pass the tree by reference to allow changes
put node "//web:table" of myTree into myTableNode // namespace-qualified name should now be successful
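To see the problem GatherNamespaces solves, consider a document whose namespace prefix is declared on a nested element rather than on the root. The namespace URI http://example.com/web, the web prefix, and the element names below are hypothetical. The sketch assumes the behavior described above, where a prefixed node expression might fail until the declaration has been copied to the root.

```sensetalk
// Hypothetical document: the web prefix is declared on a nested element, not on the root
set myTree to "<doc><web:table xmlns:web='http://example.com/web'><web:row>1</web:row></web:table></doc>" as tree

gatherNamespaces myTree by reference           // copy namespace declarations up to the root
put node "//web:table" of myTree into myTableNode
put the text of node "//web:row" of myTree     // 1
```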
Function Usage
When called as a function, GatherNamespaces returns a modified copy of the tree.
Function Syntax:
gatherNamespaces( aTree )
{the} gatherNamespaces of aTree
aTree . gatherNamespaces
Example:
put node "//namespace:name" of myTree.gatherNamespaces // call gatherNamespaces so the node reference will succeed
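Because the function form returns a modified copy and leaves the original tree untouched, it suits one-off lookups. This sketch uses the same hypothetical namespaced document (the URI http://example.com/web and the element names are placeholders); nsTree is an illustrative variable name.

```sensetalk
// Hypothetical document: the web prefix is declared on a nested element, not on the root
set myTree to "<doc><web:table xmlns:web='http://example.com/web'><web:row>1</web:row></web:table></doc>" as tree

put gatherNamespaces(myTree) into nsTree     // modified copy; myTree itself is not changed
put the text of node "//web:row" of nsTree   // 1
```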