XPath: XML Path Language
XPath is a powerful query language for selecting nodes from an XML document. It's a W3C Recommendation and is widely used in conjunction with XSLT (Extensible Stylesheet Language Transformations) and XQuery.
What is XPath?
XPath provides a syntax for navigating through elements and attributes in an XML document. It treats an XML document as a tree structure, where each node (element, attribute, text, etc.) can be selected using path expressions. These expressions can filter and select specific nodes based on their names, attributes, content, and relationships to other nodes.
Key Concepts
1. The XML Tree
XPath operates on the XML document's hierarchical structure, which can be visualized as a tree. The main types of nodes are:
- Root Node (the document itself)
- Element Nodes
- Attribute Nodes
- Text Nodes
- Namespace Nodes
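Python's standard `xml.etree.ElementTree` module is one convenient way to inspect this tree. Its model is simpler than XPath's. Elements are `Element` objects, but attributes live in each element's `attrib` dictionary and text in its `text` attribute; they are not separate nodes. The sketch below uses a trimmed copy of the bookstore document from the Examples section and a hypothetical `describe` helper. It lists the elements, attributes, and text values it finds.

```python
import xml.etree.ElementTree as ET

# A trimmed copy of the bookstore document used in the Examples section.
XML = """<bookstore>
  <book category="cooking">
    <title lang="en">Everyday Italian</title>
    <price>30.00</price>
  </book>
</bookstore>"""

def describe(elem, depth=0):
    """Return one line per element, attribute and non-blank text value."""
    indent = "  " * depth
    lines = [f"{indent}element: {elem.tag}"]
    for name, value in elem.attrib.items():
        lines.append(f"{indent}  attribute: {name}={value!r}")
    if elem.text and elem.text.strip():  # skip whitespace-only text
        lines.append(f"{indent}  text: {elem.text.strip()!r}")
    for child in elem:
        lines.extend(describe(child, depth + 1))
    return lines

# fromstring returns the root *element* (bookstore), not XPath's document root node.
root = ET.fromstring(XML)
print("\n".join(describe(root)))
```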
2. Path Expressions
These are the core of XPath. They consist of one or more location steps, separated by forward slashes (/).
- Absolute path: Starts from the root node (e.g., `/bookstore/book`).
- Relative path: Starts from the current node (e.g., `book/title`).
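ElementTree's `find`/`findall` methods take only relative paths. Each path is evaluated from the element the method is called on, which makes the role of the current node easy to see. In this sketch, an absolute path such as `/bookstore/book/title` is approximated by starting at the root element (`bookstore`) and evaluating `book/title` relative to it. The same relative path `title` then gives a different result for each `book` context.

```python
import xml.etree.ElementTree as ET

XML = """<bookstore>
  <book category="cooking"><title lang="en">Everyday Italian</title></book>
  <book category="web"><title lang="en">Learning XML</title></book>
</bookstore>"""

root = ET.fromstring(XML)  # the bookstore element

# Equivalent of the absolute path /bookstore/book/title: start at the
# root element and evaluate the remaining steps relative to it.
all_titles = [t.text for t in root.findall("book/title")]
print(all_titles)  # ['Everyday Italian', 'Learning XML']

# A relative path depends on the current node: "title" evaluated from
# each book returns only that book's title.
for book in root.findall("book"):
    print(book.get("category"), [t.text for t in book.findall("title")])
```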
3. Location Steps
Each location step has three components:
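- Axis: Specifies the relationship between the current node and the nodes to be selected (e.g., `child`, `attribute`, `parent`, `descendant`).
- Node Test: Specifies the name or type of the nodes to select (e.g., a name such as `book`, `*` for any element, or a node-type test such as `text()` or `node()`).
- Predicate: An optional expression in square brackets (`[]`) used to filter the selected nodes based on conditions (e.g., `[price > 50]`, `[@lang='en']`).
The full form of a step is `axis::node-test[predicate]`, as in `child::book[@category='web']`. If the axis is omitted, it defaults to `child`, so `book` is shorthand for `child::book` and `*` alone means `child::*` (all child elements). Likewise, `@lang` is shorthand for `attribute::lang`.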
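To make the three components concrete, here is a small, hypothetical `parse_step` helper (not part of any library). It splits a single location step into axis, node test, and predicates, and it applies the default `child` axis and the `@` abbreviation. It is a teaching sketch only and assumes simple steps. It does not handle nested brackets inside predicates or the `.` and `..` abbreviations.

```python
import re
from dataclasses import dataclass, field

@dataclass
class Step:
    axis: str
    node_test: str
    predicates: list = field(default_factory=list)

# Optional "axis::" or "@", then the node test, then zero or more [..] predicates.
STEP_RE = re.compile(
    r"(?:(?P<axis>[a-z-]+)::|(?P<at>@))?"
    r"(?P<test>[^\[\]]+?)"
    r"(?P<preds>(?:\[[^\]]*\])*)"
)

def parse_step(step: str) -> Step:
    """Split one location step (without nested brackets) into its parts."""
    m = STEP_RE.fullmatch(step.strip())
    if m is None:
        raise ValueError(f"not a simple location step: {step!r}")
    if m.group("axis"):
        axis = m.group("axis")
    elif m.group("at"):
        axis = "attribute"  # "@name" abbreviates "attribute::name"
    else:
        axis = "child"      # an omitted axis defaults to child
    preds = re.findall(r"\[([^\]]*)\]", m.group("preds"))
    return Step(axis, m.group("test"), preds)

print(parse_step("child::book[@category='web']"))
print(parse_step("title[@lang='en'][1]"))
print(parse_step("@lang"))
```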
Common Axes:
| Axis | Description |
|---|---|
| `child` | Selects children of the current node. |
| `attribute` | Selects attributes of the current node. |
| `parent` | Selects the parent of the current node. |
| `ancestor` | Selects all ancestors (parent, grandparent, etc.) of the current node. |
| `descendant` | Selects all descendants (children, grandchildren, etc.) of the current node. |
| `following-sibling` | Selects all siblings that appear after the current node. |
| `preceding-sibling` | Selects all siblings that appear before the current node. |
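ElementTree has no general axis syntax. Its path language covers child steps, `.//` for descendants, and `..` for the parent, and its elements do not keep a reference to their parent. The sketch below therefore emulates several axes from the table in plain Python, using a parent map built from `root.iter()`. The helper names (`parent`, `ancestors`, `descendants`, `following_siblings`, `preceding_siblings`) are hypothetical.

```python
import xml.etree.ElementTree as ET

XML = """<bookstore>
  <book category="cooking"><title>Everyday Italian</title><author>Giada De Laurentiis</author><price>30.00</price></book>
  <book category="web"><title>Learning XML</title><author>Erik T. Ray</author><price>39.95</price></book>
</bookstore>"""

root = ET.fromstring(XML)
# Elements have no parent pointer, so map every child to its parent once.
parent_of = {child: parent for parent in root.iter() for child in parent}

def parent(elem):
    return parent_of.get(elem)  # None for the root element

def ancestors(elem):
    """Parent, grandparent, ... up to the root element, nearest first."""
    result = []
    while (elem := parent_of.get(elem)) is not None:
        result.append(elem)
    return result

def descendants(elem):
    """All elements below elem in document order (iter() starts with elem itself)."""
    return list(elem.iter())[1:]

def following_siblings(elem):
    siblings = list(parent_of[elem])
    return siblings[siblings.index(elem) + 1:]

def preceding_siblings(elem):
    siblings = list(parent_of[elem])
    return siblings[:siblings.index(elem)]

author = root.find("book/author")  # author of the first book
print([e.tag for e in ancestors(author)])           # ['book', 'bookstore']
print([e.tag for e in following_siblings(author)])  # ['price']
print([e.tag for e in preceding_siblings(author)])  # ['title']
print(len(descendants(root)))                       # 8
```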
Common Node Tests:
- `*`: Selects any element node (as `@*`, any attribute node).
- `element_name`: Selects element nodes with the specified name.
- `@attribute_name`: Selects attribute nodes with the specified name (`@` is shorthand for the `attribute::` axis).
- `text()`: Selects text nodes.
- `node()`: Selects any node type.
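In ElementTree's limited path syntax, a name and `*` act as node tests on the child axis. Attributes and text are read from the element instead of being selected as nodes, because ElementTree has no `text()` or `node()` tests. A minimal comparison using a one-book version of the example document:

```python
import xml.etree.ElementTree as ET

XML = """<bookstore>
  <book category="cooking"><title lang="en">Everyday Italian</title><author>Giada De Laurentiis</author><year>2005</year><price>30.00</price></book>
</bookstore>"""

root = ET.fromstring(XML)

# Name test (XPath: book/title): only children of book named "title".
titles = root.findall("book/title")
# Wildcard test (XPath: book/*): every child element of book, whatever its name.
children = [e.tag for e in root.findall("book/*")]
# Attribute (XPath: book/@category) read with get().
category = root.find("book").get("category")
# Text (XPath: book/title/text()) read from the element's .text.
title_text = root.find("book/title").text

print(len(titles), children, category, title_text)
```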
Predicates:
Predicates are used to filter the nodes selected by a location step. They can use comparisons, functions, or other XPath expressions.
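ElementTree understands only a few predicate forms. These are a position (`[1]`, `[last()]`), an attribute test (`[@category='web']`), a child-existence test (`[price]`), and a child-text test (`[title='Harry Potter']`). Comparisons such as `[price > 35]` are outside that subset, so the last filter in this sketch is written in Python instead. The `titles` helper is hypothetical and only formats the results.

```python
import xml.etree.ElementTree as ET

XML = """<bookstore>
  <book category="cooking"><title>Everyday Italian</title><price>30.00</price></book>
  <book category="children"><title>Harry Potter</title><price>29.99</price></book>
  <book category="web"><title>Learning XML</title><price>39.95</price></book>
</bookstore>"""

root = ET.fromstring(XML)

def titles(books):
    """Return the title text of each selected book."""
    return [b.findtext("title") for b in books]

print(titles(root.findall("book[1]")))                    # ['Everyday Italian']
print(titles(root.findall("book[last()]")))               # ['Learning XML']
print(titles(root.findall("book[@category='web']")))      # ['Learning XML']
print(titles(root.findall("book[title='Harry Potter']"))) # ['Harry Potter']
print(len(root.findall("book[price]")))                   # 3

# book[price > 35] is not expressible in ElementTree's subset.
expensive = [b for b in root.findall("book") if float(b.findtext("price")) > 35]
print(titles(expensive))                                  # ['Learning XML']
```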
Examples
Consider the following XML document:
```xml
<?xml version="1.0"?>
<bookstore>
  <book category="cooking">
    <title lang="en">Everyday Italian</title>
    <author>Giada De Laurentiis</author>
    <year>2005</year>
    <price>30.00</price>
  </book>
  <book category="children">
    <title lang="en">Harry Potter</title>
    <author>J K. Rowling</author>
    <year>2005</year>
    <price>29.99</price>
  </book>
  <book category="web">
    <title lang="en">Learning XML</title>
    <author>Erik T. Ray</author>
    <year>2003</year>
    <price>39.95</price>
  </book>
</bookstore>
```
Common XPath Expressions and their results:
| XPath Expression | Description | Result (based on the XML above) |
|---|---|---|
| `/bookstore/book` | Selects all `book` elements that are children of the `bookstore` root element. | All three `book` elements. |
| `/bookstore/book/title` | Selects all `title` elements that are children of `book` elements, which are in turn children of the `bookstore` root element. | The three `title` elements: "Everyday Italian", "Harry Potter", "Learning XML". |
| `//book` | Selects all `book` elements anywhere in the document (`//` is short for `/descendant-or-self::node()/`). | All three `book` elements. |
| `/bookstore/book[1]` | Selects the first `book` child of `bookstore` (XPath positions start at 1). | The "Everyday Italian" book. |
| `/bookstore/book[@category='cooking']` | Selects `book` elements whose `category` attribute equals 'cooking'. | The "Everyday Italian" book. |
| `/bookstore/book/title[@lang='en']` | Selects `title` elements whose `lang` attribute equals 'en'. | All three `title` elements. |
| `/bookstore/book[price > 35]` | Selects `book` elements that have a `price` child whose numeric value is greater than 35. | The "Learning XML" book. |
| `/bookstore/book/author/text()` | Selects the text nodes of all `author` elements that are children of `book` elements. | "Giada De Laurentiis", "J K. Rowling", "Erik T. Ray". |
| `/bookstore/@name` | Selects the `name` attribute of the `bookstore` element. | No result: `bookstore` has no attributes in this XML. |
| `//title/text()[contains(., 'Potter')]` | Selects text nodes that are children of `title` elements and whose content contains the substring "Potter". | The text node "Harry Potter". |
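Several rows of the table can be checked against the sample document with Python's `xml.etree.ElementTree`. That module implements only a subset of XPath, so this sketch makes some substitutions. Paths are written relative to the root element `bookstore`, and `text()`, `contains()`, and attribute selection are emulated with ordinary Python.

```python
import xml.etree.ElementTree as ET

XML = """<bookstore>
  <book category="cooking"><title lang="en">Everyday Italian</title><author>Giada De Laurentiis</author><year>2005</year><price>30.00</price></book>
  <book category="children"><title lang="en">Harry Potter</title><author>J K. Rowling</author><year>2005</year><price>29.99</price></book>
  <book category="web"><title lang="en">Learning XML</title><author>Erik T. Ray</author><year>2003</year><price>39.95</price></book>
</bookstore>"""

root = ET.fromstring(XML)  # the bookstore element; paths below are relative to it

# //book: ".//" searches every descendant of the root element.
all_books = root.findall(".//book")
# /bookstore/book/title[@lang='en']
english_titles = root.findall("book/title[@lang='en']")
# /bookstore/book/author/text(): ElementTree stores text on the element.
authors = [a.text for a in root.findall("book/author")]
# /bookstore/@name: get() returns None when the attribute is missing.
store_name = root.get("name")
# //title/text()[contains(., 'Potter')]: substring test done in Python.
potter = [t.text for t in root.iter("title") if "Potter" in t.text]

print(len(all_books), len(english_titles))  # 3 3
print(authors)     # ['Giada De Laurentiis', 'J K. Rowling', 'Erik T. Ray']
print(store_name)  # None
print(potter)      # ['Harry Potter']
```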
XPath Functions
XPath provides a rich set of built-in functions for string manipulation, numeric operations, node selection, and more. Some common ones include:
- `string(value)`: Converts a value to a string.
- `concat(string1, string2, ...)`: Concatenates two or more strings.
- `contains(haystack, needle)`: Returns true if the first string contains the second.
- `starts-with(string, prefix)`: Returns true if the string starts with the specified prefix.
- `string-length(string)`: Returns the number of characters in a string.
- `sum(node-set)`: Converts the string value of each node in a node-set to a number and returns the total.
- `count(node-set)`: Returns the number of nodes in a node-set.
- `position()`: Returns the position of the current node within the node-set being processed (the context position).
- `last()`: Returns the number of nodes in the node-set being processed (the context size).
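ElementTree does not evaluate XPath functions. This sketch therefore computes in Python what a few of them would return for the bookstore example: `count()`, `sum()`, `string-length()`, `starts-with()`, and `concat()`. The results are computed by hand. A full XPath 1.0 engine, such as the one exposed by the third-party lxml library's `xpath()` method, would evaluate the expressions directly.

```python
import xml.etree.ElementTree as ET

XML = """<bookstore>
  <book><title>Everyday Italian</title><price>30.00</price></book>
  <book><title>Harry Potter</title><price>29.99</price></book>
  <book><title>Learning XML</title><price>39.95</price></book>
</bookstore>"""

root = ET.fromstring(XML)
prices = root.findall(".//price")
titles = [t.text for t in root.iter("title")]

# count(//book): number of nodes in the node-set.
book_count = len(root.findall(".//book"))
# sum(//price): each node's string value converted to a number, then added.
total = round(sum(float(p.text) for p in prices), 2)  # rounding hides float error
# string-length(/bookstore/book[1]/title): length of the first title's text.
first_title_length = len(titles[0])
# //title[starts-with(., 'Learning')], applied as a filter over all titles.
learning = [t for t in titles if t.startswith("Learning")]
# concat(title, ' (', price, ')') for /bookstore/book[last()].
last_book = root.findall("book")[-1]
label = last_book.findtext("title") + " (" + last_book.findtext("price") + ")"

print(book_count, total, first_title_length, learning, label)
```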
XPath on Windows
On Windows, XPath is available through Microsoft's XML libraries, such as MSXML and the .NET `System.Xml.XPath` namespace, and is commonly used for:
- Querying configuration files (e.g., XML-based settings).
- Processing XML data returned by Windows APIs.
- Transforming XML data using XSLT for display or further processing.
- Defining queries for data sources that expose XML interfaces.
For developers working with XML data in Windows environments, XPath offers a concise, standard way to locate and extract exactly the nodes they need.