Introduction

XPath is used to navigate through elements and attributes in an XML document
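HTML is not strictly XML, but an HTML page forms a similar tree of elements (XHTML is HTML written as well-formed XML)
XPath can therefore also select elements or element sets in an HTML document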

The expressions are similar to specifying a path in a file system

Generally the purpose of XPath is to allow the user to select a single element or a set of elements

Terminology

See reference XPath Terminology

Nodes - nodes are arranged in parent-child relationships, forming a tree of nodes

Elements - <a href="xxx">Click this Link</a> is an element node

Attributes - value="Bike" and href="xxx" are attribute nodes

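Text - Click this Link (the content of the a element above) is a text node

Namespace - a namespace node represents a namespace in scope for an element, such as xmlns="http://www.w3.org/1999/xhtml"

Processing-Instruction - <?xml-stylesheet type="text/xsl" href="style.xsl"?> is a processing-instruction node

Comment - <!-- a comment --> is a comment node

Document - the document (root) node is the top of the tree and contains the whole document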

Node Selection

Useful path expressions

nodename - selects all child nodes of the current node named "nodename"
/        - selects from the root node (a path starting with / is absolute)
//       - selects matching nodes at any depth below the current node (at the start of a path, anywhere in the document)
.        - selects the current node
..       - selects the parent of the current node
@        - selects attributes
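The path expressions above can be tried in Python with the standard-library xml.etree.ElementTree module. This is a minimal sketch on a small made-up document; note that ElementTree supports only a limited subset of XPath, so paths start with . (for example .//div instead of //div) and attributes are read with get() instead of an @ step.

```python
import xml.etree.ElementTree as ET

# A tiny, made-up XHTML-style document used only for illustration
doc = """<html>
<body>
  <div style="color:red">this is it</div>
  <div><a href="seleniumTest.jsp">back to main page</a></div>
  <input type="text" value="Bike"/>
</body>
</html>"""

root = ET.fromstring(doc)  # root is the <html> element

# nodename: child elements named "body" of the current node (here <html>)
body = root.find("body")

# // : ElementTree needs a leading "." and searches below the current node
all_divs = root.findall(".//div")

# . : the current node itself
same = body.find(".")

# .. : the parent of each <a> (resolved only inside the searched subtree)
link_parents = root.findall(".//a/..")

# @ : ElementTree reads attributes with .get() rather than a path step
value = root.find(".//input").get("value")

print(len(all_divs), link_parents[0].tag, value)  # 2 div Bike
```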

XPath provides a set of node tests; a node is selected only if it satisfies the test

comment()                 - Selects nodes that are comments.
node()                    - Selects nodes of any type.
processing-instruction()  - Selects nodes that are processing instructions. You can specify
                            which processing instruction to select by providing its name in
                            the parentheses. These generally appear in XML documents rather than HTML.
text()                    - Selects text nodes.
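ElementTree has no comment() or text() node tests, but the same nodes can be reached directly. This sketch assumes Python 3.8 or later (for the insert_comments option of TreeBuilder) and uses a made-up snippet; comment nodes show up as elements whose tag is ET.Comment, and an element's leading text node is stored on its .text attribute.

```python
import xml.etree.ElementTree as ET

doc = "<body><!-- navigation links --><div>Some text</div><div>More text</div></body>"

# Comments are discarded by default; ask the TreeBuilder to keep them (Python 3.8+)
parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
root = ET.fromstring(doc, parser=parser)

# Rough equivalent of //comment(): comment nodes carry the special tag ET.Comment
comments = [node.text for node in root.iter() if node.tag is ET.Comment]

# Rough equivalent of //div/text(): ElementTree keeps the leading text on .text
texts = [div.text for div in root.iter("div")]

print(comments)  # [' navigation links ']
print(texts)     # ['Some text', 'More text']
```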

Predicates

Predicates narrow a path expression down to specific nodes, for example by attribute value or by position

Predicates are placed within square brackets []

Positional predicates contain a number, last() or position() within the brackets []

Examples of node selection with predicates

//input[@value='Bike']                            - get the input elements whose value attribute is Bike
//a[@href="www.google.com"]                       - get the "a" elements (links) whose href attribute is www.google.com
//img[@src="images/carved bowl.jpg"]              - get the img elements whose src attribute is images/carved bowl.jpg
//div[@style="color:red"]/text()                  - get the text within the div tags that have the attribute style="color:red"
//div[@style="color:red" and @height="80"]/text() - get the text within the div tags that have both attributes
//img[1]                                          - get each img that is the first img child of its parent ((//img)[1] is the first img on the whole page)
//img[last()]                                     - get each img that is the last img child of its parent ((//img)[last()] is the last img on the whole page)
//img[position()<3]                               - get the first two img children of each parent ((//img)[position()<3] is the first two on the whole page)
//ul[@class="headline-list__list js-split-list js-add-p1"]/li[@class="headline-list__item"][2]/a - get the link in the second li whose class is exactly headline-list__item, inside the ul whose class attribute is exactly that string
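Several of the attribute and positional predicates above also work in ElementTree's XPath subset. A minimal sketch on a made-up page fragment; ElementTree does not support and, position()<3, or text() inside predicates, so only the simpler forms are shown.

```python
import xml.etree.ElementTree as ET

doc = """<body>
  <input type="radio" value="Car"/>
  <input type="radio" value="Bike"/>
  <img src="images/bowl.jpg"/>
  <img src="images/bluebowl.jpg"/>
  <img src="images/carved bowl.jpg"/>
</body>"""
root = ET.fromstring(doc)

# //input[@value='Bike']
bike = root.findall(".//input[@value='Bike']")

# //img[@src="images/carved bowl.jpg"] (quoted values may contain spaces)
carved = root.findall(".//img[@src='images/carved bowl.jpg']")

# //img[1] and //img[last()]: positions count img siblings under the same parent
first = root.find(".//img[1]")
last = root.find(".//img[last()]")

print(len(bike), len(carved), first.get("src"), last.get("src"))
```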

XPath Paths

Path elements are separated by a slash /

//body/div[@style='color:red'] - find the div elements with style='color:red' that are direct children of a body element

//body/input - find all input elements that are children of the body element
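The difference between / (direct child) and // (any depth) can be checked with a small sketch, again using ElementTree and a made-up fragment in which one input sits directly under body and another is nested inside a form.

```python
import xml.etree.ElementTree as ET

doc = """<html><body>
  <input name="direct"/>
  <form><input name="nested"/></form>
</body></html>"""
root = ET.fromstring(doc)

# //body/input : only inputs that are direct children of body
children = [e.get("name") for e in root.findall(".//body/input")]

# //body//input : inputs at any depth below body
descendants = [e.get("name") for e in root.findall(".//body//input")]

print(children)     # ['direct']
print(descendants)  # ['direct', 'nested']
```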

Wildcards

Wildcards allow several elements to be chosen at the same time

*       - matches any element node
@*      - matches any attribute node
node()  - matches any node of any kind

Examples of wildcard use

//*[@height="80"]                   - all elements that have a height attribute set to 80
//*[@src="images/carved bowl.jpg"]  - any element that has a src attribute of images/carved bowl.jpg
//div[@*]                           - any div tag that has an attribute of any kind
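ElementTree understands * as an element wildcard but has no @* test, so this sketch emulates //div[@*] by checking whether an element's attribute dictionary is non-empty. The sample fragment is made up for illustration.

```python
import xml.etree.ElementTree as ET

doc = """<body>
  <div height="80">box</div>
  <div>plain</div>
  <img src="images/bowl.jpg" height="80"/>
</body>"""
root = ET.fromstring(doc)

# //*[@height="80"] : any element with height="80", in document order
tall = [e.tag for e in root.findall(".//*[@height='80']")]

# //div[@*] : no @* in ElementTree, so test the attribute dictionary instead
divs_with_attrs = [d.text for d in root.iter("div") if d.attrib]

print(tall)             # ['div', 'img']
print(divs_with_attrs)  # ['box']
```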

Selecting Several Paths

To select several nodes with varying paths

Use the | operator between the path expressions

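For example /book/title | //book/price selects the title children of the root book element together with every price element whose parent is a book element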
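ElementTree does not implement the | operator. The sketch below uses a hypothetical union helper (not a library function) that merges the results of several paths and returns them in document order without duplicates, which is how an XPath union behaves. The bookstore document is made up, and the paths are .//book/title and .//book/price rather than the exact example above.

```python
import xml.etree.ElementTree as ET

doc = """<bookstore>
  <book><title>XPath Basics</title><price>10</price></book>
  <book><title>More XPath</title><price>12</price></book>
</bookstore>"""
root = ET.fromstring(doc)

def union(root, *paths):
    """Emulate XPath's | : merge findall() results, drop duplicates,
    and return the nodes in document order."""
    hits = set()
    for path in paths:
        hits.update(root.findall(path))
    # root.iter() walks the tree in document order
    return [e for e in root.iter() if e in hits]

nodes = union(root, ".//book/title", ".//book/price")
print([e.text for e in nodes])  # ['XPath Basics', '10', 'More XPath', '12']
```

The results interleave titles and prices because a union is ordered by position in the document, not by which path matched first.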

XPath Axes

An axis defines a set of nodes relative to the current node

ancestor            - selects all ancestors of the current node
ancestor-or-self    - selects ancestors and the current node itself
attribute           - selects all attributes of the current node
child               - selects all child nodes of the current node
descendant          - selects all descendants of the current node (children, grandchildren, etc.)
descendant-or-self  - selects all descendants and the node itself
following           - selects everything in the document after the closing tag of the current node
following-sibling   - selects all following siblings of the current node
namespace           - selects all namespace nodes of the current node
parent              - selects the parent of the current node
preceding           - selects the nodes in the document before the current node, excluding ancestors, 
                      attributes, and namespaces
preceding-sibling   - selects all siblings before the current node
self                - selects the current node
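ElementTree has no axis syntax, and its elements do not store a reference to their parent. A common workaround, sketched here with hypothetical helper names ancestors and following_siblings, is to build a child-to-parent dictionary once and walk it; the sample document is made up.

```python
import xml.etree.ElementTree as ET

doc = """<html><body>
  <div><a href="seleniumTest.jsp">back</a></div>
  <img src="images/bowl.jpg"/>
  <img src="images/bluebowl.jpg"/>
</body></html>"""
root = ET.fromstring(doc)

# Elements do not know their parent, so map every child to its parent once
parent_of = {child: parent for parent in root.iter() for child in parent}

def ancestors(node):
    """ancestor axis: parent, grandparent, ... up to the root (nearest first)."""
    result = []
    while node in parent_of:
        node = parent_of[node]
        result.append(node)
    return result

def following_siblings(node):
    """following-sibling axis: later children of the same parent (node must not be the root)."""
    siblings = list(parent_of[node])
    return siblings[siblings.index(node) + 1:]

link = root.find(".//a")
print([e.tag for e in ancestors(link)])  # ['div', 'body', 'html']
print([e.get("src") for e in following_siblings(root.find(".//div"))])
# ['images/bowl.jpg', 'images/bluebowl.jpg']
```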

Examples of Axes Use

//div[@style="color:red" and @height="80"]/attribute::style - select the style attribute of the div tag
//img/attribute::src  - select the src attribute of all the image tags
//img[last()]/following::a - find the "a" tags (links) that follow the last image tag in the page
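The following axis selects every element that starts after the current node ends, in document order. Since ElementTree iterates in document order (pre-order), it can be emulated by taking everything after the node and dropping the node's own descendants; the helper name following and the sample fragment are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

doc = """<body>
  <a href="top.jsp">top</a>
  <img src="images/bowl.jpg"/>
  <img src="images/carved bowl.jpg"/>
  <p>text <a href="inner.jsp">inner</a></p>
  <a href="seleniumTest.jsp">back to main page</a>
</body>"""
root = ET.fromstring(doc)

def following(node, root):
    """following axis: elements after node in document order, minus its descendants.
    Ancestors come before node in pre-order, so they are excluded automatically."""
    order = list(root.iter())  # pre-order walk = document order
    inside = set(node.iter())  # node itself plus all of its descendants
    start = order.index(node)
    return [e for e in order[start + 1:] if e not in inside]

# //img[last()]/following::a
last_img = root.find(".//img[last()]")
links = [e.get("href") for e in following(last_img, root) if e.tag == "a"]
print(links)  # ['inner.jsp', 'seleniumTest.jsp']
```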

XPath Functions

XPath provides a large library of functions for strings, numbers, booleans, and node sets

See reference XPath Functions

Examples of Functions use

//div[contains(text(),'Some')]     - find a div tag whose text contains 'Some' anywhere
//input[starts-with(@id, 'text-')] - find an input tag where the id starts with 'text-'
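ElementTree's XPath subset has no contains() or starts-with(), so this minimal sketch applies the same tests in Python after selecting the elements. It only approximates XPath: d.text is the text before the first child element, which matches the first text node only when the element begins with text. The sample fragment is made up.

```python
import xml.etree.ElementTree as ET

doc = """<body>
  <div>Some text here</div>
  <div>Here is Some more</div>
  <div>Nothing</div>
  <input id="text-name"/>
  <input id="email"/>
</body>"""
root = ET.fromstring(doc)

# //div[contains(text(),'Some')] : 'Some' anywhere in the div's leading text
contains_some = [d.text for d in root.iter("div") if d.text and "Some" in d.text]

# //input[starts-with(@id, 'text-')] : missing id treated as empty string
text_inputs = [i.get("id") for i in root.iter("input") if i.get("id", "").startswith("text-")]

print(contains_some)  # ['Some text here', 'Here is Some more']
print(text_inputs)    # ['text-name']
```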

XPath Hands on Exercise

In this exercise use the online XPath Tester listed in the References

Go to the XPath Tester and put the following HTML code into the tester

Note that this code has been corrected slightly so that it is valid XML

For example the meta tag on line 3 has had a slash added before the final >

<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"/>
<title>Text Image and Video Demo</title>
</head>
<body>
<div style="color:red"> this is it</div>
<div>This is the divs tag's contents</div>
<a href="seleniumTest.jsp">back to main page</a>
<br/><br/>
North America is a continent wholly within the Northern Hemisphere and almost wholly within 
the Western Hemisphere. It can also be considered a northern subcontinent of the Americas.[1] 
It is bordered to the north by the Arctic Ocean, to the east by the Atlantic Ocean, to the west 
and south by the Pacific Ocean, and to the southeast by South America and the Caribbean Sea.

North America covers an area of about 24,709,000 square kilometers (9,540,000 square miles), 
about 4.8% of the planet's surface or about 16.5% of its land area. As of 2013, its population 
was estimated at nearly 565 million people across 23 independent states, representing about 7.5% 
of the human population. Most of the continent's land area is dominated by Canada, the United States, 
Greenland, and Mexico, while smaller states exist in the Central American and Caribbean regions. 
North America is the third largest continent by area, following Asia and Africa,[2] and the fourth 
by population after Asia, Africa, and Europe.[3]

The first people to live in North America were Paleoindians who began to arrive during the last 
glacial period by crossing the Bering land bridge. They differentiated into a number of diverse 
cultures and communities across the continent. The largest and most advanced Pre-Columbian 
civilizations in North America were the Aztecs in what is now Mexico and the Maya in Central 
America. European colonists began to arrive starting in the 16th and 17th centuries, wiping out 
large numbers of the native populations and beginning an era of European dominance.
<br/><br/>
<iframe width="640" height="360" src="https://www.youtube.com/embed/PEtz0twft2U?feature=player_detailpage" 
frameborder="0" allowfullscreen="true"></iframe><br/><br/>
<iframe width="640" height="360" src="https://www.youtube.com/embed/PEtz0twft2U?feature=player_detailpage" 
frameborder="0" allowfullscreen="true"></iframe><br/><br/>
<iframe width="640" height="360" src="https://www.youtube.com/embed/OnzZye5tGdY?feature=player_detailpage" 
frameborder="0" allowfullscreen="true"></iframe>
<br/><br/>
 <img src="images/bowl.jpg" alt="Bowl with filigree" height="80" width="100"/>
 <img src="images/bluebowl.jpg" alt="Bowl with filigree" height="80" width="100"/> 
 <img src="images/bowl with glass.jpg" alt="Bowl with filigree" height="80" width="100"/>
 <img src="images/carved bowl.jpg" alt="Bowl with filigree" height="80" width="100"/> 
<br/><br/>
<a href="seleniumTest.jsp">back to main page</a>
</body>
</html>

Find the XPath expressions that will find the following elements

All iframes on the page
The image with a source file of images/bluebowl.jpg
Any div tags on the page
All image tags on the page
The text contents of the div tag with an attribute

References:

XPath Tutorial

Another XPath Tutorial

A third XPath Tutorial

XPath Tester

XPath Tester (Another one, previous one seems better)

XPath Primer