XPath Primer

From Training Material

Introduction

  • XPath is used to navigate through elements and attributes in an XML document
  • HTML is not strictly XML, but well-formed HTML (such as XHTML) can be processed as XML
  • XPath can select elements or element sets in an HTML document
The expressions are similar to specifying a path in a file system
Generally the purpose of XPath is to allow the user to select a single element or a set of elements
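
As a minimal sketch of selecting a single element versus a set of elements, the snippet below uses Python's standard-library xml.etree.ElementTree, which implements only a limited subset of XPath. The sample markup and variable names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample markup (not taken from any real page).
PAGE = """
<html>
  <body>
    <div style="color:red">first</div>
    <div>second</div>
    <a href="index.html">home</a>
  </body>
</html>
"""

root = ET.fromstring(PAGE)  # root is the <html> element

# Select a set of elements: every div anywhere below the root.
all_divs = root.findall(".//div")

# Select a single element: the first link found below the root.
first_link = root.find(".//a")

print([d.text for d in all_divs])
print(first_link.get("href"))
```

ElementTree paths are evaluated relative to an element, which is why the expressions start with a dot; a full XPath engine would accept //div directly.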

Terminology

Nodes - nodes are in parent-child relationships (a tree of nodes)
Elements - <a href="xxx">Click this Link</a> is an element node
Attributes - value="Bike" and href="xxx" are attribute nodes
Text
Namespace
Processing-Instruction
Comment
Document

Node Selection

  • Useful path expressions
nodename - selects all child elements of the current node named "nodename"
/        - selects from the root node
//       - selects matching nodes at any depth below the current node (anywhere in the document when it starts the expression)
.        - selects the current node
..       - selects the parent of the current node
@        - selects attributes
  • XPath also provides a set of node tests; a node is selected when it satisfies the test
comment()                 - Selects nodes that are comments.
node()                    - Selects nodes of any type.
processing-instruction()  - Selects nodes that are processing instructions. You can specify
                            which processing instruction to select by providing its name in
                            the parentheses. Processing instructions generally appear in XML documents.
text()                    - Selects text nodes.
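
The path expressions above can be tried out with a small sketch like the following. It assumes Python's standard-library xml.etree.ElementTree, which supports ".", ".." and "//" but has no "@" or text() step, so attributes and text are read from the selected element instead. The document and names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document for illustration.
DOC = """
<html>
  <body>
    <p id="intro">Hello <b>world</b></p>
    <div><p id="nested">Nested</p></div>
  </body>
</html>
"""

root = ET.fromstring(DOC)

# "." is the current node: here, the <html> root itself.
current = root.find(".")

# "//" reaches matching nodes at any depth below the current node.
paragraph_ids = [p.get("id") for p in root.findall(".//p")]

# ".." steps up to the parent: the parents of every <p>.
parent_tags = [e.tag for e in root.findall(".//p/..")]

# Stand-ins for the "@" and text() steps, which ElementTree lacks.
intro = root.find(".//p[@id='intro']")
intro_attrs = intro.attrib   # roughly what @* would select
intro_text = intro.text      # the first text node inside <p>

print(paragraph_ids, parent_tags, intro_attrs, repr(intro_text))
```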

Predicates

Predicates filter the nodes selected by a path expression, to find a specific node
Predicates are placed within square brackets []
Positional predicates use a number, last() or position() within the square brackets []
Examples of node selection with predicates
//input[@value='Bike']                           - get the input element with a value attribute of Bike
//a[@href="www.google.com"]                      - the "a" element (the link) with an href attribute of www.google.com
//img[@src="images/carved bowl.jpg"]             - get the img element with a src attribute of images/carved bowl.jpg
//div[@style="color:red"]/text()                 - get the text within the div tag that has attribute style="color:red"
//div[@style="color:red" and @height="80"]/text()- get the text within the div tag for the two attributes specified
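//img[1]                                         - get each img tag that is the first img child of its parent (use (//img)[1] for the first on the page)
//img[last()]                                    - get each img tag that is the last img child of its parent
//img[position()<3]                              - get the first two img children of each parent
//ul[@class="headline-list__list js-split-list js-add-p1"]/li[@class="headline-list__item"][2]/a
                                                 - get the link in the second matching li of the ul with that exact class attribute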
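
A short sketch of attribute and positional predicates, assuming Python's standard-library xml.etree.ElementTree. Its XPath subset covers [@attr='value'], [n] and [last()], but not position() comparisons, so the last case is emulated with a list slice. The markup is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup; all <img> elements share one parent here.
DOC = """
<body>
  <input type="checkbox" value="Bike"/>
  <input type="checkbox" value="Car"/>
  <img src="a.jpg"/>
  <img src="b.jpg"/>
  <img src="c.jpg"/>
</body>
"""

root = ET.fromstring(DOC)

# Attribute predicate: the input whose value attribute is 'Bike'.
bike = root.findall(".//input[@value='Bike']")

# Positional predicates count from 1, among siblings of the same name.
first_img = root.find(".//img[1]")
last_img = root.find(".//img[last()]")

# position()<3 is outside ElementTree's subset, so slice the result list.
first_two = root.findall(".//img")[:2]

print(bike[0].get("value"), first_img.get("src"), last_img.get("src"))
print([i.get("src") for i in first_two])
```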

XPath Paths

  • Path elements are separated by a slash /
//body/div[@style='color:red'] - find every div that is a child of body and has style='color:red'
//body/input - find all input elements that are children of the body element
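
The difference between a single slash (direct children) and a double slash (any depth) can be sketched as follows, assuming Python's standard-library xml.etree.ElementTree and a made-up document.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: one input directly in body, one nested in a form.
DOC = """
<html>
  <body>
    <input name="top"/>
    <form><input name="inner"/></form>
  </body>
</html>
"""

root = ET.fromstring(DOC)

# body/input: only inputs that are direct children of body.
direct = [i.get("name") for i in root.findall("./body/input")]

# body//input: inputs at any depth inside body.
anywhere = [i.get("name") for i in root.findall("./body//input")]

print(direct)
print(anywhere)
```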

Wildcards

  • Wildcards allow several elements to be chosen at the same time
*       - matches any element node
@*      - matches any attribute node
node()  - matches any node of any kind
Examples of wildcard use
//*[@height="80"]                   - all elements that have a height attribute set to 80
//*[@src="images/carved bowl.jpg"]  - any element that has a src attribute of images/carved bowl.jpg
//div[@*]                           - any div tag that has at least one attribute
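
A minimal sketch of the element wildcard, assuming Python's standard-library xml.etree.ElementTree. The * wildcard is supported there; the @* form is not part of its documented subset, so that case is emulated by checking each element's attribute dictionary. The markup is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup for illustration.
DOC = """
<body>
  <div>plain</div>
  <div style="color:red">styled</div>
  <img src="a.jpg" height="80"/>
  <iframe src="v.html" height="360"/>
  <hr height="80"/>
</body>
"""

root = ET.fromstring(DOC)

# * matches any element: here, every direct child of the root.
child_tags = [e.tag for e in root.findall("./*")]

# Equivalent of //*[@height="80"]: any element with height set to 80.
height_80 = [e.tag for e in root.findall(".//*[@height='80']")]

# Stand-in for //div[@*]: divs that carry at least one attribute.
divs_with_attrs = [d.text for d in root.findall(".//div") if d.attrib]

print(child_tags, height_80, divs_with_attrs)
```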

Selecting Several Paths

  • To select several nodes with varying paths
Use the | operator between the path expressions
For example /book/title | //book/price
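
Python's standard-library xml.etree.ElementTree does not implement the | operator, so the sketch below emulates a union by hand: it merges the matches of several paths, drops duplicates and restores document order. The union helper and the catalog document are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog document.
DOC = """
<catalog>
  <book><title>A</title><price>10</price></book>
  <book><title>B</title><price>20</price></book>
</catalog>
"""

root = ET.fromstring(DOC)


def union(root, *paths):
    """Emulate 'path1 | path2': merged matches, no duplicates, document order."""
    # Position of every element in document order.
    order = {elem: i for i, elem in enumerate(root.iter())}
    matched = {elem for path in paths for elem in root.findall(path)}
    return sorted(matched, key=order.__getitem__)


result = union(root, ".//book/title", ".//book/price")
pairs = [(e.tag, e.text) for e in result]
print(pairs)
```

With a full XPath 1.0 engine the same selection would be written as a single expression using |.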

XPath Axes

  • An axis defines a set of nodes relative to the current node
ancestor            - selects all ancestors of the current node
ancestor-or-self    - selects ancestors and the current node itself
attribute           - selects all attributes of the current node
child               - selects all child nodes of the current node
descendant          - selects all descendants of the current node (children, grandchildren, etc.)
descendant-or-self  - selects all descendants and the node itself
following           - selects everything in the document after the closing tag of the current node
following-sibling   - selects all following siblings of the current node
namespace           - selects all namespace nodes of the current node
parent              - selects the parent of the current node
preceding           - selects the nodes in the document before the current node, excluding ancestors, 
                      attributes, and namespaces
preceding-sibling   - selects all siblings before the current node
self                - selects the current node
  • Examples of Axes Use
//div[@style="color:red" and @height="80"]/attribute::style - select the style attribute of the div tag
//img/attribute::src  - select the src attribute of all the image tags
//img[last()]/following::a - find every "a" tag (link) that comes after the last image tag in the page
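
Axis syntax such as ancestor:: or following-sibling:: is not available in Python's standard-library xml.etree.ElementTree, so the sketch below only illustrates what three of the axes mean by walking the tree by hand. The helper functions and the document are hypothetical, not part of any library.

```python
import xml.etree.ElementTree as ET

# Hypothetical document.
DOC = """
<html>
  <body>
    <div id="box">
      <img src="a.jpg"/>
      <a href="one.html">one</a>
      <a href="two.html">two</a>
    </div>
  </body>
</html>
"""

root = ET.fromstring(DOC)

# ElementTree elements do not know their parent, so build a child -> parent map.
parent_of = {child: parent for parent in root.iter() for child in parent}


def ancestors(elem):
    """ancestor axis: parent, grandparent, and so on up to the root."""
    found = []
    while elem in parent_of:
        elem = parent_of[elem]
        found.append(elem)
    return found


def following_siblings(elem):
    """following-sibling axis: later children of the same parent."""
    siblings = list(parent_of[elem])
    return siblings[siblings.index(elem) + 1:]


img = root.find(".//img")
ancestor_tags = [e.tag for e in ancestors(img)]
sibling_hrefs = [e.get("href") for e in following_siblings(img)]
img_src = img.get("src")  # what img/attribute::src would select

print(ancestor_tags, sibling_hrefs, img_src)
```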

XPath Functions

  • XPath has a significant number of functions that are available.
See reference XPath Functions
  • Examples of Functions use
//div[contains(text(),'Some')]     - find a div tag where the text contains 'Some'
//input[starts-with(@id, 'text-')] - find an input tag where the id starts with 'text-'
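
The difference between contains() and starts-with() can be sketched in plain Python. XPath functions are not part of the subset in the standard-library xml.etree.ElementTree, so this example filters with Python string methods instead; the markup is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup.
DOC = """
<body>
  <div>Some text here</div>
  <div>Here is Some more</div>
  <div>Nothing relevant</div>
  <input id="text-name"/>
  <input id="submit"/>
</body>
"""

root = ET.fromstring(DOC)

# Like //div[contains(text(),'Some')]: 'Some' anywhere in the text.
contains_some = [d.text for d in root.iter("div") if "Some" in (d.text or "")]

# Like //div[starts-with(text(),'Some')]: 'Some' at the very start only.
starts_some = [d.text for d in root.iter("div") if (d.text or "").startswith("Some")]

# Like //input[starts-with(@id, 'text-')].
text_inputs = [i.get("id") for i in root.iter("input")
               if i.get("id", "").startswith("text-")]

print(contains_some, starts_some, text_inputs)
```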

XPath Hands on Exercise

In this exercise use the XPath tester at XPath Tester
Go to the XPath Tester and put the following HTML code into the tester
Note that this code has been corrected slightly so that it is valid XML
For example the meta tag on line 3 has had a slash added before the final >
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"/>
<title>Text Image and Video Demo</title>
</head>
<body>
<div style="color:red"> this is it</div>
<div>This is the divs tag's contents</div>
<a href="seleniumTest.jsp">back to main page</a>
<br/><br/>
North America is a continent wholly within the Northern Hemisphere and almost wholly within 
the Western Hemisphere. It can also be considered a northern subcontinent of the Americas.[1] 
It is bordered to the north by the Arctic Ocean, to the east by the Atlantic Ocean, to the west 
and south by the Pacific Ocean, and to the southeast by South America and the Caribbean Sea.

North America covers an area of about 24,709,000 square kilometers (9,540,000 square miles), 
about 4.8% of the planet's surface or about 16.5% of its land area. As of 2013, its population 
was estimated at nearly 565 million people across 23 independent states, representing about 7.5% 
of the human population. Most of the continent's land area is dominated by Canada, the United States, 
Greenland, and Mexico, while smaller states exist in the Central American and Caribbean regions. 
North America is the third largest continent by area, following Asia and Africa,[2] and the fourth 
by population after Asia, Africa, and Europe.[3]

The first people to live in North America were Paleoindians who began to arrive during the last 
glacial period by crossing the Bering land bridge. They differentiated into a number of diverse 
cultures and communities across the continent. The largest and most advanced Pre-Columbian 
civilizations in North America were the Aztecs in what is now Mexico and the Maya in Central 
America. European colonists began to arrive starting in the 16th and 17th centuries, wiping out 
large numbers of the native populations and beginning an era of European dominance.
<br/><br/>
<iframe width="640" height="360" src="https://www.youtube.com/embed/PEtz0twft2U?feature=player_detailpage" 
frameborder="0" allowfullscreen="true"></iframe><br/><br/>
<iframe width="640" height="360" src="https://www.youtube.com/embed/PEtz0twft2U?feature=player_detailpage" 
frameborder="0" allowfullscreen="true"></iframe><br/><br/>
<iframe width="640" height="360" src="https://www.youtube.com/embed/OnzZye5tGdY?feature=player_detailpage" 
frameborder="0" allowfullscreen="true"></iframe>
<br/><br/>
 <img src="images/bowl.jpg" alt="Bowl with filigree" height="80" width="100"/>
 <img src="images/bluebowl.jpg" alt="Bowl with filigree" height="80" width="100"/> 
 <img src="images/bowl with glass.jpg" alt="Bowl with filigree" height="80" width="100"/>
 <img src="images/carved bowl.jpg" alt="Bowl with filigree" height="80" width="100"/> 
<br/><br/>
<a href="seleniumTest.jsp">back to main page</a>
</body>
</html>
Find the XPath expressions that will find the following elements
  • All iframes on the page
  • The image with a source file of images/bluebowl.jpg
  • Any div tags on the page
  • All image tags on the page
  • The text contents of the div tag with an attribute
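
If an online tester is not available, candidate expressions can also be checked locally. The sketch below is one possible helper, assuming Python's standard-library xml.etree.ElementTree (a limited XPath subset with relative paths only). The try_xpath name and the cut-down sample are hypothetical, and the demo deliberately uses an expression that is not one of the exercise answers.

```python
import xml.etree.ElementTree as ET


def try_xpath(xml_source, expression):
    """Return (tag, attributes, text) for every element the expression matches."""
    root = ET.fromstring(xml_source)
    return [(e.tag, dict(e.attrib), (e.text or "").strip())
            for e in root.findall(expression)]


# Cut-down stand-in for the exercise page; paste the full page here instead.
SAMPLE = """<html><body>
<a href="seleniumTest.jsp">back to main page</a>
<br/><br/>
<a href="seleniumTest.jsp">back to main page</a>
</body></html>"""

matches = try_xpath(SAMPLE, ".//a")
for match in matches:
    print(match)
```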
References:
XPath Tutorial
Another XPath Tutorial
A third XPath Tutorial
XPath Tester
XPath Tester (Another one, previous one seems better)