XPath Primer
Introduction
- XPath is used to navigate through elements and attributes in an XML document
- HTML is not strictly a subset of XML, but well-formed HTML (for example XHTML) can be treated as an XML document
- XPath can select elements or element sets in an HTML document
- The expressions are similar to specifying a path in a file system
- Generally the purpose of XPath is to allow the user to select a single element or a set of elements
Terminology
- See reference XPath Terminology
- Nodes - an XML document is a tree of nodes in parent-child relationships; the node types are:
- Elements - <a href="xxx">Click this Link</a> is an element node
- Attributes - value="Bike" and href="xxx" are attribute nodes
- Text
- Namespace
- Processing-Instruction
- Comment
- Document
Node Selection
- Useful path expressions
  nodename - selects all child elements of the current node named "nodename"
  / - selects from the root node
  // - selects matching nodes at any depth below the current node (at the start of an expression, anywhere in the document)
  . - selects the current node
  .. - selects the parent of the current node
  @ - selects attributes
- XPath contains a set of node tests; a node is selected when it satisfies the test
  comment() - selects nodes that are comments
  node() - selects nodes of any type
  processing-instruction() - selects nodes that are processing instructions; you can specify which processing instruction to select by providing its name in the parentheses (processing instructions generally appear in XML documents rather than HTML)
  text() - selects text nodes
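The path expressions above can be tried out with Python's standard library. This is only a minimal sketch: the sample document is invented for illustration, and xml.etree.ElementTree implements a limited subset of XPath (no text() or comment() node tests, and paths evaluated on an element must be relative, so they start with "." rather than "/").

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this illustration.
DOC = """
<html>
  <body>
    <div style="color:red">this is it</div>
    <div>second div</div>
    <a href="seleniumTest.jsp">back to main page</a>
  </body>
</html>
"""

root = ET.fromstring(DOC)  # root is the <html> element

# "." selects the current node (here the root element itself).
current = root.find(".")

# ".//div" selects div elements at any depth below the current node.
divs = root.findall(".//div")

# ".//a/.." steps from each <a> element up to its parent.
parents = root.findall(".//a/..")

print(current.tag)                # html
print([d.text for d in divs])     # ['this is it', 'second div']
print([p.tag for p in parents])   # ['body']
```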
Predicates
- Predicates filter the nodes selected by a path expression, for example to find one specific node
- Predicates are placed within square brackets []
- Positional predicates have a number, last() or position() within the square brackets []
- Examples of node selection with predicates
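  //input[@value='Bike'] - get the input element whose value attribute is Bike
  //a[@href="www.google.com"] - get the "a" element (the link) whose href attribute is www.google.com
  //img[@src="images/carved bowl.jpg"] - get the img element whose src attribute is images/carved bowl.jpg
  //div[@style="color:red"]/text() - get the text within the div tag that has the attribute style="color:red"
  //div[@style="color:red" and @height="80"]/text() - get the text within the div tag that has both of the specified attributes
  //img[1] - get each img tag that is the first img child of its parent (the first img on the page when all img tags share one parent; use (//img)[1] for the first img in the whole document)
  //img[last()] - get each img tag that is the last img child of its parent (the last img on the page when all img tags share one parent)
  //img[position()<3] - get the first two img children of each parent (the first two img tags on the page when all img tags share one parent)
  //ul[@class="headline-list__list js-split-list js-add-p1"]/li[@class="headline-list__item"][2]/a - get the link inside the second li with class headline-list__item within the ul that has the given class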
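Attribute and positional predicates can be checked with a short script. The following is a sketch under stated assumptions: the markup is a made-up fragment, and ElementTree only supports simple predicates such as [@attr='value'], [1] and [last()]; expressions using "and", position()<3 or text() need a full XPath engine.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment; all img elements share the same parent.
DOC = """
<body>
  <input type="checkbox" value="Bike"/>
  <input type="checkbox" value="Car"/>
  <img src="images/bowl.jpg"/>
  <img src="images/bluebowl.jpg"/>
  <img src="images/carved bowl.jpg"/>
</body>
"""

root = ET.fromstring(DOC)

# Attribute predicate: input elements whose value attribute is 'Bike'.
bike = root.findall(".//input[@value='Bike']")

# Positional predicates: first and last img child of their parent.
first_img = root.find(".//img[1]")
last_img = root.find(".//img[last()]")

print(len(bike))              # 1
print(first_img.get("src"))   # images/bowl.jpg
print(last_img.get("src"))    # images/carved bowl.jpg
```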
XPath Paths
- Path elements are separated by a slash /
- //body/div[@style='color:red'] - find every div with style='color:red' that is a direct child of a body element
- //body/input - find all input elements that are children of the body element
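The difference between a single slash (direct children) and a double slash (any depth) is easiest to see on a document with nested elements. A minimal sketch, assuming an invented document with one div nested inside another and using ElementTree's relative-path form of the expressions:

```python
import xml.etree.ElementTree as ET

# Hypothetical document with a div nested inside another div.
DOC = """
<html>
  <body>
    <div style="color:red">outer<div>nested</div></div>
    <input name="q"/>
  </body>
</html>
"""

root = ET.fromstring(DOC)

# Each "/" moves one level down: html -> body -> div.
red = root.findall("./body/div[@style='color:red']")
child_divs = root.findall("./body/div")   # direct children of body only
all_divs = root.findall(".//div")         # divs at any depth
inputs = root.findall("./body/input")

print(len(red), len(child_divs), len(all_divs), len(inputs))  # 1 1 2 1
```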
Wildcards
- Wildcards allow several elements to be chosen at the same time
  * - matches any element node
  @* - matches any attribute node
  node() - matches any node of any kind
- Examples of wildcard use
  //*[@height="80"] - all elements that have a height attribute set to 80
  //*[@src="images/carved bowl.jpg"] - any element that has a src attribute of images/carved bowl.jpg
  //div[@*] - any div tag that has an attribute of any kind
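The element wildcard can be exercised in the same way. This sketch assumes a made-up fragment; the * wildcard works in ElementTree, while the @* test is emulated here with a plain Python check on each element's attribute dictionary, since it is not assumed to be part of ElementTree's XPath subset.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment mixing elements with and without attributes.
DOC = """
<body>
  <div height="80">a div</div>
  <img src="images/carved bowl.jpg" height="80"/>
  <img src="images/bowl.jpg" height="60"/>
  <p>no attributes</p>
</body>
"""

root = ET.fromstring(DOC)

# "*" matches any element, so the predicate alone decides the result.
tall = root.findall(".//*[@height='80']")
carved = root.findall(".//*[@src='images/carved bowl.jpg']")

# Emulation of //*[@*]: descendants that carry at least one attribute.
with_attrs = [e for e in root.iter() if e is not root and e.attrib]

print([e.tag for e in tall])        # ['div', 'img']
print([e.tag for e in carved])      # ['img']
print([e.tag for e in with_attrs])  # ['div', 'img', 'img']
```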
Selecting Several Paths
- To select several nodes with varying paths
- Use the | operator between the path expressions
- For example /book/title | //book/price
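ElementTree does not accept the | operator, so the sketch below emulates a union by evaluating each path separately and merging the results without duplicates. The union helper and the library/book document are hypothetical, and unlike a real XPath union the merged list is in evaluation order rather than document order.

```python
import xml.etree.ElementTree as ET

# Hypothetical document, invented for this illustration.
DOC = """
<library>
  <book><title>Dune</title><price>9.99</price></book>
  <book><title>Emma</title><price>4.50</price></book>
</library>
"""

root = ET.fromstring(DOC)


def union(element, *paths):
    """Evaluate each path and merge the matches, skipping duplicates."""
    merged = []
    for path in paths:
        for node in element.findall(path):
            # Elements compare by identity, so "in" detects repeats.
            if node not in merged:
                merged.append(node)
    return merged


# Rough equivalent of: /library/book/title | //book/price
nodes = union(root, "./book/title", ".//book/price")
print([(n.tag, n.text) for n in nodes])
# [('title', 'Dune'), ('title', 'Emma'), ('price', '9.99'), ('price', '4.50')]
```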
XPath Axes
- An axis defines a set of nodes relative to the current node
  ancestor - selects all ancestors of the current node
  ancestor-or-self - selects all ancestors and the current node itself
  attribute - selects all attributes of the current node
  child - selects all children of the current node
  descendant - selects all descendants of the current node (children, grandchildren, etc.)
  descendant-or-self - selects all descendants and the current node itself
  following - selects everything in the document after the closing tag of the current node
  following-sibling - selects all siblings after the current node
  namespace - selects all namespace nodes of the current node
  parent - selects the parent of the current node
  preceding - selects all nodes that appear before the current node in the document, excluding ancestors, attribute nodes, and namespace nodes
  preceding-sibling - selects all siblings before the current node
  self - selects the current node
- Examples of Axes Use
  //div[@style="color:red" and @height="80"]/attribute::style - select the style attribute of the div tag
  //img/attribute::src - select the src attribute of all the image tags
  //img[last()]/following::a - find every "a" tag (link) that follows the last image tag in the page
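ElementTree has no axis syntax, so the following sketch emulates the three axis examples with ordinary Python rather than evaluating them directly. The fragment is invented, chained predicates stand in for "and", and the following axis is approximated by walking the elements in document order and skipping the node's own descendants.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment, invented for this illustration.
DOC = """
<body>
  <div style="color:red" height="80">text</div>
  <img src="images/bowl.jpg"/>
  <img src="images/carved bowl.jpg"/>
  <a href="seleniumTest.jsp">back to main page</a>
</body>
"""

root = ET.fromstring(DOC)

# Emulates //img/attribute::src
srcs = [img.get("src") for img in root.findall(".//img")]

# Emulates //div[@style="color:red" and @height="80"]/attribute::style
red_divs = root.findall(".//div[@style='color:red'][@height='80']")
styles = [d.get("style") for d in red_divs]

# Emulates //img[last()]/following::a
last_img = root.find(".//img[last()]")
in_order = list(root.iter())                   # elements in document order
own_subtree = {id(e) for e in last_img.iter()}  # the node and its descendants
start = in_order.index(last_img)
following_links = [
    e for e in in_order[start + 1:]
    if id(e) not in own_subtree and e.tag == "a"
]

print(srcs)                                    # ['images/bowl.jpg', 'images/carved bowl.jpg']
print(styles)                                  # ['color:red']
print([a.get("href") for a in following_links])  # ['seleniumTest.jsp']
```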
XPath Functions
- XPath provides a large number of built-in functions
- See reference XPath Functions
- Examples of Functions use
  //div[contains(text(),'Some')] - find a div tag where the text contains 'Some'
  //input[starts-with(@id, 'text-')] - find an input tag where the id starts with 'text-'
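The difference between contains() and starts-with() can be illustrated with Python string operations. This is an emulation, not an XPath evaluation: ElementTree does not implement these functions, the fragment is invented, and an element's .text only covers the text before its first child element.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment, invented for this illustration.
DOC = """
<body>
  <div>Here is Some text</div>
  <div>Nothing relevant</div>
  <input id="text-name"/>
  <input id="submit"/>
</body>
"""

root = ET.fromstring(DOC)

# Emulates //div[contains(text(),'Some')]: a substring match anywhere.
with_some = [d for d in root.findall(".//div") if "Some" in (d.text or "")]

# Emulates //input[starts-with(@id, 'text-')]: a prefix match only.
text_inputs = [
    i for i in root.findall(".//input")
    if i.get("id", "").startswith("text-")
]

print([d.text for d in with_some])        # ['Here is Some text']
print([i.get("id") for i in text_inputs])  # ['text-name']
```

The first div matches contains() even though its text does not begin with 'Some', which is exactly what separates the two functions.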
XPath Hands on Exercise
- Hands on Exercise
- In this exercise use the XPath tester at XPath Tester
- Go to the XPath Tester and put the following HTML code into the tester
- Note that this code has been corrected slightly so that it is valid XML
- For example, the meta tag on line 3 has had a slash added before the final >
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"/>
<title>Text Image and Video Demo</title>
</head>
<body>
<div style="color:red"> this is it</div>
<div>This is the divs tag's contents</div>
<a href="seleniumTest.jsp">back to main page</a>
<br/><br/>
North America is a continent wholly within the Northern Hemisphere and almost wholly within the Western Hemisphere. It can also be considered a northern subcontinent of the Americas.[1] It is bordered to the north by the Arctic Ocean, to the east by the Atlantic Ocean, to the west and south by the Pacific Ocean, and to the southeast by South America and the Caribbean Sea. North America covers an area of about 24,709,000 square kilometers (9,540,000 square miles), about 4.8% of the planet's surface or about 16.5% of its land area. As of 2013, its population was estimated at nearly 565 million people across 23 independent states, representing about 7.5% of the human population. Most of the continent's land area is dominated by Canada, the United States, Greenland, and Mexico, while smaller states exist in the Central American and Caribbean regions. North America is the third largest continent by area, following Asia and Africa,[2] and the fourth by population after Asia, Africa, and Europe.[3] The first people to live in North America were Paleoindians who began to arrive during the last glacial period by crossing the Bering land bridge. They differentiated into a number of diverse cultures and communities across the continent. The largest and most advanced Pre-Columbian civilizations in North America were the Aztecs in what is now Mexico and the Maya in Central America. European colonists began to arrive starting in the 16th and 17th centuries, wiping out large numbers of the native populations and beginning an era of European dominance.
<br/><br/>
<iframe width="640" height="360" src="https://www.youtube.com/embed/PEtz0twft2U?feature=player_detailpage" frameborder="0" allowfullscreen="true"></iframe><br/><br/>
<iframe width="640" height="360" src="https://www.youtube.com/embed/PEtz0twft2U?feature=player_detailpage" frameborder="0" allowfullscreen="true"></iframe><br/><br/>
<iframe width="640" height="360" src="https://www.youtube.com/embed/OnzZye5tGdY?feature=player_detailpage" frameborder="0" allowfullscreen="true"></iframe>
<br/><br/>
<img src="images/bowl.jpg" alt="Bowl with filigree" height="80" width="100"/>
<img src="images/bluebowl.jpg" alt="Bowl with filigree" height="80" width="100"/>
<img src="images/bowl with glass.jpg" alt="Bowl with filigree" height="80" width="100"/>
<img src="images/carved bowl.jpg" alt="Bowl with filigree" height="80" width="100"/>
<br/><br/>
<a href="seleniumTest.jsp">back to main page</a>
</body>
</html>
- Find the XPath expressions that will find the following elements
- All iframes on the page
- The image with a source file of images/bluebowl.jpg
- Any div tags on the page
- All image tags on the page
- The text contents of the div tag with an attribute