---
title: Xpath
category: HTML
layout: 2017/sheet
tags: [Featured]
weight: -5
description: |
  $x('//div//p//*') == $('div p *'), $x('//*[@id="item"]') == $('#item'), and many other Xpath examples.
---

## Testing

### Xpath test bed
{: .-intro}

Test queries in the Xpath test bed:

- [Xpath test bed](http://www.whitebeam.org/library/guide/TechNotes/xpathtestbed.rhtm) _(whitebeam.org)_

### Browser console

```js
$x("//div")
```

Works in the Firefox and Chrome developer consoles. `$x()` is a console helper, not a function that scripts on the page can call.

## Selectors

### Descendant selectors

| CSS                          | Xpath                                                    | ?                       |
| ----                         | ----                                                     | --                      |
| `h1`                         | `//h1`                                                   | [?](#prefixes)          |
| `div p`                      | `//div//p`                                               | [?](#axes)              |
| `ul > li`                    | `//ul/li`                                                | [?](#axes)              |
| `ul > li > a`                | `//ul/li/a`                                              |                         |
| `div > *`                    | `//div/*`                                                |                         |
| ----                         | ----                                                     | --                      |
| `:root`                      | `/*`                                                     | [?](#prefixes)          |
| `:root > body`               | `/*/body`                                                |                         |
{: .xp}
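
You can try the child/descendant distinction outside a browser. Here is a minimal sketch using Python's standard-library `xml.etree.ElementTree`, as a stand-in only. ElementTree implements just a small subset of Xpath. Because `findall()` runs on an element, paths have to start with `.//` instead of `//`. The markup is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample markup
DOC = """
<html>
  <body>
    <div id="main">
      <p>direct child</p>
      <section><p>nested deeper</p></section>
    </div>
    <ul>
      <li><a href="/one">one</a></li>
      <li><a href="/two">two</a></li>
    </ul>
  </body>
</html>
"""

root = ET.fromstring(DOC)  # the <html> element

# findall() runs on an element, so `//div//p` is written relative as `.//div//p`
all_p_in_div = root.findall(".//div//p")      # CSS: div p
direct_p_in_div = root.findall(".//div/p")    # CSS: div > p
links_in_list = root.findall(".//ul/li/a")    # CSS: ul > li > a

print([p.text for p in all_p_in_div])          # ['direct child', 'nested deeper']
print([p.text for p in direct_p_in_div])       # ['direct child']
print([a.get("href") for a in links_in_list])  # ['/one', '/two']
```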

### Attribute selectors

| CSS                          | Xpath                                                    | ?                       |
| ----                         | ----                                                     | --                      |
| `#id`                        | `//*[@id="id"]`                                          | [?](#predicates)        |
| `.class`                     | `//*[@class="class"]` *...[kinda](#class-check)*         |                         |
| `input[type="submit"]`       | `//input[@type="submit"]`                                |                         |
| `a#abc[for="xyz"]`           | `//a[@id="abc"][@for="xyz"]`                             | [?](#chaining-order)    |
| `a[rel]`                     | `//a[@rel]`                                              |                         |
| ----                         | ----                                                     | --                      |
| `a[href^='/']`               | `//a[starts-with(@href, '/')]`                           | [?](#string-functions)  |
| `a[href$='.pdf']`            | `//a[substring(@href, string-length(@href) - 3) = '.pdf']` |                       |
| `a[href*='://']`             | `//a[contains(@href, '://')]`                            |                         |
| `a[rel~='help']`             | `//a[contains(@rel, 'help')]` *...[kinda](#class-check)* |                         |
{: .xp}

### Order selectors

| CSS                          | Xpath                                                    | ?                       |
| ----                         | ----                                                     | --                      |
| `ul > li:first-of-type`      | `//ul/li[1]`                                             | [?](#indexing)          |
| `ul > li:nth-of-type(2)`     | `//ul/li[2]`                                             |                         |
| `ul > li:last-of-type`       | `//ul/li[last()]`                                        |                         |
| `li#id:first-of-type`        | `//li[1][@id="id"]`                                      | [?](#chaining-order)    |
| `a:first-child`              | `//*[1][name()="a"]`                                     |                         |
| `a:last-child`               | `//*[last()][name()="a"]`                                |                         |
{: .xp}

### Siblings

| CSS                          | Xpath                                                    | ?                       |
| ----                         | ----                                                     | --                      |
| `h1 ~ ul`                    | `//h1/following-sibling::ul`                             | [?](#using-axes)        |
| `h1 + ul`                    | `//h1/following-sibling::*[1][self::ul]`                 |                         |
| `h1 ~ #id`                   | `//h1/following-sibling::*[@id="id"]`                    |                         |
{: .xp}
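
ElementTree has no sibling axes, so this sketch models `following-sibling::` with the parent's list of children. The `following_siblings` helper and the markup are hypothetical. The sketch contrasts two expressions. `following-sibling::ul[1]` gives the first `<ul>` after the `<h1>`, whatever comes between. `following-sibling::*[1][self::ul]` gives the very next element, but only if it is a `<ul>`, and that is the one matching CSS `h1 + ul`.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: a <p> sits between the <h1> and the <ul>
root = ET.fromstring("<body><h1>Title</h1><p>intro</p><ul id='list'/></body>")

def following_siblings(parent, elem):
    """Element siblings that come after `elem`, in document order."""
    children = list(parent)
    return children[children.index(elem) + 1:]

h1 = root.find("h1")
siblings = following_siblings(root, h1)

# //h1/following-sibling::ul[1]: the first <ul> after the <h1>, whatever is in between
first_ul_after = [e for e in siblings if e.tag == "ul"][:1]

# //h1/following-sibling::*[1][self::ul]: the very next element, only if it is a <ul>
next_if_ul = [e for e in siblings[:1] if e.tag == "ul"]

print(len(first_ul_after), len(next_if_ul))  # 1 0
```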

### jQuery

| jQuery                       | Xpath                                                    | ?                       |
| ----                         | ----                                                     | --                      |
| `$('ul > li').parent()`      | `//ul/li/..`                                             | [?](#other-axes)        |
| `$('li').closest('section')` | `//li/ancestor-or-self::section[1]`                      |                         |
| `$('a').attr('href')`        | `//a/@href`                                              | [?](#steps)             |
| `$('span').text()`           | `//span//text()`                                         |                         |
{: .xp}
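
Here are the same jQuery-style lookups in an ElementTree sketch. ElementTree supports `..`, but it cannot select attribute or text nodes. So `@href` and `text()` become plain Python reads on each element. The markup is hypothetical.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<body><ul id='menu'><li><a href='/a'>A</a></li><li><a href='/b'>B</a></li></ul>"
    "<span>Hello <b>world</b></span></body>"
)

# //ul/li/..  (each parent is returned once)
parents = root.findall(".//ul/li/..")
print([p.get("id") for p in parents])  # ['menu']

# //a/@href: attribute nodes can't be selected, so read the attribute per element
hrefs = [a.get("href") for a in root.iter("a")]
print(hrefs)  # ['/a', '/b']

# span.text holds only the text before the first child element;
# itertext() walks all descendant text, like //span//text() or jQuery's .text()
span = root.find(".//span")
direct_text = span.text
all_text = "".join(span.itertext())
print(repr(direct_text), repr(all_text))  # 'Hello ' 'Hello world'
```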

### Other things

| CSS                          | Xpath                                                    | ?                       |
| ----                         | ----                                                     | --                      |
| `h1:not([id])`               | `//h1[not(@id)]`                                         | [?](#boolean-functions) |
| Text match                   | `//button[text()="Submit"]`                              | [?](#operators)         |
| Text match (substring)       | `//button[contains(text(),"Go")]`                        |                         |
| Numeric comparison           | `//product[@price > 2.50]`                               |                         |
| Has children                 | `//ul[*]`                                                |                         |
| Has children (specific)      | `//ul[li]`                                               |                         |
| Or logic                     | `//a[@name or @href]`                                    | [?](#operators)         |
| Union (joins results)        | `//a | //div`                                            | [?](#unions)            |
{: .xp}

<style>
/* ensure tables align */
table.xp {table-layout: fixed;}
table.xp tr>:nth-child(1) {width: 35%;}
table.xp tr>:nth-child(2) {width: auto;}
table.xp tr>:nth-child(3) {width: 10%; text-align:right;}
</style>

### Class check

```bash
//div[contains(concat(' ',normalize-space(@class),' '),' foobar ')]
```

Xpath 1.0 has no equivalent of CSS's `~=` operator (match one word in a space-separated list). The workaround is to pad both the normalized `class` attribute and the class name with spaces, then use `contains()` ([source](http://pivotallabs.com/xpath-css-class-matching/)).
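
To see why the padding matters, this Python sketch mimics two tests on a few made-up `class` values: the naive `contains(@class, 'foobar')` and the padded workaround. `normalize_space`, `has_class` and `has_class_naive` are hypothetical helpers that imitate the Xpath functions. They are not a library API.

```python
import re

def normalize_space(s):
    # Xpath 1.0 normalize-space(): collapse runs of space, tab, CR and LF into
    # one space, and drop leading/trailing whitespace
    return re.sub(r"[ \t\r\n]+", " ", s).strip(" ")

def has_class_naive(class_attr, name):
    # contains(@class, 'foobar')
    return name in class_attr

def has_class(class_attr, name):
    # contains(concat(' ', normalize-space(@class), ' '), ' foobar ')
    return f" {name} " in " " + normalize_space(class_attr) + " "

# Made-up class attribute values, and whether they really contain the class "foobar"
cases = {
    "foobar": True,
    "btn  foobar\tlarge": True,   # extra whitespace is normalized away
    "foobarbaz": False,           # the naive test wrongly matches this...
    "xfoobar": False,             # ...and this
}
for attr, expected in cases.items():
    assert has_class_naive(attr, "foobar")        # plain contains() matches all four
    assert has_class(attr, "foobar") == expected  # the padded test gets each one right
print("all class checks behave as described")
```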

Expressions
-----------

### Steps and axes

| `//` | `ul` | `/`  | `a[@id='link']` |
| Axis | Step | Axis | Step            |
{: .-css-breakdown}

### Prefixes

| Prefix | Example               | What     |
| ---    | ---                   | ---      |
| `//`   | `//hr[@class='edge']` | Anywhere |
| `./`   | `./a`                 | Relative |
| `/`    | `/html/body/div`      | Root     |
{: .-headers}

Begin your expression with any of these.

### Axes

| Axis | Example              | What       |
| ---  | ---                  | ---        |
| `/`  | `//ul/li/a`          | Child      |
| `//` | `//*[@id="list"]//a` | Descendant |
{: .-headers}

Separate your steps with `/`. Use two (`//`) to match descendants at any depth instead of only direct children.

### Steps

```bash
//div
//div[@name='box']
//*[@id='link']
```

A step has a node test, such as an element name (`div`) or `*` for any element, followed by optional [predicates](#predicates) (`[...]`).
Steps can also select other kinds of nodes:

```bash
//a/text()     #=> "Go home"
//a/@href      #=> "index.html"
//a/*          #=> All a's child elements
```

Predicates
----------

### Predicates

```bash
//div[true()]
//div[@class="head"]
//div[@class="head"][@id="top"]
```

A predicate filters a node-set, keeping only the nodes for which its condition is true. Predicates can be chained.

### Operators

```bash
# Comparison
//a[@id = "xyz"]
//a[@id != "xyz"]
//a[@price > 25]
```

```bash
# Logic (and/or)
//div[@id="head" and position()=2]
//div[(x and y) or not(z)]
```

Use comparison and logic operators to make conditionals. Watch out with `!=` on attributes: `//a[@id != "xyz"]` skips `<a>` elements that have no `id` at all, because any comparison against an empty node-set is false. Use `//a[not(@id = "xyz")]` to include them.

### Using nodes

```bash
# Use them inside functions
//ul[count(li) > 2]
//ul[count(li[@class='hide']) > 0]
```

```bash
# This returns `<ul>` that has a `<li>` child
//ul[li]
```

You can use nodes inside predicates: a node-set on its own counts as true when it is non-empty, and functions such as `count()` take node-sets as arguments.
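
Here is a sketch of the node-in-predicate tests with ElementTree. It understands `[li]` (has an `<li>` child) but has no `count()`, so the counting is done in Python. The element ids are made up.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<body>"
    "<ul id='empty'></ul>"
    "<ul id='short'><li>1</li></ul>"
    "<ul id='long'><li>1</li><li>2</li><li class='hide'>3</li></ul>"
    "</body>"
)

# //ul[li]: a non-empty node-set in a predicate counts as true
with_items = root.findall(".//ul[li]")
print([ul.get("id") for ul in with_items])   # ['short', 'long']

# //ul[count(li) > 2]: ElementTree has no count(), so count in Python
long_lists = [ul for ul in root.iter("ul") if len(ul.findall("li")) > 2]
print([ul.get("id") for ul in long_lists])   # ['long']

# //ul[count(li[@class='hide']) > 0]
with_hidden = [ul for ul in root.iter("ul") if len(ul.findall("li[@class='hide']")) > 0]
print([ul.get("id") for ul in with_hidden])  # ['long']
```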

### Indexing

```bash
//a[1]                  # every <a> that is the first <a> child of its parent
(//a)[1]                # first <a> in the whole document
//a[last()]             # every <a> that is the last <a> child of its parent
//ol/li[2]              # second <li>
//ol/li[position()=2]   # same as above
//ol/li[position()>1]   # :not(:first-of-type)
```

Use `[]` with a number, or `last()` or `position()`. Positions start at 1, not 0.
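
In ElementTree, `[n]` and `[last()]` also count among same-named siblings. That makes the difference between `//a[1]` and `(//a)[1]` easy to see. A sketch on hypothetical markup:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<body>"
    "<p><a id='a1'/><a id='a2'/></p>"
    "<p><a id='a3'/></p>"
    "<ol><li id='i1'/><li id='i2'/><li id='i3'/></ol>"
    "</body>"
)

def ids(elems):
    return [e.get("id") for e in elems]

# //a[1]: every <a> that is the first <a> child of its parent
print(ids(root.findall(".//a[1]")))       # ['a1', 'a3']
# (//a)[1]: the first <a> in the document, so index the full result list instead
print(ids(root.findall(".//a")[:1]))      # ['a1']
# //a[last()]
print(ids(root.findall(".//a[last()]")))  # ['a2', 'a3']
# //ol/li[2]
print(ids(root.findall(".//ol/li[2]")))   # ['i2']
```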
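
### Chaining order

```bash
a[1][@href='/']
a[@href='/'][1]
```

Order is significant: these two are different. The first takes the first `<a>` and keeps it only if its `href` is `/`. The second keeps the `<a>` elements whose `href` is `/`, then takes the first of those.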
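
One way to see why the order matters is to model a node-set as a plain Python list and each predicate as a filter over it. This is a toy model with made-up data, not how an Xpath engine is implemented:

```python
# A node-set of sibling <a> elements, modelled as a list of dicts (hypothetical data)
links = [
    {"href": "/about"},
    {"href": "/"},
    {"href": "/"},
]

def first(nodes):
    # [1]: keep only the first node of the current node-set
    return nodes[:1]

def href_is_root(nodes):
    # [@href='/']: keep the nodes whose href equals '/'
    return [n for n in nodes if n.get("href") == "/"]

# a[1][@href='/']: take the first <a>, then test its href
print(href_is_root(first(links)))   # []
# a[@href='/'][1]: filter by href, then take the first match
print(first(href_is_root(links)))   # [{'href': '/'}]
```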

### Nesting predicates

```bash
//section[.//h1[@id='hi']]
```

This returns `<section>` if it has an `<h1>` descendant with `id='hi'`.

Functions
---------

### Node functions

```bash
name()                     # //*[starts-with(name(), 'h')]
text()                     # //button[text()="Submit"]
                           # //button/text()
lang(str)
namespace-uri()
```

```bash
count()                    # //table[count(tr)=1]
position()                 # //ol/li[position()=2]
```

### Boolean functions

```bash
not(expr)                  # button[not(starts-with(text(),"Submit"))]
```

### String functions

```bash
contains()                 # font[contains(@class,"head")]
starts-with()              # font[starts-with(@class,"head")]
ends-with()                # font[ends-with(@class,"head")]
```

`ends-with()` only exists in Xpath 2.0 and later. Browsers implement Xpath 1.0, so it fails in `$x()`. Compare the end of the string with `substring()` and `string-length()` instead.

```bash
concat(x,y)
substring(str, start, len)
substring-before("01/02", "/")  #=> 01
substring-after("01/02", "/")   #=> 02
translate(str, from, to)
normalize-space(str)
string-length(str)
```
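
Xpath 1.0 has no `ends-with()`. A common workaround compares the tail of the string using `substring()` and `string-length()`. The Python sketch below imitates the 1.0 semantics of `substring()`, `substring-before()` and `substring-after()` (1-based positions, rounded arguments), then builds that workaround. The function names are hypothetical stand-ins, not a library API.

```python
import math

def xpath_round(x):
    # Xpath round(): nearest integer, ties go toward positive infinity
    return math.floor(x + 0.5)

def substring(s, start, length=math.inf):
    """Xpath 1.0 substring(): keeps the characters at 1-based positions p
    with round(start) <= p < round(start) + round(length)."""
    first = xpath_round(start)
    end = first + (length if math.isinf(length) else xpath_round(length))
    return "".join(ch for p, ch in enumerate(s, start=1) if first <= p < end)

def substring_before(s, sep):
    i = s.find(sep)
    return s[:i] if i != -1 else ""

def substring_after(s, sep):
    i = s.find(sep)
    return s[i + len(sep):] if i != -1 else ""

def ends_with_v1(s, suffix):
    # substring(s, string-length(s) - string-length(suffix) + 1) = suffix
    return substring(s, len(s) - len(suffix) + 1) == suffix

print(substring("12345", 2, 3))                   # 234
print(substring_before("01/02", "/"))             # 01
print(substring_after("01/02", "/"))              # 02
print(ends_with_v1("/files/report.pdf", ".pdf"))  # True
print(ends_with_v1("/files/pdf-guide", ".pdf"))   # False
```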

### Type conversion

```bash
string()
number()
boolean()
```
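
The conversion rules are easy to trip over. `number()` returns NaN for anything that is not a plain decimal, and any non-empty string converts to true, even `"false"`. A Python sketch of those rules follows. The helper names are hypothetical. `string()` is left out because its number formatting has several special cases.

```python
import math
import re

# Xpath 1.0 Number syntax: optional whitespace, optional minus sign, digits with an
# optional fractional part; no exponent and no leading '+'
_NUMBER = re.compile(r"[ \t\r\n]*-?(\d+(\.\d*)?|\.\d+)[ \t\r\n]*")

def number(s):
    # number() on a string: anything that is not a valid Xpath number becomes NaN
    return float(s) if _NUMBER.fullmatch(s) else math.nan

def boolean_of_string(s):
    # boolean() on a string: true if and only if the string is non-empty
    return len(s) > 0

def boolean_of_number(n):
    # boolean() on a number: false for zero and for NaN
    return not (n == 0 or math.isnan(n))

print(number("42"), number(" -3.5 "), number("1e3"), number("+5"))  # 42.0 -3.5 nan nan
print(boolean_of_string("false"), boolean_of_string(""))            # True False
print(boolean_of_number(0.0), boolean_of_number(number("abc")))     # False False
```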

Axes
----

### Using axes

```bash
//ul/li                       # ul > li
//ul/child::li                # ul > li (same)
//ul/following-sibling::li    # ul ~ li
//ul/descendant-or-self::li   # ul li
//ul/ancestor-or-self::li[1]  # $('ul').closest('li')
```

Steps of an expression are separated by `/` and usually select child nodes. That isn't always the case: you can specify a different "axis" with `::`.

| `//` | `ul` | `/child::` | `li` |
| Axis | Step | Axis       | Step |
{: .-css-breakdown}

### Child axis

```bash
# both the same
//ul/li/a
//child::ul/child::li/child::a
```

`child::` is the default axis, so a step written without an axis, like each step in `//a/b/c`, selects children.

```bash
# both the same
# this works because a non-empty node-set such as `child::li` is true, so the predicate succeeds
//ul[li]
//ul[child::li]
```

```bash
# both the same
//ul[count(li) > 2]
//ul[count(child::li) > 2]
```

### Descendant-or-self axis

```bash
# both the same
//div//h4
//div/descendant-or-self::h4
```

`//` is short for `/descendant-or-self::node()/`. So a positional predicate after `//` is evaluated among each element's siblings, not over the whole descendant set.

```bash
# both the same
//ul//*[last()]
//ul/descendant-or-self::node()/*[last()]
```

### Other axes

| Axis                 | Abbrev | Notes                                            |
| ---                  | ---    | ---                                              |
| `ancestor`           |        |                                                  |
| `ancestor-or-self`   |        |                                                  |
| ---                  | ---    | ---                                              |
| `attribute`          | `@`    | `@href` is short for `attribute::href`           |
| `child`              |        | `div` is short for `child::div`                  |
| `descendant`         |        |                                                  |
| `descendant-or-self` | `//`   | `//` is short for `/descendant-or-self::node()/` |
| `namespace`          |        |                                                  |
| ---                  | ---    | ---                                              |
| `self`               | `.`    | `.` is short for `self::node()`                  |
| `parent`             | `..`   | `..` is short for `parent::node()`               |
| ---                  | ---    | ---                                              |
| `following`          |        |                                                  |
| `following-sibling`  |        |                                                  |
| `preceding`          |        |                                                  |
| `preceding-sibling`  |        |                                                  |
{: .-headers}

These are all 13 axes defined in Xpath 1.0.

### Unions

```bash
//a | //span
```

Use `|` to join two expressions. The result contains every node matched by either side, once, in document order.
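
ElementTree has no `|` operator, so this sketch emulates a union. It collects the matches of each path, then walks the tree once to return them in document order without duplicates, which is what Xpath does. The `union` helper and the markup are hypothetical.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<body><span id='s1'/><a id='a1'/><span id='s2'/><a id='a2'/></body>")

def union(root, *paths):
    """Emulates `p1 | p2 | ...`: every matched element once, in document order."""
    matched = set()
    for path in paths:
        matched.update(root.findall(path))
    return [e for e in root.iter() if e in matched]

# //a | //span
print([e.get("id") for e in union(root, ".//a", ".//span")])  # ['s1', 'a1', 's2', 'a2']
```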

More examples
-------------

### Examples

```bash
//*                 # all elements
count(//*)          # count all elements
(//h1)[1]/text()    # text of the first h1 heading
//li[span]          # find a <li> with a <span> child
                    # ...expands to //li[child::span]
//ul/li/..          # use .. to select a parent
```

### Find a parent

```bash
//section[h1[@id='section-name']]
```

Finds a `<section>` that directly contains `h1#section-name`.

```bash
//section[.//h1[@id='section-name']]
```

Finds a `<section>` that contains `h1#section-name` at any depth (same as above, but uses descendant-or-self instead of child). Keep the leading `.`. Without it, `//h1` in the predicate searches the whole document, so every `<section>` would match as soon as that heading exists anywhere.
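
The difference between the child test, the descendant test and the root-anchored test can be sketched in Python. ElementTree cannot nest predicates, so each `section` is tested with `find()`. The markup is made up, and `root` stands in for the document root:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<body>"
    "<section id='intro'><h1>Intro</h1></section>"
    "<section id='target'><div><h1 id='section-name'>Target</h1></div></section>"
    "</body>"
)
sections = list(root.iter("section"))
H1 = "h1[@id='section-name']"

# //section[h1[@id='section-name']]: the <h1> must be a direct child
direct = [s for s in sections if s.find(H1) is not None]

# //section[.//h1[@id='section-name']]: the <h1> may be at any depth
nested = [s for s in sections if s.find(".//" + H1) is not None]

# //section[//h1[@id='section-name']]: `//` starts from the document root
# (approximated by `root`), so the test gives the same answer for every <section>
from_root = [s for s in sections if root.find(".//" + H1) is not None]

print([s.get("id") for s in direct])     # []
print([s.get("id") for s in nested])     # ['target']
print([s.get("id") for s in from_root])  # ['intro', 'target']
```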

### Closest

```bash
./ancestor-or-self::*[@class="box"][1]
```

Works like jQuery's `$().closest('.box')`. `ancestor-or-self` is a reverse axis, so `[1]` picks the nearest match.
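
ElementTree has no ancestor axes either. This sketch walks a parent map upward, nearest element first. That is the order a reverse axis like `ancestor-or-self` uses for positions. The markup and the helper are hypothetical.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<div class='box' id='outer'>"
    "<div class='box' id='inner'><p><a id='link'/></p></div>"
    "</div>"
)
# Map each element to its parent (ElementTree elements don't store one)
parent_of = {child: parent for parent in root.iter() for child in parent}

def ancestor_or_self(elem):
    """The ancestor-or-self axis, nearest first."""
    while elem is not None:
        yield elem
        elem = parent_of.get(elem)

link = root.find(".//a")
# ancestor-or-self::*[@class="box"]
boxes = [e for e in ancestor_or_self(link) if e.get("class") == "box"]
print([b.get("id") for b in boxes])  # ['inner', 'outer']
# ...[1] on a reverse axis is the nearest match, like .closest('.box')
print(boxes[0].get("id"))            # inner
```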

### Attributes

```bash
//item[@price > 2*@discount]
```

Finds `<item>` elements whose `price` attribute is greater than twice their `discount` attribute.
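
In a comparison, Xpath converts attribute values to numbers. A missing attribute becomes NaN, which makes the comparison false. Below is a Python sketch of that filter on made-up `<item>` elements. The `number` helper is only an approximation: Python's `float()` accepts a few forms, such as `1e3`, that Xpath 1.0 rejects.

```python
import math
import xml.etree.ElementTree as ET

def number(value):
    # Approximates Xpath number(): a missing attribute (None) or non-numeric text
    # becomes NaN. Python's float() is more permissive than Xpath 1.0 (e.g. "1e3").
    try:
        return float(value)
    except (TypeError, ValueError):
        return math.nan

root = ET.fromstring(
    "<items>"
    "<item id='a' price='10' discount='4'/>"
    "<item id='b' price='10' discount='6'/>"
    "<item id='c' price='10'/>"
    "</items>"
)

# //item[@price > 2*@discount]; every comparison with NaN is false, so item c drops out
matches = [
    item for item in root.iter("item")
    if number(item.get("price")) > 2 * number(item.get("discount"))
]
print([item.get("id") for item in matches])  # ['a']
```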

References
----------
{: .-one-column}

* [Xpath test bed](http://www.whitebeam.org/library/guide/TechNotes/xpathtestbed.rhtm) _(whitebeam.org)_