---
title: Xpath
category: HTML
layout: 2017/sheet
tags: [Featured]
weight: -5
description: |
  $x('//div//p//*') == $('div p *'), $x('//*[@id="item"]') == $('#item'), and many other Xpath examples.
---

## Testing

### Xpath test bed

Test queries in the Xpath test bed:<br>
[Xpath test bed](http://www.whitebeam.org/library/guide/TechNotes/xpathtestbed.rhtm) _(whitebeam.org)_

### Browser console

```js
$x("//div")
```

`$x()` is a console helper in Firefox and Chrome; it is not available to scripts on the page.

## Selectors

### Descendant selectors

| CSS                          | Xpath                                                    | ?                       |
| ----                         | ----                                                     | --                      |
| `h1`                         | `//h1`                                                   | [?](#prefixes)          |
| `div p`                      | `//div//p`                                               | [?](#axes)              |
| `ul > li`                    | `//ul/li`                                                | [?](#axes)              |
| `ul > li > a`                | `//ul/li/a`                                              |                         |
| `div > *`                    | `//div/*`                                                |                         |
| ----                         | ----                                                     | --                      |
| `:root`                      | `/*`                                                     | [?](#prefixes)          |
| `:root > body`               | `/*/body`                                                |                         |
{: .xp}

### Attribute selectors

| CSS                          | Xpath                                                    | ?                       |
| ----                         | ----                                                     | --                      |
| `#id`                        | `//*[@id="id"]`                                          | [?](#predicates)        |
| `.class`                     | `//*[@class="class"]` *...[kinda](#class-check)*         |                         |
| `input[type="submit"]`       | `//input[@type="submit"]`                                |                         |
| `a#abc[for="xyz"]`           | `//a[@id="abc"][@for="xyz"]`                             | [?](#chaining-order)    |
| `a[rel]`                     | `//a[@rel]`                                              |                         |
| ----                         | ----                                                     | --                      |
| `a[href^='/']`               | `//a[starts-with(@href, '/')]`                           | [?](#string-functions)  |
| `a[href$='.pdf']`            | `//a[ends-with(@href, '.pdf')]`                          |                         |
| `a[href*='://']`             | `//a[contains(@href, '://')]`                            |                         |
| `a[rel~='help']`             | `//a[contains(@rel, 'help')]` *...[kinda](#class-check)* |                         |
{: .xp}

### Order selectors

| CSS                          | Xpath                                                    | ?                       |
| ----                         | ----                                                     | --                      |
| `ul > li:first-child`        | `//ul/li[1]`                                             | [?](#indexing)          |
| `ul > li:nth-child(2)`       | `//ul/li[2]`                                             |                         |
| `ul > li:last-child`         | `//ul/li[last()]`                                        |                         |
| `li#id:first-child`          | `//li[1][@id="id"]`                                      |                         |
| `a:first-child`              | `//a[1]`                                                 |                         |
| `a:last-child`               | `//a[last()]`                                            |                         |
{: .xp}

### Siblings

| CSS                          | Xpath                                                    | ?                       |
| ----                         | ----                                                     | --                      |
| `h1 ~ ul`                    | `//h1/following-sibling::ul`                             | [?](#using-axes)        |
| `h1 + ul`                    | `//h1/following-sibling::ul[1]`                          |                         |
| `h1 ~ #id`                   | `//h1/following-sibling::*[@id="id"]`                    |                         |
{: .xp}

### jQuery

| jQuery                       | Xpath                                                    | ?                       |
| ----                         | ----                                                     | --                      |
| `$('ul > li').parent()`      | `//ul/li/..`                                             | [?](#other-axes)        |
| `$('li').closest('section')` | `//li/ancestor-or-self::section[1]`                      |                         |
| `$('a').attr('href')`        | `//a/@href`                                              | [?](#steps)             |
| `$('span').text()`           | `//span/text()`                                          |                         |
{: .xp}

### Other things

| CSS                          | Xpath                                                    | ?                       |
| ----                         | ----                                                     | --                      |
| `h1:not([id])`               | `//h1[not(@id)]`                                         | [?](#boolean-functions) |
| Text match                   | `//button[text()="Submit"]`                              | [?](#operators)         |
| Text match (substring)       | `//button[contains(text(),"Go")]`                        |                         |
| Arithmetic                   | `//product[@price > 2.50]`                               |                         |
| Has children                 | `//ul[*]`                                                |                         |
| Has children (specific)      | `//ul[li]`                                               |                         |
| Or logic                     | `//a[@name or @href]`                                    | [?](#operators)         |
| Union (joins results)        | `//a | //div`                                            | [?](#unions)            |
{: .xp}

<style>
/* ensure tables align */
table.xp {table-layout: fixed;}
table.xp tr>:nth-child(1) {width: 35%;}
table.xp tr>:nth-child(2) {width: auto;}
table.xp tr>:nth-child(3) {width: 10%; text-align:right;}
</style>

### Class check

```bash
//div[contains(concat(' ',normalize-space(@class),' '),' foobar ')]
```

Xpath 1.0 has no operator for "is this word in a space-separated list", so this is the usual workaround ([source](http://pivotallabs.com/xpath-css-class-matching/)).
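
The padding is easier to see when the same test is spelled out as ordinary string operations. Below is a minimal Python sketch of that logic, not an XPath engine. `has_class` is a hypothetical helper name, and Python's `split()` treats a few more characters as whitespace than `normalize-space()` does.

```python
def has_class(class_attr: str, name: str) -> bool:
    """Mirror contains(concat(' ', normalize-space(@class), ' '), ' name ')."""
    # normalize-space(): trim both ends and collapse whitespace runs to one space
    normalized = " ".join(class_attr.split())
    # concat(' ', ..., ' '): pad so every class name has a space on both sides
    padded = f" {normalized} "
    # contains(..., ' name '): match a whole name, never a fragment of one
    return f" {name} " in padded


print(has_class("box  foobar\n", "foobar"))  # True
print(has_class("foobar-wide", "foobar"))    # False
print("foobar" in "foobar-wide")             # True: what a bare contains() would report
```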

Expressions
-----------

### Steps and axes

| `//` | `ul` | `/`  | `a[@id='link']` |
| Axis | Step | Axis | Step            |
{: .-css-breakdown}

### Prefixes

| Prefix | Example               | What     |
| ---    | ---                   | ---      |
| `//`   | `//hr[@class='edge']` | Anywhere |
| `./`   | `./a`                 | Relative |
| `/`    | `/html/body/div`      | Root     |
{: .-headers}

Begin your expression with any of these.

### Axes

| Axis | Example              | What       |
| ---  | ---                  | ---        |
| `/`  | `//ul/li/a`          | Child      |
| `//` | `//*[@id="list"]//a` | Descendant |
{: .-headers}

Separate your steps with `/`. Use two (`//`) to match descendants at any depth instead of only direct children.
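
As a rough illustration, Python's standard `xml.etree.ElementTree` module supports a small XPath subset that is enough to show the difference. The markup is made up for the example, and the paths start with `.` because ElementTree only accepts paths relative to an element.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample markup, kept well-formed so the XML parser accepts it.
doc = ET.fromstring(
    "<body><ul id='list'>"
    "<li><a href='/one'>One</a></li>"
    "<li><span><a href='/two'>Two</a></span></li>"
    "</ul></body>"
)

# "/" only: each <a> must be a direct child of an <li>
direct = doc.findall(".//ul/li/a")
# "//": any <a> below the element with id="list", at any depth
anywhere = doc.findall(".//*[@id='list']//a")

print([a.text for a in direct])    # ['One']
print([a.text for a in anywhere])  # ['One', 'Two']
```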

### Steps

```bash
//div
//div[@name='box']
//*[@id='link']
```

A step has a node test, such as an element name (`div`) or `*` for any element, optionally followed by [predicates](#predicates) (`[...]`).
A step can also select text, attributes, or all child elements:

```bash
//a/text()     #=> "Go home"
//a/@href      #=> "index.html"
//a/*          #=> All a's child elements
```
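
ElementTree's XPath subset cannot select text or attribute nodes, so a Python sketch reads the same three things off the matched element instead. The `<a>` fragment is a made-up sample chosen to match the comments above.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: one link with text, an attribute and a child element.
a = ET.fromstring("<a href='index.html'>Go home<b>now</b></a>")

print(a.text)                      # Go home       (roughly //a/text(), first text node only)
print(a.get("href"))               # index.html    (roughly //a/@href)
print([child.tag for child in a])  # ['b']         (roughly //a/*)
```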

Predicates
----------

### Predicates

```bash
//div[true()]
//div[@class="head"]
//div[@class="head"][@id="top"]
```

A predicate keeps only the nodes for which its condition is true. Predicates can be chained.

### Operators

```bash
# Comparison
//a[@id = "xyz"]
//a[@id != "xyz"]
//a[@price > 25]
```

```bash
# Logic (and/or)
//div[@id="head" and position()=2]
//div[(x and y) or not(z)]
```

Use comparison and logic operators to make conditionals.

### Using nodes

```bash
# Use them inside functions
//ul[count(li) > 2]
//ul[count(li[@class='hide']) > 0]
```

```bash
# Returns every `<ul>` that has an `<li>` child
//ul[li]
```

You can use nodes inside predicates.
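
A small Python sketch of both forms, using the standard `xml.etree.ElementTree` module on made-up markup. ElementTree understands the `[li]` predicate directly but has no `count()`, so the counting variant is written out by hand here.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: three lists with 3, 1 and 0 items.
doc = ET.fromstring(
    "<body>"
    "<ul id='a'><li/><li/><li/></ul>"
    "<ul id='b'><li/></ul>"
    "<ul id='c'/>"
    "</body>"
)

# //ul[li]: keep each <ul> that has at least one <li> child
with_li = [ul.get("id") for ul in doc.findall(".//ul[li]")]

# //ul[count(li) > 2], emulated: count the <li> children of each <ul>
more_than_two = [ul.get("id") for ul in doc.findall(".//ul")
                 if len(ul.findall("li")) > 2]

print(with_li)        # ['a', 'b']
print(more_than_two)  # ['a']
```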

### Indexing

```bash
//a[1]                  # first <a> child of each parent
//a[last()]             # last <a> child of each parent
//ol/li[2]              # second <li>
//ol/li[position()=2]   # same as above
//ol/li[position()>1]   # :not(:first-child)
```

Use `[]` with a number, or `last()` or `position()`. Positions start at 1.
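
The index is counted among siblings under each parent, not across the whole result. The Python sketch below shows this with the standard `xml.etree.ElementTree` module on made-up markup. ElementTree supports `[n]` and `[last()]` but not `position()` or the parenthesized `(//li)[1]` form, so the last line takes the first match of the unindexed path instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: two ordered lists of different lengths.
doc = ET.fromstring(
    "<body>"
    "<ol><li>a1</li><li>a2</li><li>a3</li></ol>"
    "<ol><li>b1</li><li>b2</li></ol>"
    "</body>"
)

def texts(path):
    """Text of every element matched by an ElementTree path."""
    return [el.text for el in doc.findall(path)]

print(texts(".//ol/li[1]"))       # ['a1', 'b1']  one match per <ol>
print(texts(".//ol/li[2]"))       # ['a2', 'b2']
print(texts(".//ol/li[last()]"))  # ['a3', 'b2']

# Stand-in for (//li)[1]: the first <li> in the whole document
print(doc.find(".//li").text)     # a1
```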
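
### Chaining order

```bash
a[1][@href='/']    # the first <a>, kept only if its href is '/'
a[@href='/'][1]    # the first of the <a> elements whose href is '/'
```

Order is significant: each predicate filters the result of the one before it, so these two are different.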
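
The two orders can be mimicked with plain list operations. This Python sketch is only an emulation of the filtering order on made-up data, not an XPath evaluation.

```python
# Hypothetical siblings: the href of each <a> under one parent, in document order.
links = ["/about", "/", "/contact", "/"]
numbered = list(enumerate(links, start=1))  # XPath positions start at 1

# a[1][@href='/']: take the first <a>, then keep it only if its href is "/"
first_then_filter = [(pos, href) for pos, href in numbered[:1] if href == "/"]

# a[@href='/'][1]: keep the <a> whose href is "/", then take the first of those
filter_then_first = [(pos, href) for pos, href in numbered if href == "/"][:1]

print(first_then_filter)  # []
print(filter_then_first)  # [(2, '/')]
```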

### Nesting predicates

```bash
//section[.//h1[@id='hi']]
```

This returns `<section>` if it has an `<h1>` descendant with `id='hi'`. The leading `.` matters: a bare `//h1` inside the predicate searches the whole document, not just the `<section>`.

Functions
---------

### Node functions

```bash
name()                     # //*[starts-with(name(), 'h')]
text()                     # //button[text()="Submit"]
                           # //button/text()
lang(str)
namespace-uri()
```

```bash
count()                    # //table[count(tr)=1]
position()                 # //ol/li[position()=2]
```

### Boolean functions

```bash
not(expr)                  # button[not(starts-with(text(),"Submit"))]
```

### String functions

```bash
contains()                 # font[contains(@class,"head")]
starts-with()              # font[starts-with(@class,"head")]
ends-with()                # font[ends-with(@class,"head")] (XPath 2.0+; browsers only implement 1.0)
```

```bash
concat(x,y)
substring(str, start, len)
substring-before("01/02", "/")  #=> 01
substring-after("01/02", "/")   #=> 02
translate()
normalize-space()
string-length()
```
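
For readers more used to ordinary string methods, here is a hedged Python sketch of a few of these functions. The helper names are hypothetical, the separator is assumed to be non-empty, and `ends_with_xpath1` mirrors a common XPath 1.0 stand-in for `ends-with()`: `substring(s, string-length(s) - string-length(suffix) + 1) = suffix`.

```python
def substring_before(s: str, sep: str) -> str:
    """substring-before(s, sep): text before the first sep, or '' if sep is absent."""
    head, found, _ = s.partition(sep)
    return head if found else ""


def substring_after(s: str, sep: str) -> str:
    """substring-after(s, sep): text after the first sep, or '' if sep is absent."""
    _, found, tail = s.partition(sep)
    return tail if found else ""


def ends_with_xpath1(s: str, suffix: str) -> bool:
    """Emulate substring(s, string-length(s) - string-length(suffix) + 1) = suffix."""
    start = len(s) - len(suffix) + 1     # 1-based start position, as in XPath
    return s[max(start - 1, 0):] == suffix


print(substring_before("01/02", "/"))          # 01
print(substring_after("01/02", "/"))           # 02
print(ends_with_xpath1("report.pdf", ".pdf"))  # True
```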

### Type conversion

```bash
string()
number()
boolean()
```

Axes
----

### Using axes

```bash
//ul/li                       # ul > li
//ul/child::li                # ul > li (same)
//ul/following-sibling::li    # ul ~ li
//ul/descendant-or-self::li   # ul li
//ul/ancestor-or-self::li[1]  # $('ul').closest('li')
```

Steps of an expression are separated by `/`, which by default selects child nodes. To walk in another direction, name a different "axis" with `::`.

| `//` | `ul` | `/child::` | `li` |
| Axis | Step | Axis       | Step |
{: .-css-breakdown}

### Child axis

```bash
# both the same
//ul/li/a
//child::ul/child::li/child::a
```

`child::` is the default axis. This makes `//a/b/c` work.

```bash
# both the same
# this works because `child::li` is truthy, so the predicate succeeds
//ul[li]
//ul[child::li]
```

```bash
# both the same
//ul[count(li) > 2]
//ul[count(child::li) > 2]
```

### Descendant-or-self axis

```bash
# both select the same nodes
//div//h4
//div/descendant-or-self::h4
```

`//` is short for `/descendant-or-self::node()/`.

```bash
# both the same
//ul//*[last()]
//ul/descendant-or-self::node()/*[last()]
```

### Other axes

| Axis                 | Abbrev | Notes                                            |
| ---                  | ---    | ---                                              |
| `ancestor`           |        |                                                  |
| `ancestor-or-self`   |        |                                                  |
| ---                  | ---    | ---                                              |
| `attribute`          | `@`    | `@href` is short for `attribute::href`           |
| `child`              |        | `div` is short for `child::div`                  |
| `descendant`         |        |                                                  |
| `descendant-or-self` | `//`   | `//` is short for `/descendant-or-self::node()/` |
| `namespace`          |        |                                                  |
| ---                  | ---    | ---                                              |
| `self`               | `.`    | `.` is short for `self::node()`                  |
| `parent`             | `..`   | `..` is short for `parent::node()`               |
| ---                  | ---    | ---                                              |
| `following`          |        |                                                  |
| `following-sibling`  |        |                                                  |
| `preceding`          |        |                                                  |
| `preceding-sibling`  |        |                                                  |
{: .-headers}

There are other axes you can use.

### Unions

```bash
//a | //span
```

Use `|` to join two expressions.
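
ElementTree's XPath subset has no `|`, so the Python sketch below emulates a union by merging two result lists. It assumes the common behaviour of returning the combined nodes once each, in document order, and the sample markup is made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample with <a> and <span> elements interleaved.
doc = ET.fromstring(
    "<body><span>s1</span><a>a1</a><p><span>s2</span></p><a>a2</a></body>"
)

# Collect both result sets; a set drops any node matched twice.
wanted = set(doc.findall(".//a")) | set(doc.findall(".//span"))

# Walking the tree restores document order for the merged result.
union = [el for el in doc.iter() if el in wanted]

print([el.text for el in union])  # ['s1', 'a1', 's2', 'a2']
```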

More examples
-------------

### Examples

```bash
//*                 # all elements
count(//*)          # count all elements
(//h1)[1]/text()    # text of the first h1 heading
//li[span]          # find a <li> with a <span> inside it
                    # ...expands to //li[child::span]
//ul/li/..          # use .. to select a parent
```

### Find a parent

```bash
//section[h1[@id='section-name']]
```

Finds a `<section>` that directly contains `h1#section-name`.

```bash
//section[.//h1[@id='section-name']]
```

Finds a `<section>` that contains `h1#section-name` at any depth.
(Same as above, but uses descendant-or-self instead of child.)
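
The path inside a predicate is evaluated from each candidate `<section>`. The Python sketch below makes that explicit by running the inner path from every section in turn. It uses the standard `xml.etree.ElementTree` module, and both the markup and the `sections_where` helper are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the heading sits directly in "inner", which is nested in "outer".
doc = ET.fromstring(
    "<body>"
    "<section id='outer'><div>"
    "<section id='inner'><h1 id='section-name'/></section>"
    "</div></section>"
    "<section id='other'><h1 id='x'/></section>"
    "</body>"
)

def sections_where(inner_path):
    """Ids of the sections for which inner_path, run from that section, matches."""
    return [s.get("id") for s in doc.iter("section")
            if s.find(inner_path) is not None]

# Child step: the <h1> must be a direct child of the section
print(sections_where("h1[@id='section-name']"))     # ['inner']
# Descendant step: the <h1> may sit at any depth below the section
print(sections_where(".//h1[@id='section-name']"))  # ['outer', 'inner']
```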
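
### Closest

```bash
./ancestor-or-self::*[@class="box"][1]
```

Works like jQuery's `$().closest('.box')` for elements whose `class` attribute is exactly `box`. The `[1]` picks the nearest match, because positions on an ancestor axis count outward from the current node.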
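
A Python sketch of the same walk, for illustration only. ElementTree has no ancestor axis and its elements do not record their parent, so this builds a child-to-parent map first. `closest` and the sample markup are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: a link inside two nested boxes.
doc = ET.fromstring(
    "<div class='box' id='outer'>"
    "<div class='box' id='inner'><p><a id='link'/></p></div>"
    "</div>"
)

# Map every child element to its parent so the tree can be walked upwards.
parent_of = {child: parent for parent in doc.iter() for child in parent}

def closest(el, class_name):
    """Check el, then its parent, grandparent, ... and return the first exact class match."""
    while el is not None:
        if el.get("class") == class_name:
            return el
        el = parent_of.get(el)
    return None

link = doc.find(".//a[@id='link']")
print(closest(link, "box").get("id"))  # inner
```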

### Attributes

```bash
//item[@price > 2*@discount]
```

Finds `<item>` elements and compares their attributes with each other.
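
Written out in Python, the comparison looks like the sketch below. XPath converts both attributes to numbers before comparing; `float()` stands in for that here, and the sample data (which assumes both attributes are always present) is made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: same price, different discounts.
doc = ET.fromstring(
    "<items>"
    "<item name='pen' price='3.00' discount='1.00'/>"
    "<item name='ink' price='3.00' discount='2.00'/>"
    "</items>"
)

# //item[@price > 2*@discount], emulated with an explicit numeric comparison
matches = [item.get("name") for item in doc.iter("item")
           if float(item.get("price")) > 2 * float(item.get("discount"))]

print(matches)  # ['pen']
```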

References
----------
{: .-one-column}

* [Xpath test bed](http://www.whitebeam.org/library/guide/TechNotes/xpathtestbed.rhtm) _(whitebeam.org)_