EVOLUTION-MANAGER
Edit File: html_element.html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><title>R: Select elements from an HTML document</title> <meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> <link rel="stylesheet" type="text/css" href="R.css" /> </head><body> <table width="100%" summary="page for html_element {rvest}"><tr><td>html_element {rvest}</td><td style="text-align: right;">R Documentation</td></tr></table> <h2>Select elements from an HTML document</h2> <h3>Description</h3> <p><code>html_element()</code> and <code>html_elements()</code> find HTML element using CSS selectors or XPath expressions. CSS selectors are particularly useful in conjunction with <a href="https://selectorgadget.com/">https://selectorgadget.com/</a>, which makes it very easy to discover the selector you need. </p> <h3>Usage</h3> <pre> html_element(x, css, xpath) html_elements(x, css, xpath) </pre> <h3>Arguments</h3> <table summary="R argblock"> <tr valign="top"><td><code>x</code></td> <td> <p>Either a document, a node set or a single node.</p> </td></tr> <tr valign="top"><td><code>css, xpath</code></td> <td> <p>Elements to select. Supply one of <code>css</code> or <code>xpath</code> depending on whether you want to use a CSS selector or XPath 1.0 expression.</p> </td></tr> </table> <h3>Value</h3> <p><code>html_element()</code> returns a nodeset the same length as the input. <code>html_elements()</code> flattens the output so there's no direct way to map the output to the input. </p> <h3>CSS selector support</h3> <p>CSS selectors are translated to XPath selectors by the <span class="pkg">selectr</span> package, which is a port of the python <span class="pkg">cssselect</span> library, <a href="https://pythonhosted.org/cssselect/">https://pythonhosted.org/cssselect/</a>. </p> <p>It implements the majority of CSS3 selectors, as described in <a href="https://www.w3.org/TR/2011/REC-css3-selectors-20110929/">https://www.w3.org/TR/2011/REC-css3-selectors-20110929/</a>. The exceptions are listed below: </p> <ul> <li><p> Pseudo selectors that require interactivity are ignored: <code style="white-space: pre;">:hover</code>, <code style="white-space: pre;">:active</code>, <code style="white-space: pre;">:focus</code>, <code style="white-space: pre;">:target</code>, <code style="white-space: pre;">:visited</code>. </p> </li> <li><p> The following pseudo classes don't work with the wild card element, *: <code style="white-space: pre;">*:first-of-type</code>, <code style="white-space: pre;">*:last-of-type</code>, <code style="white-space: pre;">*:nth-of-type</code>, <code style="white-space: pre;">*:nth-last-of-type</code>, <code style="white-space: pre;">*:only-of-type</code> </p> </li> <li><p> It supports <code style="white-space: pre;">:contains(text)</code> </p> </li> <li><p> You can use !=, <code style="white-space: pre;">[foo!=bar]</code> is the same as <code style="white-space: pre;">:not([foo=bar])</code> </p> </li> <li> <p><code style="white-space: pre;">:not()</code> accepts a sequence of simple selectors, not just a single simple selector. </p> </li></ul> <h3>Examples</h3> <pre> html <- minimal_html(" <h1>This is a heading</h1> <p id='first'>This is a paragraph</p> <p class='important'>This is an important paragraph</p> ") html %>% html_element("h1") html %>% html_elements("p") html %>% html_elements(".important") html %>% html_elements("#first") # html_element() vs html_elements() -------------------------------------- html <- minimal_html(" <ul> <li><b>C-3PO</b> is a <i>droid</i> that weighs <span class='weight'>167 kg</span></li> <li><b>R2-D2</b> is a <i>droid</i> that weighs <span class='weight'>96 kg</span></li> <li><b>Yoda</b> weighs <span class='weight'>66 kg</span></li> <li><b>R4-P17</b> is a <i>droid</i></li> </ul> ") li <- html %>% html_elements("li") # When applied to a node set, html_elements() returns all matching elements # beneath any of the inputs, flattening results into a new node set. li %>% html_elements("i") # When applied to a node set, html_element() always returns a vector the # same length as the input, using a "missing" element where needed. li %>% html_element("i") # and html_text() and html_attr() will return NA li %>% html_element("i") %>% html_text2() li %>% html_element("span") %>% html_attr("class") </pre> <hr /><div style="text-align: center;">[Package <em>rvest</em> version 1.0.3 <a href="00Index.html">Index</a>]</div> </body></html>