# Tips and tricks

## Handle Non-UTF8 HTML Pages

The `golang.org/x/net/html` package used by `goquery` requires that the HTML document is UTF-8 encoded. When you know the encoding of the HTML page is not UTF-8, you can use the `iconv` package to convert it to UTF-8 (there are various implementations of the `iconv` API, see [godoc.org][iconv] for other options):

```bash
$ go get -u github.com/djimenez/iconv-go
```

and then:

```golang
// Imports assumed: "net/http", "github.com/PuerkitoBio/goquery",
// and iconv "github.com/djimenez/iconv-go".

// Load the URL
res, err := http.Get(url)
if err != nil {
    // handle error
}
defer res.Body.Close()

// Convert the HTML from the designated charset to UTF-8.
// `charset` must be one of the charsets known by the iconv package.
utfBody, err := iconv.NewReader(res.Body, charset, "utf-8")
if err != nil {
    // handle error
}

// Parse the UTF-8 body with goquery
doc, err := goquery.NewDocumentFromReader(utfBody)
if err != nil {
    // handle error
}
// use doc...
```

Thanks to GitHub user @YuheiNakasaka.

The official `golang.org/x/text` repository also covers this use case; see its [godoc page][text] for details.

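Both conversion routes need to know the source charset. One common place to look is the response's `Content-Type` header. The following is a minimal sketch using only the standard library; `charsetFromContentType` is a hypothetical helper name, not part of `goquery` or `iconv`, and the header values are made-up examples.

```go
package main

import (
	"fmt"
	"mime"
	"strings"
)

// charsetFromContentType is a hypothetical helper (not part of goquery or
// iconv). It returns the lowercased charset parameter of a Content-Type
// header value, or "" when the header is malformed or names no charset.
func charsetFromContentType(contentType string) string {
	_, params, err := mime.ParseMediaType(contentType)
	if err != nil {
		return ""
	}
	return strings.ToLower(params["charset"])
}

func main() {
	// In real code the value would come from res.Header.Get("Content-Type").
	for _, ct := range []string{
		"text/html; charset=Shift_JIS",
		"text/html; charset=ISO-8859-1",
		"text/html",
	} {
		fmt.Printf("%q -> %q\n", ct, charsetFromContentType(ct))
	}
}
```

When the result is empty or `utf-8`, the body can probably be passed to `goquery` unchanged; otherwise it is a candidate for the `charset` argument of the conversion step. Treat the header as a hint only: a page may declare a different encoding in a `<meta>` tag, and the name still has to be one the conversion package recognizes.
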
## Handle JavaScript-based Pages

`goquery` works well for static HTML pages, but when most of the page is built dynamically with JavaScript, there is not much it can do. There are several options when faced with this problem:

* Use a headless browser such as [webloop][].
* Use a Go JavaScript parser and interpreter package, such as [otto][].

You can find a code example using `otto` [in this gist][exotto]. Thanks to GitHub user @cryptix.

## For Loop

If all you need is a normal `for` loop over all nodes in the current selection, where `Map/Each`-style iteration is not necessary, you can use the following:

```golang
// `doc` is a *goquery.Document loaded earlier
sel := doc.Find(".selector")
for i := range sel.Nodes {
    single := sel.Eq(i)
    // use `single` as a selection of 1 node
}
```

Thanks to GitHub user @jmoiron.

[webloop]: https://github.com/sourcegraph/webloop
[otto]: https://github.com/robertkrimen/otto
[exotto]: https://gist.github.com/cryptix/87127f76a94183747b53
[iconv]: http://godoc.org/?q=iconv
[text]: https://godoc.org/golang.org/x/text/encoding
