How To Go To The Next Page For Scraping In PhantomJS

I'm trying to get several elements from a website with several pages. I'm currently using PhantomJS to do that work and my code almost works, but the issue is that my code scrapes

Solution 1:

You need to wait for the page to load after you click, not before. To do that, move the setTimeout() call from fetch_names to goto_next_page:

function fetch_names() {
    var name = page.evaluate(function () {
        return [].map.call(document.querySelectorAll('div.pepitesteasermain h2 a'), function(name){
            return name.getAttribute('href');
        });
    });
    console.log(name.join('\n'));
    page.render('1.png');
    goto_next_page();
}

function goto_next_page() {
    page.evaluate(function () {
        var a = document.querySelector('#block-system-main .next a');
        var e = document.createEvent('MouseEvents');
        e.initMouseEvent('click', true, true, window, 0, 0, 0, 0, 0, false, false, false, false, 0, null);
        a.dispatchEvent(e);
        waitforload = true;

    });
    window.setTimeout(function (){
        fetch_names();
    }, 5000);
}

Note that a static timeout is not the only way to wait for the next page. Instead, you can:

  • register to the page.onLoadFinished event:

    page.onLoadFinished = fetch_names;
  • wait for a specific selector to appear with the waitFor() function from the examples.
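
The polling idea behind waitFor() can be sketched roughly as follows. This is a minimal illustration, not a copy of the helper that ships with the PhantomJS examples: the parameter names, the 5000 ms timeout and the 250 ms polling interval are all assumptions. The commented usage at the bottom assumes the `page`, `fetch_names` and selector from the answer above, plus a hypothetical `previousFirstHref` variable holding the first link of the page you just left.

```javascript
// Poll `condition` until it returns true, then call `onReady`.
// If `timeoutMs` elapses first, call `onTimeout` (when provided) instead.
// Hypothetical helper: signature and defaults are assumptions for illustration.
function waitFor(condition, onReady, onTimeout, timeoutMs, intervalMs) {
    var start = Date.now();
    timeoutMs = timeoutMs || 5000;   // assumed default: give up after 5 s
    intervalMs = intervalMs || 250;  // assumed default: check every 250 ms
    var timer = setInterval(function () {
        if (condition()) {
            clearInterval(timer);
            onReady();
        } else if (Date.now() - start >= timeoutMs) {
            clearInterval(timer);
            if (onTimeout) {
                onTimeout();
            }
        }
    }, intervalMs);
    return timer;
}

// Possible use inside goto_next_page(), replacing the fixed setTimeout():
//
// waitFor(function () {
//     // The result links also exist on the old page, so wait for a change,
//     // not just for the selector to match.
//     return page.evaluate(function (prev) {
//         var a = document.querySelector('div.pepitesteasermain h2 a');
//         return a !== null && a.getAttribute('href') !== prev;
//     }, previousFirstHref);
// }, fetch_names, function () { phantom.exit(1); }, 10000);
```

Compared with the fixed 5000 ms timeout, this continues as soon as the new results are in the DOM and fails explicitly when they never arrive.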
