Need to custom scrape a dynamically rendered (AJAX) website? Then use Apify and *don’t* use Google Sheets
I like using Google Sheets for quick & dirty scraping, and I’ve even built a more advanced and more flexible scraper in Apps Script because it’s really useful.
But ...
The problem
You can’t use it if you’re trying to fetch data from a dynamically rendered website.
Why? Because ‘dynamic’ means it’s rendered later, and/or in your browser. In other words: when you open the url you’re trying to scrape and parse elements from, some stuff you probably need is not there yet, and scraping that with Sheets/Apps Script will subsequently fail. Leaving you wondering why the hell it didn’t work :p
YET is the magic word.
You need a scraper that can WAIT for an element or even a whole page to become available.
As said, Google Sheets (=IMPORTXML()) and Apps Script (UrlFetchApp.fetch()) cannot do this.
It is possible to retry requests and use timeouts between requests, but that’s not the same. (Try this if you need that though).
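To make the difference concrete, here is a minimal sketch of the retry-with-a-pause approach. The function name, the marker string and the injected fetchFn/sleepFn helpers are all hypothetical, chosen only for illustration. Note what it can and cannot do: it only helps when the server itself eventually returns the content in the raw HTML. It never runs the page's JavaScript, so content that is injected in the browser will still be missing.

```javascript
// Sketch of "retry with a pause" in plain JavaScript.
// fetchFn and sleepFn are injected so the same logic runs in Apps Script or Node.
// All names here (fetchUntilContains, marker, ...) are hypothetical.
function fetchUntilContains(fetchFn, sleepFn, url, marker, maxAttempts, delayMs) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const html = fetchFn(url); // raw HTML as a string, no JavaScript executed
    if (html.indexOf(marker) !== -1) {
      return { found: true, attempts: attempt, html: html };
    }
    if (attempt < maxAttempts) {
      sleepFn(delayMs); // pause before the next request
    }
  }
  return { found: false, attempts: maxAttempts, html: null };
}

// In Apps Script the two injected functions could look like this:
// const fetchFn = (url) =>
//   UrlFetchApp.fetch(url, { muteHttpExceptions: true }).getContentText();
// const sleepFn = (ms) => Utilities.sleep(ms);
// const result = fetchUntilContains(fetchFn, sleepFn, url, 'Madrid', 3, 2000);
```

If this still returns found: false after a few attempts, the element is most likely rendered client-side, and you need a scraper that runs a real browser.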
The solution
So what’s an SEO-automator supposed to do?
Well, use better tools of course!
I’ve been using Apify since the day it started, literally. And I’ve been a fan since that same day.
And one of the lesser known, but very useful things it can do, is WAITING for something. As in: a certain element you might need to scrape ;)
The function you need is called page.waitFor() or context.waitFor(), depending on the scraper type you use. Its parameter is the CSS selector of the element you want to wait for. Got it? ;) (Note: recent Puppeteer versions dropped page.waitFor() in favor of page.waitForSelector().)
See the documentation here: https://docs.apify.com/academy/node-js/waiting-for-dynamic-content. Just try it, it'll help you.
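As a rough sketch, a page function for Apify's Web Scraper that waits before scraping could look like the one below. The selectors '.search-result' and '.title' are hypothetical placeholders, and the sketch assumes the scraper's "Inject jQuery" option is enabled so that context.jQuery is available. Treat it as a starting point, not as the one right way to do it.

```javascript
// Page function sketch for Apify's Web Scraper (assumes "Inject jQuery" is on).
// '.search-result' and '.title' are hypothetical selectors: replace with your own.
async function pageFunction(context) {
  // Wait until at least one result element exists in the DOM.
  await context.waitFor('.search-result');

  // Only now is it safe to read the dynamically rendered elements.
  const $ = context.jQuery;
  const titles = $('.search-result .title')
    .map((i, el) => $(el).text().trim())
    .get();

  return { url: context.request.url, titles: titles };
}
```

The important design choice is the order: wait first, then select. Selecting before the wait is exactly what Sheets and Apps Script end up doing.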
How I use it: a practical example
For my Content Quality product I custom scrape internal search engines.
Often these internal SERPs are rendered dynamically, because search engines aren’t core business for websites that use them, or even their CMS vendors. So they’re usually loaded from somewhere else.
Look at the sample below: a website with a lot of travel products:
- In the left column (Raw), you see what Google Sheets or Apps Script sees. I've added the question mark ;)
- In the middle column (Rendered), you see what eventually is rendered and thus visible on your screen.
- In the right column (Difference), you see that there are quite a lot of changes. The red lines are removed (they were placeholders before) and the green ones are added.
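If you already have both versions of the HTML as strings, the idea behind the three columns can be approximated in a few lines. This is a simplified, line-based comparison and not a real diff algorithm. The function name and the sample markup are made up for illustration.

```javascript
// Line-based comparison of raw vs. rendered HTML (a simplification, not a true diff).
// compareSources and the sample strings below are hypothetical.
function compareSources(rawHtml, renderedHtml) {
  const toLines = (html) =>
    html.split('\n').map((line) => line.trim()).filter((line) => line !== '');
  const rawLines = toLines(rawHtml);
  const renderedLines = toLines(renderedHtml);
  const rawSet = new Set(rawLines);
  const renderedSet = new Set(renderedLines);
  return {
    removed: rawLines.filter((line) => !renderedSet.has(line)), // the "red" lines
    added: renderedLines.filter((line) => !rawSet.has(line)),   // the "green" lines
  };
}

const raw = '<h1>Results</h1>\n<div class="placeholder"></div>';
const rendered = '<h1>Results</h1>\n<div class="result">City trip</div>';
const diff = compareSources(raw, rendered);
console.log(diff.removed); // lines only present before rendering
console.log(diff.added);   // lines only present after rendering
```

Anything that only shows up in added is invisible to a scraper that does not wait for rendering.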
I was looking for a keyword 'Madrid' in a specific place. It wasn’t there in the Raw version, but it was added later:
P.S.: this screenshot is from using a Chrome plugin called 'View Rendered Source'. Recommended!
So yeah, I needed that :)
There’s no need to make a whole story out of this: Apify does the job. And I advise you to take a look at it, because it rocks.