Get HTTP-status codes (301,200, 404, etc.)

With Google Sheets & Google Apps scripting

It is possible in Google Sheets to obtain the http-status codes (200, 301, 302, 404, etc.) for a large number of URLs. 'Large' meaning: up to a few thousand, not millions. This copy & paste script will do the heavy lifting for you.

The code

So, yapping later, the code first. This is the script you need to use. In Google Sheets, go to 'Extra' > 'Script Editor' and replace the placeholder code ('my function') with this one and save it:

/**
 * Return status code or error for a given url. Tip: wrap the call around the Google Sheets =isurl() function to make sure that only valid URLs are checked, and the =unique() function to make sure every unique URL is checked only once. See https://eikhart.com/blog/google-sheets-http-status-codes for an in-depth explanation.
 *
 * @param {url} valid url
 * @return result (status code or error message)
 *
 * @customfunction
 */
function httpstatuscode(url) {
    var result = [];
    // trim url to prevent (rare) errors
    url.toString().trim();
    // don't break if error, but instead return it as a message, also don't follow redirects
    var options = {
        'muteHttpExceptions': true,
        'followRedirects': false,
    };
    // int - returns status code
    try {
        result.push(UrlFetchApp.fetch(url, options).getResponseCode());
    }
    // string - or returns an error (usually 'DNS error' if domain does not exist)
    catch (error) {
        result.push(error.toString());
    } finally {
        return result;
    }
}

Use it on unique and valid URLs, like this:

=httpstatuscode(url)

In essence, you're done, but, I will show you how you can take it a few steps further, and use it in bulk.

P.s.: there are several scripts around on the web that aim to find status codes, but the best one I found is this one. The author clearly explains the how's & why's of this specific script, so there's no need for me to repeat it. Let me just expand it a little bit below.

Limitations

I tested this script with approx. 4000 URLs (containing many duplicates and non-valid URLs) and it worked flawlessly. Took a while though, but it worked without hiccups. However, there are limitations to the amount of requests you can do with Google Sheets in a day, so I wouldn't use this script for much more URLs than a few thousand.

Expanding it: using it for 2500+ messy, invalid, non-unique URLs

To use this custom function as efficiently as possible, you need to make sure that a few conditions are met:

  • There is *something* to check in the first place. I do that by checking if the length of a cell is higher than 0. I do NOT use =if(isblank()), because that also matches cells that contain a formula with an empty string as result
  • Only use it on valid URLs. That's why you should wrap the function around the =isurl() function of Google Sheets. Now we have our 'seed list'.
  • Make sure that every request is unique. I do this by creating a separate column that only lists all unique values. This is the 'unique list' for which the script is fired.
  • And after that I use a lookup to match the unique results with the (possibly non-unique) seed list of URLs. I actually use =index(match()), because that is more flexible than a =vlookup().

So, how to actually use all these criteria?

Step 1: create a list of unique URLs

First, make a new column with the unique URLs. Let's say the complete 'seed list' of URLs is in column A, starting at row 2. In cell B2 I could use the following:

=unique(filter(A2:A;A2:A <> ""))

This will fill column B from row 2 downwards with all unique values found in column A, skipping every cell that is empty.

Step 2: call the script

Next, I check if they are valid URLs, and if so, make the call. Copy paste the formula below in cell C2 and copy paste it downwards to all cells in column C:

=if(isurl(B2);httpstatuscode(B2);"Invalid URL")

Note that I don't use =arrayformula() or batching in an apps script. More on that below.

Step 3: match results with initial URL seed list

The last step is to match these results (the found status codes and/or error messages for the unique URLs) against the initial seed list of URLs in column A2:A.

The returned status codes or error messages are in C2:C, the unique URLs in B2:B, and we need to match it against the seed URLs in A2:A.

A wise man

If you know for certain that the original list only contains unique URLs, you can of course skip this step.

Matching the results of one column with another can be done in different ways, and most people do know about =vlookup(). BUT, =vlookup() isn't flexible enough to deal with things like sorting or filtering, so I usually use the combination of =index() and =match(). It's a bit more difficult to use, but more flexible, and doesn't break so easily as =vlookup(). So use the following formula:

=if(isblank(A2);"";iferror(index(C2:C;match(A2;B2:B;0));"Waiting ..."))

Let's break it down:

  • =if(isblank(A2);""; checks if we should do anything at all. If the requested cell does not contain anything, just return an empty string (""), else continue;
  • the =iferror()-part is more for aesthetics and less for functionality, because the script does a pretty good job of always returning something useful.
    But why use it then? For instance: you will see the message in the cell ('Waiting ...') when the script is basically waiting in line until it can fire. So this function gives people a nicer 'Waiting ...' message instead of an error when it's not executed yet (remember, if you have a lot of URLs, this could take a while).
  • why and how to use index(match) is beyond the scope of this article (this one does a good job explaining it), but the goal is to match the status code of a URL in the 'unique' list with its equivalent in the original 'seed list' of (possible) non-unique URLs. This will make sure that it does :)

So translated into non-tech language: "if cell A2 is blank, return an empty result (""). Or else, if there is *something*, make sure we see a nice message while we're waiting for the script to fetch the status code we need (iferror()). Next, use the response of the script, which is in column C (index(C2:C)), where the corresponding URL in column B (match(B2:B)) has the same value as A2".

Why not use =arrayformula() or Google Apps Scripts batch processing?

I tried batch processing within an Apps script (then the input of the script would be a range, and I could use a for-loop and fetchAll()) and with =arrayformula(). It's quite possible I did something wrong, whereas simply copying the formula below, did work for me. I suspect that with =arrayformula() or batch processing there are too many requests at once, whereas in this manner, Google Sheets will behave a bit less aggressive. Who knows. Anyway, I did look into it in case you're wondering ;)

What's next?

If you think this was useful to you, it might be for others also. Feel free to share: