Redirects chain script for Google Sheets

I developed a Google Apps Script that checks (serverside) redirect chains. Very useful when debugging a lot of redirects!

Of course everybody uses the awesome ScreamingFrog and Sitebulb crawlers for ad hoc scraping. And yes, you can use the 'list' feature in ScreamingFrog for example, to pinpoint exactly which URLs you want to crawl.

But ...

It does have its uses to be able to do comparable things in Google Sheets. Because one thing that Sheets does, and the two mentioned scrapers don't, is that it refreshes itself quite often. I think about every 10 minutes, on average. This is very useful when you just have a list of URLs you want to check automatically and continuously for status codes, redirect chains, or even SSL-errors. You can get busy with fixing the redirects, and Sheets does the checking for you.

I did that last week when I was trying to clean successfully cleaning up the mess of my own old domains, and subdomains, and pages, and old blog-posts, and tools, etc. etc. for the last 20 years. I wanted to redirect them all to my current domains, but also clean up possible SSL-errors (valid or not).

This script helped me a lot with that. I made a list of all domains and URLs that should be checked (I used Ahrefs to get exports of all known pages of all my old domains, and combined them into a single list).

What it looks like

Well, like this:

Redirect chain

How to use

First, you need a URL, and then you can call the outputRedirectChain function. For example, if you have URL in cell A1, call it like this:

=if(isurl(A1);outputRedirectChain(A1);"")

What it does

This script does the following things for you:

  • Duh, it checks for serverside redirects
  • And if it finds one, it repeats itself, until there are no more redirects to be found (so yes, you could crash it by sending it into an infinite loop, but why would you?)
  • It does NOT check for client-side redirects (but you could enhance the script by looking for meta http-equiv="refresh" meta tags, document.location JavaScript, etc., as a double-check after it finds a 200-ok status code)
  • It DOES check for relative URLs, corrects them, and displays a warning about it
  • If it finds a redirect chain, it checks if it's too long. If there are 2, it gives you a warning, and if there are 3 or more, it returns an error.
  • It also warns you if there are multiple types of redirects, for example 302- AND 301-redirects inside the same chain.

What it doesn't

  • It doesn't execute JavaScript. It is not a browser, but a simple scraper. If you need something more powerful, use a crawler that can render JavaScript (like ScreamingFrog, Apify, etc.)
  • It doesn't look for client-side redirects
  • It doesn't automatically write SEO-optimized texts for you
  • It may walk your dog, if you ask politely

The code

Enough talking. Here is the code. In a sheet, copy-paste it into Tools > Script Editor:

// helper function: extract unique values from array. Use it to find different status codes
function getDistincts(value, index, self) {
  'use strict';
  return self.indexOf(value) === index;
}

/**
* Prepare redirect chain, to pass on to a formatting function
* P.s.: it doesn't check client-side redirects. You shouldn't use those anyway.
*
* @param {url} valid start-url
* @return result (array with locations, status codes, remarks, etc.)
*
* @customfunction
*/
function crawlRedirects(url) {
  'use strict';
  var result = [];
  var codes = [];
  var locations = [];
  var remarks = [];
  var count = 0;
  var options = {
    muteHttpExceptions: true,
    followRedirects: false,
  };
  try {
    // trim url to prevent (rare) errors
    url.toString().trim();
    var r = UrlFetchApp.fetch(url, options);
    var c = r.getResponseCode();
    var l = r.getHeaders()['Location'];
    // status code
    codes.push(c);
    // http-location header is defined, so repeat redirect check
    if (l) {
      count++;
      locations.push(l);
      // url starts with slash? change it to full url
      if (l.lastIndexOf('/', 0) === 0) {
        eval(UrlFetchApp.fetch('https://raw.githubusercontent.com/medialize/URI.js/gh-pages/src/URI.js').getContentText()); // load uri.js in cache (works similar to PHP's parse_url() )
        l = URI(url).protocol() + '://' + URI(url).hostname() + l; // add protocol + host to relative url
        remarks.push('Relative location found');
      }
      // recursive repeat
      result = crawlRedirects(l);
    }
    // format
    codes = parseFloat(codes);
    locations = locations.toString();
    remarks = remarks.toString();
    result.unshift([count, codes, locations, remarks]);
  } catch (error) {
    result.push([0, '', error.toString()]);
  } finally {
    return result;
  }
}
/**
* Output a redirect chain, for even nicer Google Sheets presentation in a single row  & multiple columns
*
* @param {url} valid start-url
* @return result single flattened row
*
*/
function outputRedirectChain(url) {
  var chain = crawlRedirects(url);
  var result = [];
  var count = 0;
  var allCodes = []; // 1
  var locations = []; // 2
  var lastLocation = [];
  var remarks = []; // 3  
  var chainCodes = []; // all status codes the chain, excluding the last one
  
  // convert columns in chain array to new array per datatype
  for (var i = 0; i < chain.length; i++) {
    count += chain[i][0];
    allCodes.push(chain[i][1]);
    chainCodes.push(chain[i][1]);
    locations.push(chain[i][2]);
    if(i == (chain.length - 2)) {
      lastLocation.push(chain[i][2]);
    }
    remarks.push(chain[i][3]);
  }
  
  // add warning if redirect chain is 2
  if ((chain.length - 1) == 2) {
    remarks.push('Warning: redirect chain hits 2 redirects. One more, and Googlebot will start ignoring you!');
  }
  
  // Google stops crawling after 3 redirects
  if ((chain.length - 1) >= 3) {
    remarks.push('Error: too many redirects in this chain: ' + chain.length);
  }
  
  if(count > 0) {
  // generic remark: different status codes found, exclude the last one
  chainCodes.pop();
  var uniques = chainCodes.filter(getDistincts);
  if(uniques.length != 1) {
    remarks.push('Warning: different status codes found inside chain');
  }
  }
  
  // filter empty elements and add custom separator
  allCodes = allCodes.filter(Boolean).join(" > ");
  remarks = remarks.filter(Boolean).join(", ");
  lastLocation = lastLocation.filter(Boolean).toString();
  
  result.push([count, allCodes, lastLocation, remarks]);
  return result;
}

Now go on, clean up the internet, one redirect at a time.

What's next?

If you think this was useful to you, it might be for others also. Feel free to share: