URL-segments and Google Sheets

Counting and fetching them in bulk.

If you work with a large number of URLs and need to do advanced analysis on them, you need to be able to divide them into their specific segments.

Below, I will show you a few ways to work with URL-segments, particularly when dealing with a large number of them.

Counting: count the number of slashes in a URL

You can't really directly count the number of occurrences of a specific character (like a slash) in a string (like a URL) with Google Sheets. BUT, there is a workaround:

  1. First, count the length of the string
  2. Next, count the length that would be if a specific character is filtered from it
  3. Calculate the difference

P.s.: I'm assuming you have already found a way to subtract the domain / host from the URL, and have put the paths of the URls in a separate column.

So, if you have a path in cell A1, you would:

  1. First, count the length of the complete path, with LEN(A1)
  2. And with the SUBSTITUTE() function, we can replace slashes with nothing, and then count the length of what remains: (SUBSTITUTE(A1,"/",""))
  3. If you subtract the second result from the first one, you know the amount of slashes in the path in cell A1

The whole formula is:

=LEN(A1)-LEN(SUBSTITUTE(A1,"/",""))

Translated: return the length of what's in cell A1, subtract from it the length of cell A1 but with all "/" stripped from it (substituted by "")

Tip: If you want to apply this formula to many thousands of paths without copying the formula every time, use ARRAYFORMULA() with ISBLANK(): =ARRAYFORMULA(if(ISBLANK(A:A),"", LEN(A:A)-LEN(SUBSTITUTE(A:A,"/",""))))

Fetching URL-segments

If you start searching for 'fetching Url segments' in 'JavaScript' or 'Google Sheets', you will mostly find quite specific regular expressions. Didn't work for me. I had a list of many thousands of URLs/paths, and they were all different. Also, I needed to know all segments of all of them, not just a single segment of a single path.

So, what I did, was to count the number of segments of all of them, and use the MAX() function to find out how many segments the largest one had (MAX() returns the largest number in a range, so I applied it to all segments counts). If I knew that, I knew how many columns I needed to reserve for my next function.

SPLIT() does it what it says: it splits a string into as many parts as it needs, divided by a separator you specify. So if you have a path in cell A1, like /segment1/segment2/page.html, and you use something like SPLIT(A1,"/") in cell B1, the following will happen:

  • Cell B1 holds the value segment1
  • Cell C1 holds the value segment2
  • Cell D1 holds the value page.html

Again, if you want to use this exact function on all paths, put it in a new column (B1), and wrap it in ISBLANK() and ARRAYFORMULA():

=ARRAYFORMULA(if(ISBLANK(A:A),"", SPLIT(A:A,"/")))

After that, you can for example use pivot tables to group the segments and start analysing by segment.

What's next?

If you think this was useful to you, it might be for others also. Feel free to share: