How to fetch the Nth segment of a URL

How to segment the path of a URL into its parts. A simple MySQL query to aid you when doing website analysis.

Let's say you have a database table with a *lot* of URLs in it, and you need to segment them for analysis. What you need, is a way to isolate the segments inside a URL.

To achieve this, we can use a built-in MySQL-function two times. First, we need to return the part of the URL that ends with the segment we need. And next, we strip everything before that specific segment. Also, we need to make sure that if a result doesn't exist, MySQL doesn't return the whole string.

Here is how to fetch the Nth segment in a URL in MySQL:

  1. First, use SUBSTRING_INDEX() to divide a string and return the part before a specified number of occurences of a delimiter. In other words: return everything up until the segment we need
  2. Second, we strip everything before that segment.
  3. Also, put an IF-statement around the whole thing to check if a segment actually is there to work with. Otherwise, return an empty string.
  4. Et voila! We have our URL-segment.

Example

If we have a long URL (field name: Page URL), and we want to fetch the sixth segment, we use the following formula:

/* how many segments are there? more than or equal to the segment number we want? Then there is something 'there' to work with */
IF(LENGTH(`Page URL`) - LENGTH(REPLACE(`Page URL`, '/', '')) + 1 >= 9,
    /* so return its value */
    SUBSTRING_INDEX(SUBSTRING_INDEX(`Page URL`, '/', 9), '/', -1),
    /* or else return an empty string */
    ''
) AS segment6

The following happens here:

  • the IF-statement around the whole things checks how many segments there are. If it is bigger than or the same as the segment number we want, we know we can fetch something
  • if TRUE, fetch the text after 'slash number X' (where X is the number of the segment)
  • if FALSE, return an empty string

P.s.: the first segment starts at a count of 4 slashes: a count starts at 0, and then come 2 slashes of the protocol (https://), so we've already had three when we encounter the first segment we want. So if I would do this for the URL of this very article, which is https://eikhart.com/blog/fetch-nth-segment-url, the fourth segment (which I will name 'segment1', but it actually is the fourth) should be blog, and the fifth should be fetch-nth-segment-url.

For your convenience, I've pre-made a few examples of the query for longer URLs, up to 6 segments. If you test it on the URL of this blog article, you will see that only the first two return something (which they should), and that the rest is empty:

,IF(LENGTH(`Page URL`) - LENGTH(REPLACE(`Page URL`, '/', '')) + 1 >= 4, SUBSTRING_INDEX(SUBSTRING_INDEX(`Page URL`, '/', 4), '/', -1), '') AS segment1 /* 'blog' */
,IF(LENGTH(`Page URL`) - LENGTH(REPLACE(`Page URL`, '/', '')) + 1 >= 5, SUBSTRING_INDEX(SUBSTRING_INDEX(`Page URL`, '/', 5), '/', -1), '') AS segment2 /* 'fetch-nth-segment-url' */
,IF(LENGTH(`Page URL`) - LENGTH(REPLACE(`Page URL`, '/', '')) + 1 >= 6, SUBSTRING_INDEX(SUBSTRING_INDEX(`Page URL`, '/', 6), '/', -1), '') AS segment3 /* empty */
,IF(LENGTH(`Page URL`) - LENGTH(REPLACE(`Page URL`, '/', '')) + 1 >= 7, SUBSTRING_INDEX(SUBSTRING_INDEX(`Page URL`, '/', 7), '/', -1), '') AS segment4 /* empty */
,IF(LENGTH(`Page URL`) - LENGTH(REPLACE(`Page URL`, '/', '')) + 1 >= 8, SUBSTRING_INDEX(SUBSTRING_INDEX(`Page URL`, '/', 8), '/', -1), '') AS segment5 /* empty */
,IF(LENGTH(`Page URL`) - LENGTH(REPLACE(`Page URL`, '/', '')) + 1 >= 9, SUBSTRING_INDEX(SUBSTRING_INDEX(`Page URL`, '/', 9), '/', -1), '') AS segment6 /* empty */

This way, it is far easier to segment all URLs of a website for further analysis.

What's next?

If you think this was useful to you, it might be for others also. Feel free to share: