How to fetch the Nth segment of a URL, in MySQL

How to segment the path of a URL into its parts. A simple MySQL query to aid you when doing website analysis.

2 min.

This is quite cumbersome in MySQL. Here’s something that might help.

Let's say you have a database table with a *lot* of URLs in it, and you need to segment them for analysis. What you need, is a way to isolate the segments inside a URL.

To achieve this, we can use a built-in MySQL-function two times. First, we need to return the part of the URL that ends with the segment we need. And next, we strip everything before that specific segment. Also, we need to make sure that if a result doesn't exist, MySQL doesn't return the whole string.

Here is how to fetch the Nth segment in a URL in MySQL:

  1. 1.

    First, use SUBSTRING_INDEX() to divide a string and return the part before a specified number of occurences of a delimiter. In other words: return everything up until the segment we need

  2. 2.

    Second, we strip everything before that segment.

  3. 3.

    Also, put an IF-statement around the whole thing to check if a segment actually is there to work with. Otherwise, return an empty string.

Example

If we have a long URL (field name: Page URL), and we want to fetch the sixth segment, we use the following formula:

/* how many segments are there? more than or equal to the segment number we want? Then there is something 'there' to work with */
IF(LENGTH(`Page URL`) - LENGTH(REPLACE(`Page URL`, '/', '')) + 1 >= 9,
    /* so return its value */
    SUBSTRING_INDEX(SUBSTRING_INDEX(`Page URL`, '/', 9), '/', -1),
    /* or else return an empty string */
    ''
) AS segment6

The following happens here:

  • -

    the IF -statement around the whole things checks how many segments there are. If it is bigger than or the same as the segment number we want, we know we can fetch something

  • -

    if TRUE, fetch the text after 'slash number X' (where X is the number of the segment)

  • -

    if FALSE, return an empty string

For your convenience, I've pre-made a few examples of the query for longer URLs, up to 6 segments. If you test it on the URL of this blog article, you will see that only the first two return something (which they should), and that the rest is empty:

,IF(LENGTH(`Page URL`) - LENGTH(REPLACE(`Page URL`, '/', '')) + 1 >= 4, SUBSTRING_INDEX(SUBSTRING_INDEX(`Page URL`, '/', 4), '/', -1), '') AS segment1 /* 'possibly-useful-article-about' */
,IF(LENGTH(`Page URL`) - LENGTH(REPLACE(`Page URL`, '/', '')) + 1 >= 5, SUBSTRING_INDEX(SUBSTRING_INDEX(`Page URL`, '/', 5), '/', -1), '') AS segment2 /* 'fetch-nth-segment-url' */
,IF(LENGTH(`Page URL`) - LENGTH(REPLACE(`Page URL`, '/', '')) + 1 >= 6, SUBSTRING_INDEX(SUBSTRING_INDEX(`Page URL`, '/', 6), '/', -1), '') AS segment3 /* empty */
,IF(LENGTH(`Page URL`) - LENGTH(REPLACE(`Page URL`, '/', '')) + 1 >= 7, SUBSTRING_INDEX(SUBSTRING_INDEX(`Page URL`, '/', 7), '/', -1), '') AS segment4 /* empty */
,IF(LENGTH(`Page URL`) - LENGTH(REPLACE(`Page URL`, '/', '')) + 1 >= 8, SUBSTRING_INDEX(SUBSTRING_INDEX(`Page URL`, '/', 8), '/', -1), '') AS segment5 /* empty */
,IF(LENGTH(`Page URL`) - LENGTH(REPLACE(`Page URL`, '/', '')) + 1 >= 9, SUBSTRING_INDEX(SUBSTRING_INDEX(`Page URL`, '/', 9), '/', -1), '') AS segment6 /* empty */

This way, it is far easier to segment all URLs of a website for further analysis.

→ Call to action ←

Sounds interesting?

Then let's make the virtual gap between us a little bit shorter!