A step-by-step workflow that you can use for your own linkprofile hygiene. Particularly useful if your site has a HUGE number of spammy backlinks and you want to get rid of them in a quick & easy manner.
You do need to know your way around MySQL, Excel and preferably ODBC, so I would say that the difficulty level of this article is 'medium'. I assume that you have a local MySQL database installed and that you know how to use ODBC to fetch data from your database and bring it into Excel, easily, and without errors. It's quite easy to set up though, so go ahead and do it if you haven't done so yet. You can thank me later, because it's useful for a gazillion other things as well.
Here are the 7 steps you need to take to disavow a huge number of spammy backlinks:
Let's look at the steps in more detail.
So you need to have a CSV dump of all backlinks (or a grouping of 'similar links') to your domain. Apply the following steps to your own backlink data provider and situation. The general idea should be clear though. In ahrefs, just go to:
Backlinks > Links > Export.
A download will be prepared, and you can then download it from your browser.
P.s.: Make sure that the file is saved in UTF-8 format. For example, in Safari, I go to:
View > Text Encoding > Unicode (UTF-8).
The easiest tool to work with MySQL on the Mac still is Sequel Pro. If I want to import a CSV file, and tell Sequel Pro that it should be imported into a new table, the program is smart enough to create a new table - with workable column definitions - on the fly. It doesn't need to be perfect; it needs to be quick. Perfect tool for the job!
In the screenshot below, you can see how easy it is:
After all data has been imported into our own database, we need to add some information. Because we have to deal with a huge number of spammy backlinks, we need to disavow complete domains, and not individual links. New spam links are probably being added on a daily basis, and they could be from a gazillion different subdomains and pages. We need to be able to proactively disavow that!
Our action therefore is to enrich the data of ahrefs with two things: 1. which domain a link is from, and 2: if it's a spam link or not. So we actually use a sample of links to find the domains they originate from, and then disavow these domains. This will disavow future spam links from these domains also.
Add a new column to your database. In my case, I've called it 'Referring Page Domain', and gave it the same properties as the 'Referring Page URL' column:
I use this select statement to extract the domain name from a URL in MySQL. Just copypaste it, and change the name of the column 'url' into the one you need to have updated. So in my case, that was 'Referring Page Domain':
UPDATE `NEWTABLENAME` SET `Referring Page Domain` = LEFT( RIGHT( `Referring Page URL` , LENGTH(`Referring Page URL`) -( POSITION('//' IN `Referring Page URL`) + 1 ) ) , POSITION( '/' IN RIGHT( `Referring Page URL` , LENGTH(`Referring Page URL`) -( POSITION('//' IN `Referring Page URL`) + 1 ) ) ) - 1 );
In the status bar below in the query screen in Sequel Pro you can see how many rows have been changed. It should be ALL of them. Look at the data to make sure.
The second transformation we need to make, is to check every single link we have in our dataset, and see if it matches a pattern that we can label as 'spam'. If it does, we add a '1' to a new column. So first, we need to add a new column. In my case I named it 'Spamlink':
Next, we need to update the value of that column from NULL to '1' if it matches a pattern. This is where the data we gathered from ahrefs is really useful! Because we can look at titles, pages links are from, where the link is to, even surrounding text, etc.
In my example, the site of my client was hacked and cloaked pages were added where shoes were sold. They were recognizable by looking at the URLs all the spamlinks point to: previously non-existing pages like
images/?p=, and even URLs that contained product brand names like
nike air. My client does not sell shoes, so it was easily recognized ;)
Anyway, to update the spam flag column, I used the following query:
UPDATE `NEWTABLENAME` SET `Spamlink` = CASE WHEN `Link URL` LIKE '%ugg%' OR `Link URL` LIKE '%nike-air%' OR `Link URL` LIKE '%license.asp%' OR `Link URL` LIKE '%footer.asp%' OR `Link URL` LIKE '%header.asp%' OR `Link URL` LIKE '%images/?p=%' THEN 1 ELSE NULL END;
Ran it. Now I have all backlinks to my clients' domain, every backlink has a domain that I can filter, and every single link is checked whether it is a spamlink or not.
I could continue in MySQL (and should do so, if the amount of data is too big for Excel to handle), but from this point on, I prefer to use the pivot table functionality of Excel. It rocks.
Okay, I'm assuming here that you've already installed an ODBC connector between MySQL and Excel, or at least that you know how to get data from a database into Excel. You can always of course export as CSV from the database, and import it manually.
The trick is, to put it all together into a single Pivot table in Excel that you can slice & dice for easy analysis. In my case, I first chose the domains, and put the URLs they linked to below it (remember: I could recognize spam links by the URL they linked to). Next to it, in the value column, I added both the count and sum of the number of spam links from that domain. If count and sum are equal, 100% of the links from that domain are labeled as spam.
Let's show it from the start. First in Excel, go get the data through ODBC:
Then execute a simple select query to load all data into Excel:
And choose to output it into a pivot table:
The result being a non-configured Pivot table:
In my case, I now needed to add the Referring Domains, the URLs from my clients' domain they linked to, and the count and sum of the number of spammy links.
P.s.: To view only the domains, make sure to collapse
Referring Page Domain field in Excel.
If I sort the Pivot table by
Count of Spamlink > Descending, I can immediately see that many domains have ONLY sent spam links my way, because the count and sum are equal:
Now I have a list of domains of which I am sure that they only send spammy links my way, and will probably remain to do so in the future. In my case, some domains sent multiple thousands of links my way. In other cases, they created a huge number of subdomains that each contained a few spam links. But they were all recognizable by the fact that their links all pointed to the same pages on the domain of my client.
I had to do a little additional housekeeping to remove the many subdomains (simple regex search & replace, then remove duplicates, in a text editor). I just wanted every link from every subdomain of a specific site (literally thousands of subdomains) to be disavowed.
The disavow tool in Google Search Console wants .txt files with UTF-8 encoding. And it is possible to disavow complete domains, in addition to individual links. I prefer to automate, so when i'm confident that a domain only sends shit my way, and I'm also sure that they will not remove their links if I ask them politely (provided I can find the hacker responsible ...), I can safely disavow them completely. Simply add
domain: before a domain name:
Now, submit, and save some time to do other, more important things, like sleeping.
Not much really. But if you think this was useful to you, it might be for others also: