Future-proof domain disavowing in bulk

A step-by-step workflow that you can use for your own link-profile hygiene. It's particularly useful if your site has a HUGE number of spammy backlinks and you want to get rid of them quickly and easily.

You do need to know your way around MySQL, Excel and preferably ODBC, so I'd rate the difficulty of this article as 'medium'. I assume that you have a local MySQL database installed and that you know how to use ODBC to fetch data from your database into Excel, easily and without errors. It's quite easy to set up though, so go ahead and do it if you haven't yet. You can thank me later, because it's useful for a gazillion other things as well.

Here are the 7 steps you need to take to disavow a huge number of spammy backlinks:

  1. Extract all your backlinks from SEMrush, ahrefs, etc.
  2. Import the links into a local database for easy querying
  3. Transform the data: add the domain name to every link
  4. Transform the data: flag spamlinks
  5. Import into Excel with ODBC, and create a pivot table for your final check
  6. Add the 'domain:' prefix to the found spammy domains in Excel
  7. Save as UTF-8 text file, and submit in Google Search Console


Let's look at the steps in more detail.

Step 1: extract all your backlinks from ahrefs, Majestic, etc. (I use ahrefs)

So you need to have a CSV dump of all backlinks (or a grouping of 'similar links') to your domain. Apply the following steps to your own backlink data provider and situation. The general idea should be clear though. In ahrefs, just go to: Backlinks > Links > Export.

A download will be prepared, and you can then download it from your browser.

P.s.: Make sure that the file is saved in UTF-8 format. For example, in Safari, I go to: View > Text Encoding > Unicode (UTF-8).

[Screenshot: Safari Text Encoding set to Unicode (UTF-8)]

Step 2: import the links into a local database for easy querying

The easiest tool for working with MySQL on the Mac is still Sequel Pro. If I import a CSV file and tell Sequel Pro to put it into a new table, the program is smart enough to create that table - with workable column definitions - on the fly. It doesn't need to be perfect; it needs to be quick. Perfect tool for the job!

In the screenshot below, you can see how easy it is:

  1. Just give the new table a name
  2. No need to worry about column definitions: Sequel Pro makes solid default suggestions, so you can leave everything as-is.
  3. Hit 'import'. Done!

[Screenshot: Sequel Pro's easy CSV import]

After all data has been imported into our own database, we need to add some information. Because we're dealing with a huge number of spammy backlinks, we need to disavow complete domains, not individual links. New spam links are probably being added on a daily basis, and they could come from a gazillion different subdomains and pages. We need to disavow those proactively!

Our action therefore is to enrich the data with two things: 1) which domain a link is from, and 2) whether it's a spam link or not. In effect, we use a sample of links to find the domains they originate from, and then disavow those domains. That also takes care of future spam links from these domains.

Step 3: transform the data: add the domain name to every link

Add a new column to your database. In my case, I've called it 'Referring Page Domain', and gave it the same properties as the 'Referring Page URL' column:

[Screenshot: the new 'Referring Page Domain' column in Sequel Pro]

I use this update statement to extract the domain name from a URL in MySQL. Just copy-paste it, replace the table name with your own, and change the column names if yours differ. In my case, the target column was 'Referring Page Domain':

UPDATE `backlinks`
SET `Referring Page Domain` = SUBSTRING_INDEX(
		SUBSTRING_INDEX(`Referring Page URL`, '/', 3),
		'//',
		-1
	);
In the status bar at the bottom of Sequel Pro's query window, you can see how many rows have been changed. It should be ALL of them. Look at the data to make sure.
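
If you want to sanity-check the SQL's output (or script this step instead), the same extraction takes a few lines of Python with just the standard library. This is a sketch, not part of the original workflow; the URL is made up:

```python
from urllib.parse import urlparse

def referring_domain(url: str) -> str:
    """Return the host part of a backlink URL (what the SQL update
    puts into the 'Referring Page Domain' column)."""
    return urlparse(url).netloc

# Quick spot check against a made-up spam URL:
print(referring_domain("http://spam.example.com/header.asp"))  # spam.example.com
```

Pick a handful of real rows from your table and run them through both versions; the results should match exactly.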

Step 4: transform the data: flag spamlinks

The second transformation is to check every single link in our dataset and see if it matches a pattern that we can label as 'spam'. If it does, we write a '1' to a new column. So first, we need to add that new column; in my case I named it 'Spamlink':

[Screenshot: the new 'Spamlink' column]

Next, we need to update the value of that column from NULL to '1' if the link matches a pattern. This is where the data we gathered in step 1 is really useful, because we can look at titles, the pages links are on, where the links point to, even surrounding text, etc.

In my example, my client's site had been hacked, and cloaked pages selling shoes had been added. The spam links were recognizable by the URLs they pointed to: previously non-existent pages like header.asp, footer.asp, images/?p=, and even URLs containing product brand names like ugg and nike air. My client does not sell shoes, so they were easy to recognize ;)

Anyway, to update the spam flag column, I used the following query:

UPDATE `backlinks`
SET `Spamlink` = CASE
	WHEN `Link URL` LIKE '%ugg%'
		OR `Link URL` LIKE '%nike-air%'
		OR `Link URL` LIKE '%license.asp%'
		OR `Link URL` LIKE '%footer.asp%'
		OR `Link URL` LIKE '%header.asp%'
		OR `Link URL` LIKE '%images/?p=%'
	THEN 1
	ELSE `Spamlink`
END;

Run it. Now I have all backlinks to my client's domain, every backlink has a domain I can filter on, and every single link is flagged as a spam link or not.
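
The same CASE logic is easy to reproduce in Python if you want to spot-check a handful of URLs before running the UPDATE on thousands of rows. The patterns below are the ones from my query; note that MySQL's LIKE is usually case-insensitive, so I lowercase the URL here to match:

```python
SPAM_PATTERNS = ["ugg", "nike-air", "license.asp",
                 "footer.asp", "header.asp", "images/?p="]

def is_spamlink(link_url: str) -> bool:
    """Mirror of the SQL CASE: True if any spam pattern occurs in the URL."""
    url = link_url.lower()  # MySQL LIKE is case-insensitive by default
    return any(pattern in url for pattern in SPAM_PATTERNS)

print(is_spamlink("http://example.com/images/?p=ugg-boots"))  # True
print(is_spamlink("http://example.com/contact.html"))         # False
```

Adjust the pattern list to whatever your own spam footprint looks like; the point is that a plain substring match behaves just like LIKE '%...%'.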

Step 5: import into Excel with ODBC (or manually), and create a pivot table for your final check

I could continue in MySQL (and should do so, if the amount of data is too big for Excel to handle), but from this point on, I prefer to use the pivot table functionality of Excel. It rocks.

Okay, I'm assuming here that you've already installed an ODBC connector between MySQL and Excel, or at least that you know how to get data from a database into Excel. You can always of course export as CSV from the database, and import it manually.

The trick is to put it all together in a single pivot table in Excel that you can slice & dice for easy analysis. In my case, I first chose the domains and put the URLs they linked to below them (remember: I could recognize spam links by the URLs they pointed to). Next to that, in the values area, I added both the count and the sum of the Spamlink column. If count and sum are equal, 100% of the links from that domain are labeled as spam.

Let's show it from the start. First in Excel, go get the data through ODBC:

[Screenshot: Excel ODBC: new query]

Then execute a simple select query to load all data into Excel:

[Screenshot: Excel ODBC: run query]

And choose to output it into a pivot table:

[Screenshot: Excel ODBC: output into a pivot table]

The result being a non-configured Pivot table:

[Screenshot: Excel ODBC: empty pivot table]

In my case, I now needed to add the referring domains, the URLs on my client's domain they linked to, and the count and sum of the Spamlink flag.

P.s.: To view only the domains, make sure to collapse the Referring Page Domain field in Excel.

If I sort the pivot table by Count of Spamlink, descending, I can immediately see that many domains have sent ONLY spam links my way, because count and sum are equal:

[Screenshot: Excel pivot table settings]

Now I have a list of domains that I'm sure have only sent spammy links my way, and will probably continue to do so in the future. In my case, some domains sent multiple thousands of links my way. In other cases, they created a huge number of subdomains that each contained a few spam links. But they were all recognizable by the fact that their links all pointed to the same pages on my client's domain.
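
If your dataset is too big for Excel, the count-versus-sum check can stay in code as well. A minimal sketch with hypothetical rows (in practice you'd fetch (domain, spam flag) pairs straight from the database):

```python
from collections import defaultdict

# Hypothetical (referring domain, spam flag) rows from the database
rows = [
    ("spam-a.example", 1), ("spam-a.example", 1),
    ("mixed-b.example", 1), ("mixed-b.example", 0),
]

links = defaultdict(int)  # count of links per domain
spam = defaultdict(int)   # sum of spam flags per domain
for domain, flag in rows:
    links[domain] += 1
    spam[domain] += flag

# Count == sum means 100% of the sampled links from that domain are spam
all_spam = sorted(d for d in links if links[d] == spam[d])
print(all_spam)  # ['spam-a.example']
```

This is exactly what the pivot table shows you, minus the convenient clicking and sorting.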

I had to do a little additional housekeeping to remove the many subdomains (a simple regex search & replace, then remove duplicates, in a text editor). I wanted every link from every subdomain of a specific site (literally thousands of subdomains) to be disavowed.
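
That housekeeping can be scripted too. Here's a deliberately naive sketch that keeps only the last two labels of each host: fine for .com-style domains, wrong for two-level TLDs like .co.uk, where you'd want a public-suffix list instead:

```python
def collapse_to_root(hosts):
    """Naively reduce each host to its last two labels and deduplicate,
    preserving order. Does NOT handle two-level TLDs like .co.uk."""
    seen, roots = set(), []
    for host in hosts:
        root = ".".join(host.split(".")[-2:])
        if root not in seen:
            seen.add(root)
            roots.append(root)
    return roots

print(collapse_to_root(["a.spam.example", "b.spam.example", "spam.example"]))
# ['spam.example']
```

Since a domain: entry in the disavow file covers all of its subdomains anyway, collapsing thousands of subdomains down to a handful of root domains keeps the file short.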

Steps 6 & 7: add the 'domain:' prefix, save as a UTF-8 text file, and submit in Google Search Console

The disavow tool in Google Search Console wants .txt files with UTF-8 encoding, and it's possible to disavow complete domains in addition to individual links. I prefer to automate, so when I'm confident that a domain only sends shit my way, and I'm also sure they won't remove their links if I ask politely (provided I can even find the hacker responsible ...), I can safely disavow them completely. Simply add domain: before a domain name:

[Screenshot: example disavow file]
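
Writing the file itself can be scripted as well. The domain list and filename here are placeholders; the format is simply one domain: entry (or one plain URL) per line, saved as UTF-8:

```python
# Hypothetical list of confirmed spam domains from the pivot-table check
spam_domains = ["spam-a.example", "spam-b.example"]

# One 'domain:' entry per line, UTF-8 encoded, as the disavow tool expects
with open("disavow.txt", "w", encoding="utf-8") as f:
    for domain in spam_domains:
        f.write(f"domain:{domain}\n")
```

Paste your real domain list in, upload the resulting disavow.txt, and you're done.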

Now, submit, and save some time to do other things :)

What's next?

If you think this was useful to you, it might be for others also. Feel free to share: