Video Transcript – 4 steps to eradicating SEO noise

Welcome back, Rankers, and we’re doing another SEO show. I’ve got people live on Periscope here and Donny is just on, “Hi, from Jakarta.” I think it says Jakarta. My iPhone screen is a little cracked so it’s hard to tell half the time. But I’m pretty sure it said Jakarta. Say hello to all the…there’s all the people on Periscope. Not that you can see them, but that’s the people on Periscope, there you go. So today, I want to talk to you about cleaning up the index.

Because I talk about it all the time, and this week I’ve been doing a lot of site audits, where you go and have a look at the site and work out what’s wrong with it, and we’ve seen a bunch of sites start to do…working now, yeah thanks Donny. Sorry Periscope people. And we’ve been talking to a lot of other clients who want to do a site migration and they go, “Oh, well by the way we’re going to push our new site live.” So it’s like, do you not watch anything I publish? And the site migration stuff is…I did a show on it about three weeks ago, I think.

Go and have a look on the blog. But that’s when we see a lot of clients, or a lot of sites, just disappear from Google, lose all their rankings, and one of the things that they’re doing wrong is not cleaning up the index before they push their new site live. So basically what happens is they end up pushing their new site live on shaky search foundations, if that makes sense. So to give you an idea of that, this is a site that I’m working on for our Bloggers SEO product, which we’ve just about started beta testing. I think beta tester invitations went out today. You may still be able to get on the list.

Just go to bloggersseo.com or bloggersseo.com.au and fill out the beta request there and we’ll see if there’s room for another beta tester.

But basically, it’s an SEO system for bloggers. So people who don’t necessarily want to know all the technical ins and outs of SEO, but just want to know how to make it work on their WordPress site because they blog. So fairly niche, but this is one of the sites that I’ve been working on, and I did a show a couple of weeks ago about how I was implementing a few tweaks on this. Anyway, the keyword I was looking at for this particular site was, I think at the time, number six or seven. She’s gone up to number two since I made those changes. And now I’m looking at things like, where can we get that little bit of an edge? Once we’re number two, how do we get to number one? It’s all the little things that count when you get to that level, and so I thought I need to revisit the Google index again. One of the things that we’ve discovered over time with the Google index is what happens when you do a site: search, like I’ve done here, and it tells me…well it’s actually telling me here 328 results.

But that is when we’re on page four. If I was on page one of this result, Google’s telling me 634 results. Now those 634 results, I’ve got that down from about 1300. So we’ve halved the number of results that were in the index, and the reason we’re trying to get that search result smaller is because when we have a look at the sitemap, the sitemap is generated by WordPress. So it’s basically looking at everything that’s on WordPress and then putting it all into sitemap format. It actually has 331 posts. That’s what the post sitemap is here, and then we’ve got another 31, no I’m sorry, another 19 pages in the pages sitemap.

So basically, the site should have about 350 pages. So 634 is still too big. “Hi Felix.” And so I’m going to have a look at the index data, and you can see here that before I started working on this site, it didn’t have Webmaster Tools set up. So we set up Webmaster Tools and right away Webmaster Tools said, “Well we think you’ve got 1300 pages and that’s how many we’ve crawled and indexed.” We made some changes and you can see it has come right down and it’s going further and further and further down.
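The comparison described above, sitemap count versus indexed count, could be sketched roughly as follows. This is a minimal illustration, not the tooling used in the show: the function names `sitemap_urls` and `index_surplus` are hypothetical, and the only numbers used are the ones quoted in the transcript (331 posts, 19 pages, 634 indexed results).

```python
import xml.etree.ElementTree as ET

# Standard sitemap namespace from the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text):
    """Return every <loc> value found in a sitemap XML document."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall(".//sm:loc", SITEMAP_NS)
        if loc.text
    ]


def index_surplus(indexed_count, sitemap_counts):
    """Return (expected pages, extra indexed pages still to be explained)."""
    expected = sum(sitemap_counts)
    return expected, indexed_count - expected


# Figures quoted in the transcript: post sitemap, pages sitemap, site: count.
expected, surplus = index_surplus(634, [331, 19])
print(expected, surplus)  # 350 284
```

In practice the two counts passed to `index_surplus` would come from `len(sitemap_urls(...))` for each sitemap file, and the surplus is the "extra 300 pages" that need tracking down.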

But it still has 600 somewhere…over 600. So I need to find those extra 300 pages. My phone’s on the desk here and every time I go like this, I just worry that everyone on Periscope is just looking up my nose. Sorry about that, Periscope. Hopefully I shaved my nose this morning, I think I did. So if we go to have a look at the index, then we’ve got to work out, well, of these 630 pages, what are all these extra pages? One of the things that we found is that you can go and remove stuff one day. You might, say, have 100 pages, you remove ten, you get down to 90, and then the following day you have 93.

Google almost has, in some respects, and there’s probably a name for it, a secondary index, where what you see when you do this site: search isn’t actually everything that Google has in it. Its major index or its B-index or its…there’s, anyway, a secondary index. You don’t need to get into the whys and wherefores and the technicalities of it. But just know that the results that you see there aren’t necessarily the final results once you start removing things. Because I’ve seen this go up and down about three or four times and there’s things that can influence that. Like different data centres where the information’s coming from.

Anyway, what I want to do is find out what is left in the index that I have to get out. Because going through 634 results one-by-one is quite painful, as you can imagine. So there’s one of the tools I’ve been using for years. I don’t use it that much anymore. But I do use it for this, which is scraping results out of Google, so getting all the results out of Google. I use Firefox to do this because this is a tool from Aaron Wall from SEO Book. So just go to tools.seobook.com and you’ll be able to find this tool. I think it’s called SEO for Firefox.

It gives you all this neat stuff down here. I’ll talk to you about that another time when we’re doing another tools episode. But what you do is, you go into your Google settings, your search settings, and you set your results to 100. So if you don’t know how to do that then you need to work a bit harder. Because that’s really simple to do. So I’ve got my results here set to 100 and then I can use Aaron’s tool to simply go, “Oh, grab me a CSV.” And it will create a CSV of those results. Now I’ve actually created four CSVs for this exercise to get all the results out of Google, and then I’ve put them in to…well I’ve just opened them up in Excel really, and then I’ve said sort in ascending order.

So I can see everything alphabetically, and the reason I do that is so I can see similar URLs. But also so I can see duplicate URLs, and also so that I catch any query URLs. Because I know for this site I do not want any query URLs appearing in the search, and so what I’ve done here is sorted by that, and right up here I can see there’s one, two, three, there’s twelve, well right up to fifteen. Well there’s fourteen URLs right there that I can remove today. So then what I would do is…what’s that 2029? I would then go back over to Google Search Console and have a look at Remove URLs. So once again, that’s under Google Index and Remove URLs.
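The spreadsheet step above (sort the scraped URLs, then look for duplicates and query URLs) can also be done with a short script. This is only a sketch under assumptions: the column name `URL` is a guess at what the exported CSV contains, and `read_urls` and `triage` are hypothetical helper names, not part of SEO for Firefox.

```python
import csv
from collections import Counter
from urllib.parse import urlsplit


def read_urls(csv_file, column="URL"):
    """Yield the URL column from an open CSV file (column name is assumed)."""
    for row in csv.DictReader(csv_file):
        value = (row.get(column) or "").strip()
        if value:
            yield value


def triage(urls):
    """Sort URLs and pick out exact duplicates and URLs with a query string."""
    cleaned = sorted(u.strip() for u in urls if u.strip())
    counts = Counter(cleaned)
    duplicates = sorted(u for u, n in counts.items() if n > 1)
    with_query = sorted(u for u in counts if urlsplit(u).query)
    return cleaned, duplicates, with_query
```

To combine several exports, open each CSV in turn, chain the results of `read_urls` together, and pass the combined list to `triage`. The `with_query` list is the shortlist of candidates for the removal tool.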

So go to Remove URLs and you’ll notice now…this used to say remove, now it just says temporarily hide. Basically, if you use this tool, it’s called temporarily remove URLs. Because unless you’ve blocked it or no-indexed it or put some other sort of tag into it, Google will just come along and recrawl it and re-index it again, and the URL isn’t permanently removed from the index unless you take other precautions. Either blocking it in robots.txt or no-indexing it or having a canonical tag on the URL, so that Google won’t re-index that URL.
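Before submitting a URL for removal, it can help to confirm the page actually carries one of the signals mentioned above. The following is a rough sketch using only Python’s standard library: `IndexSignals` and `index_signals` are hypothetical names, the HTML is assumed to have been fetched already, and note that Google treats a canonical tag as a hint rather than a directive.

```python
from html.parser import HTMLParser


class IndexSignals(HTMLParser):
    """Collect the robots meta noindex flag and the canonical link, if any."""

    def __init__(self):
        super().__init__()
        self.noindex = False
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            # e.g. <meta name="robots" content="noindex, follow">
            if "noindex" in (attrs.get("content") or "").lower():
                self.noindex = True
        elif tag == "link" and "canonical" in (attrs.get("rel") or "").lower().split():
            # e.g. <link rel="canonical" href="https://example.com/post/">
            self.canonical = attrs.get("href")


def index_signals(html):
    """Return (noindex flag, canonical URL or None) for a page's HTML."""
    parser = IndexSignals()
    parser.feed(html)
    return parser.noindex, parser.canonical
```

A page that returns `(False, None)` has neither signal in its HTML, so a temporary removal on its own would likely be undone the next time the page is crawled.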

So, and you can see here I’ve already removed a bunch. So I’ll just grab that, let me just grab that promptly, grab that, and you just simply pop it in there and then Google will schedule it for removal. Now if you want a whole directory removed, say you’ve got a whole major directory that’s appeared in the search engine results and you want that whole thing removed, then you have to actually go and block that directory in robots.txt to stop Google going in there in the first place, and then you go and remove those. So I’ll go and remove all those, we’ll see it drop down even more and we’ll get a better picture, or Google will have a better picture, of what the site’s really about.
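A directory block like the one described can be sanity-checked before relying on it. This sketch uses Python’s standard `urllib.robotparser`; the `/old-directory/` path and the `is_blocked` helper are made up for illustration, and the standard-library parser only approximates how Googlebot itself interprets robots.txt (it does not handle Google’s wildcard extensions).

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks one whole directory for all crawlers.
ROBOTS_TXT = """\
User-agent: *
Disallow: /old-directory/
"""


def is_blocked(robots_txt, url, agent="Googlebot"):
    """Return True if the given robots.txt rules disallow crawling the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, url)


print(is_blocked(ROBOTS_TXT, "https://example.com/old-directory/page.html"))
print(is_blocked(ROBOTS_TXT, "https://example.com/blog/post/"))
```

The first call checks a URL inside the blocked directory and the second a URL outside it, so the rule can be confirmed to cover only what was intended before the removal requests go in.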

So there’s obviously another couple of hundred URLs in these spreadsheets that I’ve created where I need to go and find what shouldn’t be in there. My guess is that it is directories that I have marked as no-index but Google hasn’t gone back to those yet. So you can go and set up no-index, set up robots.txt, all those things. But until Google comes back and finds that out, it won’t take any action on whether that URL should be in the index at all. And look, this is a problem that we see all the time. Every day I see this problem, and usually it’s worse with e-commerce sites. Mainly because they might have stock that’s been discontinued or that they’ve got rid of.

But the URLs, the old URLs, are still in there. You need to have a policy in place for handling discontinued stock and also pages that are going to be unpublished. You need to have a redirection strategy or something like that, so Google knows and can remove them from the index.

So that is it for this week’s show. Hopefully that’s helpful. We’re going to have a chat now on Blab and do some Q&A and we’ll see you next week. Thanks very much. Bye.