Google Search Console – check that your sitemaps don’t include categories (unless you’re e-commerce), tags, archives or any other useless sitemap.
site: is the operator to use to check the number of URLs indexed. Make sure that number is within 10-20% of the number of URLs in your sitemap.
How duplication affects crawl budget
On-site duplicate content.
Hey, welcome back, Rankers! Thanks to everyone who came to the Honey Bar last Monday. It was awesome. We had a great night. Had a lot of discussions with a lot of interesting people. One of the most interesting ones, though, was with another SEO, I might tell you his name. I knew I was going to have a few full and frank discussions and I wasn’t surprised or disappointed. So I gave my presentation, basically the same as what I gave at Pubcon, and one of the things I showed off was how we got Skinnymixers to go from nowhere up to number one for healthy Thermomix recipes. It was basically just cleaning up the index.
And this SEO was saying, “Why did you remove categories and tag pages from the index?” And this guy’s been doing SEO for a while. And he said “But so what?” And I said, “Well, it’s a source of duplication. Which page is Google meant to rank? Is Google meant to rank the tag page, the full article of the blog post or the blog post?” And his comment to me was “Well, then Google would consider it unimportant and therefore it wouldn’t matter, because it’s a tag page.” I said “No, that’s not the case at all.” And I said “What if a user finds that page? How does that help? They’re still …” If a user finds a category page, for instance, and it’s got a list of blog posts on it, and I’m not talking e-commerce, I’m talking blog or lead-gen content sites here … if a user finds a category page in search and goes to that category page, they’re still one click away from the post that they want to be at, right? They’ve still got to click on a post. And that’s my point about having potential duplicates. It’s the same as pagination in the index. If you find a paginated blog page like, you know, page 27 of 30 of your blog and it lists all the blog posts on page 27, that’s useless as well, not helpful. Best practice with all of these things is to get rid of them and clean them up, because as soon as you do that, we know the site ranks higher. And part of that is that you’ve got internal pages competing against each other when you have duplication.
That’s why I’m constantly surprised; in bloggersSEO we had the same thing. We had members come into the group … incidentally, we’re doing SEO audits three times a week on bloggersSEO, so on Facebook just search “bloggersSEO,” you’ll find us. If you’re in the US, just search for “bloggers SEO,” we’re number one across the US. And one of the people in there said “No, only external duplication is the problem.” No, internal duplication is a problem. It indicates a low quality site. It indicates a site that isn’t being looked after. It’s why Google has the canonical tag. It’s why we talk about faceted navigation. All of these things are to stop duplication. So make sure, and one of the best things that you can do … and we always start our audits from the browser. So I don’t look at the website, I look at Google first. That’s where I start the audit. And we go “site:domain name.” What content has Google already indexed? Is it the content that you want? If it’s not, get rid of the stuff that shouldn’t be in there and fix up the site.
We see it time and time again where a strong brand that should be number one isn’t because of basic user experience things. I was just looking at a mattress brand that, for some reason, has gone from a “.com.au” and, to go global, they’ve decided that a sub-domain and then the brand name .com is the way to go. Big mistake. They should have just gone with brand name.com/country code/, with hreflang on all of those. Now they’ve got 30 other sites they’ve got to maintain, and they haven’t got hreflang set up on any of them, so each one of these sites is going to have potential issues, and all they had to do was have one site and make sure there’s no duplication. Now what they’ve done is spread their duplication out across different languages, because they haven’t set up hreflang properly, so Google’s not going to understand what’s going on.
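To make the one-site-with-country-folders idea concrete, here is a minimal sketch of generating the hreflang tags for a single page. The domain, the folder names and the trailing-slash URL style are all assumptions made up for illustration; they are not the mattress brand’s actual setup.

```python
from html import escape


def hreflang_tags(base_url, path, locales, default_locale=None):
    """Build <link rel="alternate"> tags for one page across country folders.

    locales maps an hreflang code (e.g. "en-au") to a hypothetical
    subdirectory name (e.g. "au"). URLs are assumed to end in a slash.
    """
    base = base_url.rstrip("/")
    clean_path = path.strip("/")

    def page_url(folder):
        # rstrip handles the home page, where clean_path is empty
        return f"{base}/{folder}/{clean_path}".rstrip("/") + "/"

    tags = [
        f'<link rel="alternate" hreflang="{escape(code)}" '
        f'href="{escape(page_url(folder))}" />'
        for code, folder in locales.items()
    ]
    if default_locale is not None:
        # x-default tells search engines which version to fall back to
        tags.append(
            '<link rel="alternate" hreflang="x-default" '
            f'href="{escape(page_url(locales[default_locale]))}" />'
        )
    return tags


if __name__ == "__main__":
    for tag in hreflang_tags(
        "https://example.com",
        "/mattresses/",
        {"en-au": "au", "en-us": "us"},
        default_locale="en-au",
    ):
        print(tag)
```

The same full set of tags, including the page’s own entry, would go in the head of every country version of that page, since hreflang annotations are expected to point at each other.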
So start your SEO at the index. Understand what Google has actually crawled and make sure it’s within 10% of your site map, usually; 10-20% of your site map, there’s going to be variation, right? Make sure your site map is configured properly. If you’re a lead gen site, don’t put your categories and your tags in a site map. E-commerce is different. I know that you can do that. Don’t. It’s not helpful and it’ll slow down the crawl process and there’s no point to it. You don’t want them indexed, you want them removed from the index.
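Both of those checks can be roughed out in a few lines. This is only a sketch: the list of unwanted URL segments is a guess at common blog structures, and the indexed count is something you read off a site: search or Search Console and type in by hand, since nothing here talks to Google.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Assumed path segments for category, tag and archive pages; adjust per site.
UNWANTED_SEGMENTS = ("category", "tag", "author", "page")


def sitemap_urls(xml_text):
    """Return every <loc> URL from a standard <urlset> sitemap."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


def flag_unwanted(urls, segments=UNWANTED_SEGMENTS):
    """Return URLs whose path contains a category/tag/archive style segment."""
    flagged = []
    for url in urls:
        parts = [p for p in urlparse(url).path.split("/") if p]
        if any(p in segments for p in parts):
            flagged.append(url)
    return flagged


def index_gap(indexed_count, sitemap_count):
    """Relative difference between indexed URLs and sitemap URLs."""
    if sitemap_count <= 0:
        raise ValueError("sitemap_count must be positive")
    return abs(indexed_count - sitemap_count) / sitemap_count


def within_tolerance(indexed_count, sitemap_count, tolerance=0.20):
    """True if the index is within the 10-20% band (20% by default)."""
    return index_gap(indexed_count, sitemap_count) <= tolerance


if __name__ == "__main__":
    sample = (
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        "<url><loc>https://example.com/my-post/</loc></url>"
        "<url><loc>https://example.com/category/recipes/</loc></url>"
        "<url><loc>https://example.com/tag/thermomix/</loc></url>"
        "</urlset>"
    )
    urls = sitemap_urls(sample)
    print("Should not be in the sitemap:", flag_unwanted(urls))
    print("Within 20%:", within_tolerance(indexed_count=115, sitemap_count=100))
```

If the gap comes out well above 20%, that is the cue to go back to the site: results and see what is in the index that shouldn’t be.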
So, when you are looking at your site as an SEO, don’t think “Gee, I need more backlinks.” That’s like saying “Gee, I need more applause.” Okay? Then perform, and you’ll get the applause. Build your business, and the backlinks will come. Hopefully, that’s helpful. We’ll see you next week. Thanks very much. Bye.