Magento Indexing Issue Solved

by Jim April 4, 2018

We’ve been wrestling with a Magento indexing issue for a particular client of late. Solving it has been a major headache, but we think we’ve found the solution with the help of the new Google Search Console.

What I learned

• The new Google Search Console is a great tool for discovering index issues

• Soft 404s aren’t always accurate

• The Googlebot can get stuck in a redirect loop

• It’s a specific Magento problem

• How to solve a user agent verification issue


Hey, welcome back Rankers. Did you have a good Easter? We did. I pulled down a large shed, which was a lot of hard work.

.AU stuff is still going on. So if you’re not aware, .AU is a big cash grab. Basically every business in Australia is gonna be forced to pay a bunch of money to the AU Domain Administrator. So if you want to stop that, just Google “Stop AU cash grab” and you will find the petition which you can sign. There’s more coming on that, and there’ll be more publicity in the coming weeks. Right now, that’s all I can say about it.

Utilising the New Search Console

I wanted to talk to you today a little bit about the Google index. Now we know that the mobile-first index is coming and we’ve covered that in the past. It’s being rolled out as we speak, so watch out for an announcement in your Google Search Console. And speaking of Google Search Console, if you want a nice April fool, go and have a click on that. Good one, Google. Re-crawl now. Yep. Have a play with that.

Search Console though, and the new Search Console is just awesome for finding out things about the index that you didn’t know previously. For this particular client that we’ve got here, we’ve had a whole bunch of issues.

One of the main issues is being able to get them indexed properly, and you can see here that we’ve got 3,500 errors, we’ve got only 297 pages valid, and we’ve had 35,000 excluded. When you go down and have a look at some of these, you can see … You know, 2,000 redirect errors … But down here we’ve got 24,000 soft 404s. Now these actually aren’t soft 404s. A soft 404 is basically Google thinking that what you meant to do was get the server to respond with a 404 error code, page not found, but what you’ve actually done is redirected.

That’s what’s actually happening in this situation. The Googlebot specifically is being redirected and Google is interpreting that as a soft 404, which is a big problem, and there’s been heaps of issues trying to get this site indexed properly.

So, what we worked out last week was that it’s just the Googlebot being redirected. It’s not you and I. So if you and I go and request a page, the server goes, “Yep, here’s the page, no problem.” If the Googlebot came along and requested a page, it would redirect it temporarily, 302, to the home page, and then the home page would be redirecting in on itself only for the Googlebot. But not all Googlebots. Only one Googlebot, the main desktop Googlebot.

A Unique Magento Problem

Now trying to find that out and isolating that was a bit of a trick, but the next step in that process was … It blew my mind a little bit, ’cause I thought, how many Magento sites out there are doing this? ’Cause this is a specific Magento problem, from what I’ve been able to see.

If you’ve been able to isolate this somewhere else, then please let me know. But basically it’s called a user agent validation issue, or user agent verification issue. It’s basically part of the Magento content management system looking at the requests from the different user agents out there, and then trying to verify them against known user agents in its database. It was used for a variety of things in the past. I think mainly for design purposes, to see what device was coming and all the rest of it. But there’s better ways of doing that now, right?

But this was still doing it, and so what this was doing … You can have a look here. I am actually using cURL here on OS X, and I’m saying, “Use this particular user agent.” So I’m saying, “Mimic the Googlebot”, and I’m saying, “Just show me any redirects.” So I do that and you can see here we’ve got a 302. But we’ve been 302’d to the home page, and the home page is 302ing as well. So we’re getting this endless loop.
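If you’d rather script that check than run cURL by hand, here is a minimal sketch of the same idea in Python: send a request with a Googlebot user agent, don’t follow redirects automatically, and flag it when a redirect leads back to a URL we’ve already visited. The function names `head` and `trace_redirects` are made up for this example, and the user agent string is the desktop Googlebot string as Google has published it, so check it against Google’s current documentation before relying on it.

```python
import http.client
from urllib.parse import urlsplit, urljoin

# Desktop Googlebot user agent (assumption: verify against Google's current docs).
GOOGLEBOT_DESKTOP = (
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
)

REDIRECT_CODES = (301, 302, 303, 307, 308)


def head(url, user_agent):
    """Send one HEAD request without following redirects.

    Returns (status_code, location_header_or_None).
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=10)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=10)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path, headers={"User-Agent": user_agent})
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def trace_redirects(url, user_agent, fetch=head, max_hops=10):
    """Follow redirects by hand and report whether they loop.

    Returns (hops, looped), where hops is a list of (status, url) pairs.
    If max_hops is reached without a repeat, looped is False.
    """
    hops, seen = [], set()
    while len(hops) < max_hops:
        status, location = fetch(url, user_agent)
        hops.append((status, url))
        if status not in REDIRECT_CODES or not location:
            return hops, False
        seen.add(url)
        url = urljoin(url, location)  # Location may be relative
        if url in seen:
            return hops, True
    return hops, False
```

Calling `trace_redirects("https://example.com/some-page", GOOGLEBOT_DESKTOP)` against your own site (the URL here is a placeholder) gives you the list of hops. For the problem described above you would expect a 302 on the page, a 302 on the home page, and `looped` coming back as `True`.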

So then what we did, we said okay, well that’s one bot, if we hit it with one bot. But what happens if I hit it with, say, a different Googlebot? So what we did, we went and got the user agent information for the Google mobile bot, or one of them, the smartphone bot, and then I hit it with that. Basically what happened was that it was just the desktop bot that it was having problems with. The mobile bot, it was fine.

But in addition to that, and this was one of the problems that we found with this site, it was an intermittent problem. Sometimes it would redirect the Googlebot and other times it wouldn’t. What we found was that once we hit it with the mobile bot, the mobile bot would come back with 200 OK, as you can see there. Then we would go back and hit the same page again with the desktop bot after we’d just hit it with the mobile bot, and … This, right? The bot that was getting redirected, all of a sudden now is a 200 OK.

So we fixed the redirect problem temporarily with another Googlebot, and basically all that needs to happen if you have this problem in Magento is switch off user agent verification. So if you’ve got an issue with your site getting indexed properly, and you are on Magento, just try this test.
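The desktop, then smartphone, then desktop sequence described above can be sketched as a small script. This is only an illustration: `status_for` and `probe_sequence` are names invented for this example, and the smartphone user agent is the string Google published around the time of this post (the Chrome version in it has changed since), so treat both strings as assumptions to verify.

```python
import http.client
from urllib.parse import urlsplit

# User agent strings are assumptions; check Google's crawler docs for current values.
AGENTS = {
    "desktop": (
        "Mozilla/5.0 (compatible; Googlebot/2.1; "
        "+http://www.google.com/bot.html)"
    ),
    "smartphone": (
        "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) "
        "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 "
        "Mobile Safari/537.36 (compatible; Googlebot/2.1; "
        "+http://www.google.com/bot.html)"
    ),
}


def status_for(url, user_agent):
    """Return the status code of one HEAD request, without following redirects."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=10)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=10)
    try:
        conn.request("HEAD", parts.path or "/", headers={"User-Agent": user_agent})
        return conn.getresponse().status
    finally:
        conn.close()


def probe_sequence(url, order, fetch=status_for, agents=AGENTS):
    """Request url once per agent name in order.

    Returns a list of (agent_name, status_code) in the order requested,
    so you can see whether an earlier request changes a later result.
    """
    return [(name, fetch(url, agents[name])) for name in order]
```

Running `probe_sequence(url, ["desktop", "smartphone", "desktop"])` against one of your own pages shows whether the desktop result changes after the smartphone request. Because the problem is intermittent, it is worth running the sequence several times before drawing a conclusion.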

Go and request a URL, do a header check, and by mimicking Google’s own Googlebots you can find out how your site is responding specifically to the different Googlebots. This is especially true if you’ve got this problem intermittently, if it just keeps coming in and going out and you don’t know why. Have a try of that. It’s worked for us, and we’re looking to see this client’s rankings, traffic and revenue skyrocket from fixing this problem.

Hopefully that’s helpful. Please go and sign the petition for .AU, and we’ll see you next week. Thanks very much. Bye.

Jim’s been here for a while, you know who he is.
