X3 Photo Gallery Support Forums

eskimo121
Experienced
Topic Author
Posts: 104
Joined: 01 Jun 2012, 21:22

https and robots.txt

24 Aug 2018, 18:37

Hello, everything is working great on my site, except for one thing. Until yesterday I had the http:// version of my site added in Google Webmaster Tools. Since my site is on https://, I deleted the old one and added https://amazingpics.net instead.
However, I just got an email saying:
To owner of https://amazingpics.net/,
Search Console has identified that your site is affected by 1 new Index coverage related issue. This means that Index coverage may be negatively affected in Google Search results. We encourage you to fix this issue.

New issue found:
Indexed, though blocked by robots.txt


What do you suggest? Thanks.


Bored? Browse through some cool images: Amazing Pictures
 
mjau-mjau
X3 Wizard
Posts: 13997
Joined: 30 Sep 2006, 03:37

Re: https and robots.txt

25 Aug 2018, 01:21

eskimo121 wrote: Search Console has identified that your site is affected by 1 new Index coverage related issue. This means that Index coverage may be negatively affected in Google Search results. We encourage you to fix this issue.

New issue found:
Indexed, though blocked by robots.txt


What do you suggest? thanks.
Did you even click the link and follow the lead to see specifically what the issue was? I wish people would not read so much into what Google Search Console is reporting. I don't blame you though, as Search Console (formerly Webmaster Tools) is always reporting junk.

It could be related to the fact that you now have a redirect from http to https. Since you MUST have both the http and https versions of your website added to GSC, it may be reporting that your http website is inaccessible. OR it's reporting that it cannot access a page that it's not supposed to access [see this post]. I am guessing it is the latter.

Should you be using GSC? Absolutely. It's a good way to detect "human" errors. Should you put weight into everything reported by GSC? Absolutely NOT. Click the "issue" they are referring to, and read it. Most likely, it's a link that is not supposed to exist, or perhaps an incoming link from another website. Figure out what the report is about first. Is the problem with any of your actual pages that you want to have indexed? Unlikely.

We have tons of these on our websites, including www.photo.gallery. Quite some time ago, there were actually a few mistakes on our side that we rectified, but now it mostly reports junk. Let's take a look.

First page, "index issue". Uh, a bit unspecific?
Image

Following the lead, 0 errors. Where's the issue please?
Image

Next, warnings: "indexed, though blocked by robots.txt".
Image

Let's take a look at those two URLs. First, www. photo.gallery/content/buy. That is not a valid URL, and our website correctly reports "404 not found". So, that is probably related to this post, or to an incorrect link from somewhere else (even an external website). Why is Google even reporting this for pages that are not supposed to exist? Actually, Google doesn't know they are not supposed to exist. It thinks it's helping by notifying you about missing pages. Second URL: www. photo.gallery/x3docs/control-panel/ ... We removed the outdated /x3docs/ section ages ago, and redirect to www.photo.gallery/docs instead. Nothing wrong with that. It could be that our forums have a link somewhere pointing to the old URL. Not a problem of course, except that the link will keep on being scraped by Google.

Conclusion
Unless you have a specific GSC error that actually affects your live and visible pages, ignore it. I will not be able to offer support on issues that are just "new index coverage issue". I wish they would not send emails with titles like that ... It sounds like there is something wrong. Of course with "index", they don't mean your index-page ... They mean search-indexing.
 
eskimo121
Experienced
Topic Author
Posts: 104
Joined: 01 Jun 2012, 21:22

Re: https and robots.txt

25 Aug 2018, 02:43

I see.
Just for reference: they were referring to https://amazingpics.net/content/index/
Thanks.


 
mjau-mjau
X3 Wizard
Posts: 13997
Joined: 30 Sep 2006, 03:37

Re: https and robots.txt

25 Aug 2018, 06:49

eskimo121 wrote:just for reference : They were referring to https://amazingpics.net/content/index/
... which is identical to your home page, no? In any case, since X3.25.0, we are DENYING access to all folders under /content/ via robots.txt. As in the case above, this is to prevent Google from attempting to index that page, because it is not a page. I am still not 100% sure how Google is discovering the link above, but I'm assuming it's guessing from image paths.

So ultimately, the link above will NOT get indexed by Google, but that is GOOD. So Google is basically telling you that it can't index a page that is not supposed to be indexed.
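
To see how such a rule applies to a given URL, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt text in it is an assumption for illustration only (a single `Disallow: /content/` rule for all user agents); the exact file X3 ships is not quoted in this thread, and the helper name `is_blocked` is made up.

```python
from urllib.robotparser import RobotFileParser

# ASSUMPTION: illustrative rules only, not the actual robots.txt shipped with X3.
ASSUMED_ROBOTS_TXT = """\
User-agent: *
Disallow: /content/
"""


def is_blocked(url, robots_txt=ASSUMED_ROBOTS_TXT, agent="Googlebot"):
    """Return True if the given robots.txt text disallows `agent` from fetching `url`."""
    parser = RobotFileParser()
    # parse() takes the robots.txt content as an iterable of lines.
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, url)


# The folder URL from the report falls under /content/, the home page does not.
print(is_blocked("https://amazingpics.net/content/index/"))
print(is_blocked("https://amazingpics.net/"))
```

Under that assumed rule, only paths below /content/ are refused to crawlers, while the home page and the normal gallery pages stay crawlable.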

On a separate note, when trying to access folders inside content, the server should return 404.
https:// demo.photo.gallery/content/index/
Image

I am not sure why your server is rewriting these paths so that the home page displays. In any case, it's harmless: those URLs are not linked to by X3 or from anywhere else, they are not supposed to exist, and they are blocked by robots.txt from being indexed by Google.
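
If you want to check what your own server answers for such a path, a rough sketch with Python's standard `urllib` could look like this. The function names `fetch_status` and `describe` are made up for illustration, and the wording of the messages is mine, not something X3 or GSC reports.

```python
import urllib.error
import urllib.request


def fetch_status(url, timeout=10):
    """Return the final HTTP status code for `url` (redirects are followed)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urlopen raises HTTPError for 4xx/5xx responses; the code is on the exception.
        return error.code


def describe(status):
    """Map a status code to a short note about a path that is not a real page."""
    if status == 404:
        return "404: the server reports the path as missing, as expected"
    if status == 200:
        return "200: some page is served here (e.g. a rewrite or redirect to the home page)"
    return f"other status: {status}"
```

For example, `describe(fetch_status("https://amazingpics.net/content/index/"))` would tell you whether that path answers with a 404 or serves a page instead. Note that `urlopen` follows redirects, so a path redirected to the home page shows up as 200 here.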
 
eskimo121
Experienced
Topic Author
Posts: 104
Joined: 01 Jun 2012, 21:22

Re: https and robots.txt

25 Aug 2018, 07:57

I recollect now redirecting all 404s to the homepage, because a lot of links from the old X2 were still indexed/linked on other sites and kept getting hits while there was nothing there. So that's why it's showing the homepage instead of a 404.
Thanks.

