
When I reload the page "https://journal.james-zhan.com/google-de-indexed-my-entire-b...", I get

Request URL: https://journal.james-zhan.com/google-de-indexed-my-entire-b...

Request Method: GET

Status Code: 304 Not Modified

So maybe it's the status code? Shouldn't that page return a 200 OK?

When I go to blog.james..., I first get a 301 Moved Permanently, then journal.james... loads, but it returns a 304 Not Modified, even if I then reload the page.

Only when I fully submit the URL again in the URL bar does it respond with a 200.

Maybe crawling also returns a 304, and Google won't index that?

Maybe a prompt like: "why would a 301 redirect lead to a 304 not modified instead of a 200 ok?", "would this 'break' Google's crawler?"

> When Google's crawler follows the 301 to the new URL and receives a 304, it gets no content body. The 304 response basically says "use what you cached"—but the crawler's cache might be empty or stale for that specific URL location, leaving Google with nothing to index.
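
A quick way to test the "maybe crawling also gets a 304" idea is to fetch the page the way a crawler with an empty cache would, i.e. with no conditional headers at all. A rough check with curl:

  curl -s -o /dev/null -w '%{http_code} %{size_download}\n' https://journal.james-zhan.com/google-de-indexed-my-entire-bear-blog-and-i-dont-know-why/

If that prints 200 and a non-zero body size, the 304 only shows up for clients that revalidate something they already have, not for a fresh fetch.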





You get a 304 because your browser tells the server what it has cached, and the server says "nothing changed, use that". In browsers you can bypass the cache by using Ctrl-F5, or in the developer tools you can usually disable caching while they're open. Doing so shows that the server is doing the right thing.
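
You can reproduce both behaviours with curl, roughly like this (replay whichever validator the first response actually gives you; the ETag below is a placeholder, not a real value):

  # first request: plain GET, expect a full 200; note the ETag or Last-Modified header
  curl -s -D - -o /dev/null https://journal.james-zhan.com/google-de-indexed-my-entire-bear-blog-and-i-dont-know-why/

  # repeat it conditionally with that validator pasted in: expect 304 and no body
  curl -s -D - -o /dev/null -H 'If-None-Match: "<etag-from-first-response>"' https://journal.james-zhan.com/google-de-indexed-my-entire-bear-blog-and-i-dont-know-why/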

Your LLM prompt and response are worthless.


When Chrome serves a cached page, like when you click a link on this page and then navigate back or hit F5, it shows it like this:

Request URL: https://news.ycombinator.com/item?id=46196076

Request Method: GET

Status Code: 200 OK (from disk cache)

I just thought it would be worth investigating in that direction.


That's a different situation. The browser decides what to do depending on the situation and what was communicated about caching. Sometimes it sends a request to the server along with information about what it already has. Then it can get back a 304. Other times it already knows the cached data is fine, so it doesn't send a request to the server in the first place. The developer tools show this as a cached 200.
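
Which of those it does is steered by the caching headers on the original response. As an illustration (made-up values, not necessarily what this particular server sends): a response carrying

  Cache-Control: no-cache
  ETag: "abc123"

must be revalidated, so the browser sends If-None-Match: "abc123" next time and can get a 304 back, while a response carrying

  Cache-Control: max-age=3600

may be reused for an hour without asking the server at all, which devtools reports as "200 OK (from disk cache)".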

Got it, thanks for explaining.

Has anyone noticed that the response for the blog page has a header: "x-robots-tag: noindex, nofollow"? What's the purpose of this header on a content page?

UPD: Sorry, never mind, I inspected the wrong response.


I don't see it. With Chrome devtools, for the posted URL I see X-Clacks-Overhead, X-Content-Type-Options, and X-Frame-Options. No X-Robots-Tag.

And no <meta name="robots"> in the HTML either.

What URL are you seeing that on? And what tool are you using to detect that?

Edit: cURL similarly shows no such header for me:

  curl -s -D - -o /dev/null https://journal.james-zhan.com/google-de-indexed-my-entire-bear-blog-and-i-dont-know-why/
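
The same kind of check for a robots meta tag in the HTML, in case you want to rule that out too (just a crude grep, assuming the tag would use double quotes):

  curl -s https://journal.james-zhan.com/google-de-indexed-my-entire-bear-blog-and-i-dont-know-why/ | grep -i 'name="robots"'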

Sorry. I am an idiot. Checked the wrong url. Please ignore.


