Search engine indexing

By default Micro.blog creates a robots.txt file for your site that allows Google and other search engines to index all the blog posts and pages on your site.

If you’d like to disable generating the robots.txt file, you can override it with a custom theme. Create a template named config.json in your theme with the following contents:

{
  "enableRobotsTXT": false
}

You can also add custom rules to the robots.txt file. Create a new template named layouts/robots.txt in a custom theme with contents such as:

User-agent: *
Disallow: /some-page-here
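
Before publishing custom rules, it can help to check what they actually block. The following is a minimal sketch using Python's standard-library `urllib.robotparser`. The `example.com` URLs, the `is_allowed` helper, and the `Googlebot` user agent are assumptions for illustration, not anything Micro.blog provides.

```python
from urllib.robotparser import RobotFileParser

# The example rules from above, as they would appear in robots.txt.
RULES = """\
User-agent: *
Disallow: /some-page-here
"""


def is_allowed(rules: str, url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if `user_agent` may fetch `url` under the given rules.

    Hypothetical helper: parses the rules from a string, so no network
    request is made.
    """
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # example.com stands in for your own domain.
    print(is_allowed(RULES, "https://example.com/some-page-here"))
    print(is_allowed(RULES, "https://example.com/2018/12/25/my-best-images.html"))
```

The standard-library parser matches paths by prefix, so the rule above also blocks anything whose path starts with /some-page-here.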

I submitted the sitemap link from robots.txt for notes.jatan.space in Google Search Console, but it shows an error. So I submitted the RSS feed for now, and that worked. But how do I get Search Console to read the full site?

I’d like to know why using site:khurt.blog/ to search my blog does not work.

It seems to work for me. I tried “site:khurt.blog books” in Google and got a bunch of results.

Hi Manton, can you try site:khurt.blog "My Best Images From 2021" or site:khurt.blog "My Best Images From 2018"

I get zero results when searching with Google. However, using the search feature of https://khurt.blog/search/ does return a result.

Also, it seems that when I imported my WordPress posts (my self-hosted WP goes back to 2005 and has 6700 entries), some posts made it and some did not. The post, of course, does exist: Khürt Williams - My Best Images From 2018

The “My Best Images From” posts for 2019, 2017, and 2015 did not make it.

From what I can see, nothing hinders Google or other search engines from indexing that page. DuckDuckGo has no problem finding it, for example.

To help you figure out Google’s indexing problem, sign up for their official tool: Search Console.

I am wondering if there is some weird issue here. Yes, I see it via DuckDuckGo (which I don’t use) but not Google.

https://www.google.com/search?q=site%3Ahttps%3A%2F%2Fkhurt.blog+my+best+images&client=firefox-b-1-d&sxsrf=AOaemvL1BTlCQ1gPfPypbIcvwBp1c8z6_Q%3A1641497984197&ei=gEXXYZqpC8Pn_QamzqjgDw&ved=0ahUKEwia6fzB8J31AhXDc98KHSYnCvwQ4dUDCA0&oq=site%3Ahttps%3A%2F%2Fkhurt.blog+my+best+images&gs_lcp=Cgdnd3Mtd2l6EAxKBAhBGABKBAhGGABQAFgAYABoAHAAeACAAQCIAQCSAQCYAQA&sclient=gws-wiz

I think you have two unrelated problems. One is Google not indexing your page. That’s not too strange or uncommon, though. Google’s index contains many pages, but it does not include every page on the web.

It’s probably just a matter of time before their crawler finds the page and adds it to the index. If you want insight into what’s going on behind the scenes, Google Search Console is the tool for you.

The second problem is the “out of memory” one when your site is being rebuilt. @manton needs to take a look and resolve that for you.

@sod, I’ve been on micro.blog since launch. I was a Kickstarter backer. It would be strange if DuckDuckGo had indexed that domain before Google did.

I’ve used Google Search Console in the past, but “Processing data, please check again in a day or so” is not the message I want to see.

I’ve moved the other issue to a separate thread.

Just to clarify: when I write page above, I mean page. Not site or blog or the entire domain. Clearly, Google is indexing your website. Just not every page. Yet.

From what we can observe from the outside, nothing prevents Google from indexing that page.

  • There’s no entry preventing Google in your robots.txt.
  • There’s no metadata on the page saying Google is not welcome.
  • As one would expect, the page is present in sitemap.xml.
  • The web page is publicly accessible and responds with a 200 OK status code.
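
Two of the checks above (the page metadata and the sitemap entry) can be sketched offline with Python's standard library. This is only an illustration under stated assumptions: `has_noindex` and `sitemap_contains` are hypothetical helpers, the sample HTML and sitemap are made up, and the robots.txt and status-code checks are left out because they need a live request.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class _RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> and <meta name="googlebot">."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            content = attr.get("content", "")
            self.directives += [d.strip().lower() for d in content.split(",")]


def has_noindex(html: str) -> bool:
    """True if the page's meta tags ask crawlers not to index it."""
    parser = _RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


def sitemap_contains(sitemap_xml: str, url: str) -> bool:
    """True if any <loc> element in the sitemap equals `url`."""
    root = ET.fromstring(sitemap_xml)
    return any(
        el.text is not None and el.text.strip() == url
        for el in root.iter()
        if el.tag.endswith("loc")  # tolerate the sitemap XML namespace
    )


# Made-up sample data; example.com stands in for a real blog.
SAMPLE_HTML = '<html><head><meta name="robots" content="index, follow"></head></html>'
SAMPLE_SITEMAP = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>https://example.com/2018/12/25/my-best-images.html</loc></url>"
    "</urlset>"
)

if __name__ == "__main__":
    print(has_noindex(SAMPLE_HTML))
    print(sitemap_contains(SAMPLE_SITEMAP, "https://example.com/2018/12/25/my-best-images.html"))
```

A page passing these checks only shows that nothing on the site blocks indexing. It says nothing about whether Google has actually crawled the page.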

I’m sure @manton will help you with the other issue. But the only way to figure out what’s going on with Google is by using their official tools.

Of course, all my observations above describe the page’s current status. Maybe a technical problem prevented Google from indexing the page in the past. But if so, it has since been fixed. Permanently or temporarily. :blush:

Thank you. It’s quite a mystery to me. Google also clearly had a sitemap for a while.

So, there are multiple things you can do from here to continue your troubleshooting journey. One helpful tool in the Search Console is URL Inspection. Feed it the URL in question: https://khurt.blog/2018/12/25/my-best-images.html.

Hopefully, that will give you some insights.

Solved: It didn’t occur to me that I needed to submit the /sitemap.xml URL.