ℹ️ Blocking AI bots

Large language models are often trained on large amounts of data from the open web. By default your blog on Micro.blog is on the open web, so AI companies may use your blog posts to train their models. If you’d prefer to block AI bots from accessing your posts, you can use the robots.txt convention: well-behaved search engines and AI bots consult a site’s robots.txt file before crawling it.

Because Micro.blog blogs use Hugo for publishing, all the configuration that is possible in Hugo is also possible in Micro.blog. To set up a custom theme for this, see the help page about search engine indexing.
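As a sketch of how this works in a standard Hugo site (Micro.blog’s custom theme setup may differ in the details): setting `enableRobotsTXT = true` in the site configuration tells Hugo to render `layouts/robots.txt` as the site’s robots.txt. A minimal template might look like this:

```
# layouts/robots.txt — rendered when enableRobotsTXT = true in the site config
User-agent: GPTBot
Disallow: /
```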

There are also Micro.blog plug-ins that make this configuration easy.

The “No robots” plug-in also has a special side effect: it disables search engine indexing for your Micro.blog profile page, not just your blog. This is useful if you want to limit the discoverability of your presence on the web.

When using a custom robots.txt, the following user agents cover popular AI bots:

  • GPTBot
  • ChatGPT-User (only used when fetching web pages for the user, not training)
  • Applebot-Extended
  • Google-Extended
  • PerplexityBot
  • ClaudeBot
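
For a custom robots.txt, a minimal sketch that blocks all of the bots above from your entire site might look like this (a group may list several User-agent lines that share one set of rules):

```
# robots.txt — ask known AI crawlers not to index anything on this site
User-agent: GPTBot
User-agent: ChatGPT-User
User-agent: Applebot-Extended
User-agent: Google-Extended
User-agent: PerplexityBot
User-agent: ClaudeBot
Disallow: /
```

Note that robots.txt is advisory: well-behaved crawlers honor it, but nothing technically enforces it.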

We also recommend using Creative Commons tags on your blog to let visitors and bots know how your content is licensed. AI bots don’t appear to honor these yet, but they make the copyright status of your work explicit for both humans and bots. There is also a Creative Commons plug-in for Micro.blog.
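
As an illustration (the plug-in handles this for you, and the exact license is your choice), a Creative Commons notice in a blog footer is typically a link with `rel="license"` pointing at the license you’ve chosen:

```
<!-- Hypothetical footer snippet marking posts as CC BY 4.0 -->
<a rel="license" href="https://creativecommons.org/licenses/by/4.0/">
  This work is licensed under a Creative Commons Attribution 4.0 license.
</a>
```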