I have three blogs on Micro.blog and I want to indicate to search engines and AI that they should not index one of them. I do not want to block search/AI from the other two. Is there a plugin that will easily allow me to do that or do I have to use a custom theme?
I realize that all of this is on the honor system, and if search/AI wants to ignore robots.txt there’s nothing I can do about it. (Or is there?)
There are a few plug-ins in the directory if you search for robots.txt. The one I’ve built is called Custom Robots, and it gives you full control over the file’s content. Several community initiatives maintain lists of AI-related crawlers, like ai.robots.txt.
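For example, if you only want to discourage AI crawlers while still letting search engines index the blog, a robots.txt along these lines works (this is a small sketch — the user-agent names below are a few well-known AI bots, and you’d want to keep the list current from a maintained source like ai.robots.txt):

    # Block some known AI crawlers (partial list — check a maintained
    # list such as ai.robots.txt for the full set)
    User-agent: GPTBot
    Disallow: /

    User-agent: CCBot
    Disallow: /

    User-agent: Google-Extended
    Disallow: /

    # Everyone else (including regular search engines) is still allowed
    User-agent: *
    Disallow: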
If you want to discourage all crawlers, including traditional search engines, you can use this simple rule:
User-agent: *
Disallow: /
Or just install Manton’s plug-in, No Robots.
If you want to stop crawlers for real, you’ll need to block their IP addresses from connecting to your web server. That’s something Manton needs to do, or you’ll need to put a reverse proxy in front of your blog. It’s pretty technical, and you’ll need to constantly keep up with which IP addresses the AI crawlers are using, as they will jump around as they get blocked. It’s a cat-and-mouse game.
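To make the reverse-proxy idea concrete, here’s a rough nginx sketch that rejects requests whose User-Agent matches known AI crawlers before they reach the blog. The hostname and upstream are hypothetical, the bot names are just examples, and keep in mind this only catches crawlers that identify themselves honestly — a bot that spoofs its User-Agent gets through, which is why IP blocking comes up at all:

    # Flag requests whose User-Agent matches known AI crawlers
    # (partial, example list — sync with something like ai.robots.txt)
    map $http_user_agent $is_ai_bot {
        default        0;
        ~*GPTBot       1;
        ~*CCBot        1;
        ~*ClaudeBot    1;
    }

    server {
        listen 80;
        server_name newsletter.example.com;   # hypothetical hostname

        location / {
            if ($is_ai_bot) {
                return 403;                   # refuse flagged crawlers
            }
            # hypothetical upstream — your actual hosted blog
            proxy_pass https://example.micro.blog;
        }
    }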
By the way, the blog I have now hidden from search engines and chatbots is my daily newsletter blog, following your suggestion nearly a year ago (and thank you for that): https://newsletter.mitchwagner.com. Every post on that blog also appears on either mitchwagner.com or mitchellaneous.net, and I don’t want to get the robots confused.