Technical difficulties


I've been spending a ridiculous amount of time lately trying to police the robot traffic on this here website. There's a new generation of crawler bots online, ones that ignore the directives in robots.txt files and successfully masquerade as human users, and they've become the majority of my traffic here, sucking up resources.

Granted, not a heck of a lot of resources; they're not keeping any actual humans from accessing things here. But they're annoying. And, more importantly, I don't know what they're doing.

Best guess is that they're scrapers, looking for email addresses or other things in the text of websites that will facilitate marketing/spamming/nuisance assholery. Secondary guess is that they're bots sucking up text to use in building so-called AI large language models. Which is, at its core, copyright infringement.

Anyway, nothing has worked to block the bots. They get around everything. They avoid the bot blocks by spoofing a browser signature, so I block the version of the browser they pretend to use. That fails, because they're not really using it. I block the IP address range, but they just VPN their way to new ones.
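For the curious, those two kinds of blocks boil down to something like the Python sketch below. It's an illustration only, not the actual setup on this site: the `is_blocked` function, the rule lists, the browser version string, and the address range (a reserved documentation range) are all made up for the example.

```python
import ipaddress

# Hypothetical rules, for illustration only.
# A browser version string the bots claim to be using.
BLOCKED_AGENT_SUBSTRINGS = ["Chrome/100.0."]
# An address range the bots were seen coming from (documentation range).
BLOCKED_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]


def is_blocked(remote_addr: str, user_agent: str) -> bool:
    """Return True if a request matches a user-agent or IP-range rule."""
    # Rule 1: the claimed browser signature matches a blocked version.
    if any(s in user_agent for s in BLOCKED_AGENT_SUBSTRINGS):
        return True
    # Rule 2: the client address falls inside a blocked network.
    try:
        addr = ipaddress.ip_address(remote_addr)
    except ValueError:
        # Unparseable address: no IP rule can match it.
        return False
    return any(addr in net for net in BLOCKED_NETWORKS)
```

Both rules only look at things the client controls. A bot that changes the version it claims, or shows up from an address outside the listed range, walks right past a check like this one.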

It's pissing me off. But I'm also out of ideas, at least for the moment.

In the course of trying various block strategies, I broke the RSS feed. So for the less-than-one-percent of you who use the feed in Outlook or a browser RSS plugin, and for the few of you who rely on email updates (which are based on the RSS feed), you may have encountered some wonkiness over the past couple of days. Sorry about that. It's fixed now.

If I could just find a fix for the damn bots.

 

← Previous: Home is where the rain is (November 5, 2025)

|

Next: Dear Democrats (November 10, 2025) →
