Long story short, my VPS, which I forward my home servers through via Tailscale, got hammered by thousands of requests per minute from Anthropic’s Claude AI, all of them coming from different AWS IPs.

The VPS has a 1TB monthly cap, but it’s still kinda shitty to have huge spikes like the 13GB in just a couple of minutes today.

How do you deal with something like this?
I’m only really running a Caddy reverse proxy on the VPS, which forwards my home server’s services through Tailscale.

I’d really like to avoid solutions like Cloudflare, since they f over CGNAT users very frequently and all that. Don’t think a WAF would help with this at all(?), but rate limiting on the reverse proxy might work.
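
To make the rate limiting idea concrete, here is a minimal Python sketch of a token bucket limiter. It is not Caddy configuration, just an illustration of the logic a reverse proxy rate limiter applies. Because the requests come from many different IPs, the sketch keys self-identified crawler traffic on a user agent token rather than on the source address. `CRAWLER_TOKENS`, `limiter_key`, and the rate and burst numbers are all hypothetical names and values chosen for the example.

```python
import time

# Hypothetical list of user agent substrings to treat as one shared client.
CRAWLER_TOKENS = ("claudebot",)


class TokenBucket:
    """Allows bursts of up to `capacity` requests, refilled at `rate` per second."""

    def __init__(self, rate, capacity):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.updated = None

    def allow(self, now):
        # Refill according to the time elapsed since the last request.
        if self.updated is not None:
            elapsed = max(0.0, now - self.updated)
            self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class KeyedLimiter:
    """Keeps one token bucket per key (for example per IP or per crawler)."""

    def __init__(self, rate, capacity):
        self.rate = rate
        self.capacity = capacity
        self.buckets = {}

    def allow(self, key, now=None):
        now = time.monotonic() if now is None else now
        bucket = self.buckets.setdefault(key, TokenBucket(self.rate, self.capacity))
        return bucket.allow(now)


def limiter_key(remote_ip, user_agent):
    """Group known crawlers by user agent; everyone else is limited per IP."""
    ua = user_agent.lower()
    for token in CRAWLER_TOKENS:
        if token in ua:
            return "crawler:" + token
    return "ip:" + remote_ip
```

With `rate=1.0` and `capacity=2`, two crawler requests pass immediately and the third is rejected until a token refills, no matter which IP each one arrives from. A crawler that spoofs a browser user agent falls back to the per-IP limit, so this only helps against bots that identify themselves.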

(VPS has fail2ban and I’m using /etc/hosts.deny for manual blocking. There’s a WIP website on my root domain with robots.txt that should be denying AWS bots as well…)
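
One thing worth checking with robots.txt: rules are matched against the crawler's user agent, not its hosting provider, each hostname (including every subdomain) needs its own file, and compliance is voluntary. The sketch below uses Python's standard `urllib.robotparser` to test what a given file actually denies. The `ClaudeBot` token and the example.com URL are assumptions for illustration; check the crawler's own documentation for the token it really uses.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt content; the ClaudeBot token is an assumption.
ROBOTS_TXT = """\
User-agent: ClaudeBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if `user_agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

Calling `is_allowed(ROBOTS_TXT, "ClaudeBot", "https://example.com/page")` shows whether the file denies that crawler. Even when it does, a denied bot that ignores robots.txt still reaches the server, so this complements blocking and rate limiting rather than replacing them.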

I’m still learning and would really appreciate any suggestions.

  • zoey@lemmy.librebun.com (OP) · 19 days ago

    Not gonna lie, the $3900/mo at the top of the /pricing page is pretty wild.
    Searched “crowdsec docker” and they have docs and all that. Thank you very much, I’ve heard of crowdsec before, but never paid much attention, absolutely will check this out!

    • K3CAN@lemmy.radio · 18 days ago

      The paid plans get you the “premium” blocklists, which include one made specifically to block AI scrapers, but a free account still gets you the actual software, the community blocklist, and up to three “basic” lists.

      • CronyAkatsuki@lemmy.cronyakatsuki.xyz · edited · 18 days ago

        And the community blocklists are updated when more than a certain number of crowdsec instances (I think it’s something like 10-50) block an IP within a short timeframe.

        The AI blocklist adds an IP as soon as even one instance catches an AI trying to scrape, going straight by the user agent.

        So even if the community blocklist has fewer AI IPs, it does eventually include them.

          • CronyAkatsuki@lemmy.cronyakatsuki.xyz · 17 days ago

            I’m using the default list alongside Firehol BotScout list and Firehol cybercrime tracker list set to ban.

            Also using the Firehol cruzit.com list set to do captcha, just in case it’s not actually a bot.

            I’m also using the cs-firewall-bouncer and a custom bouncer, shown in CrowdSec’s tutorials, to detect privilege escalation in case anybody actually manages to get inside.

            Alongside that I’m using a lot of scenario collections for the specific software I run, like Nextcloud, Grafana, SSH, … which helps a lot with attacks aimed directly at a service and not just general scraping or bot path traversal.

            It’s all free and I’ve been using it for a year. My only complaint is that I had to set up a cronjob to restart the crowdsec service every day, because it would stop working after a couple of days due to the number of requests it has to process.