We’re recently being hit with more and more bots.
Some of them are crawling our site and hitting valid or invalid endpoints. We’ve seen plenty of credential stuffing attacks as well. Most of them distributed across different IPs, with each IP hitting us at low frequency.
And most recently, someone abused our registration form to spam their recipients via our system.
It was quite clever actually. When you register, you enter your name, email and password. We then send a confirmation email saying something like
“Hey Roberta, thanks for joining. Please click here to confirm your account”.
Now those guys used their victim’s email address, and used the name field to link to a URL. So those users would get an email
“Hey lottery tickets http://some.link, thanks for joining. Please click here to confirm your account”.
Slimey. Naturally our own email system took the hit of sending spam. Double ouch.
Luckily, we had some anomaly detection in place, and we blocked those guys quickly. They used some browser automation from a fixed set of IPs, so it was easy to block. At least until the next wave…
I’ve been dealing with those types of scenarios with fail2ban, and it’s really quite effective. We define regular expressions to inspect our log files matching certain patterns, and then ban if we see repeated offensive behaviour. fail2ban is limited though in some aspects.
First of all, those rules are a bit of a pain to create and maintain, and you need to make sure the offending IP appears on the application log record you want to capture. In some cases it’s easy, but not always. The bigger problem however is that fail2ban doesn’t scale. The more servers you have — let’s say in a load-balanced setup — the less accurate fail2ban becomes. Or you need to aggregate all your logs on a single fail2ban host, creating a single point of failure or a bottleneck…
So I was searching for a better solution. Sadly there aren’t many. Cloudflare, which we also use, offers some degree of protection. But it’s not as flexible. And of course there’s reCAPTCHA. You know, those annoying things asking you to pick traffic signs, or even just click “I’m not a robot”?
Now I was initially hesitating to use it. I’m not sure why, but the fact that it doesn’t really have any real competition bothers me. Plus, as a user, I’m frequently annoyed by those challenges, and I hate this experience.
Luckily, the latest version of reCAPTCHA (v3) doesn’t present any user-facing challenges. It’s completely invisible. The no-competition problem is not something I can solve. I discovered that even Cloudflare itself uses reCAPTCHA in some cases! And these guys have their own Javascript challenge and what not… So I decided to bite the bullet, and give it a shot.
Setting it up is surprisingly simple, and from my limited experience, quite effective. That is, the scores it produced were surprisingly accurate. Albeit my ability to test different scenarios was limited.
I’ll try to give some pointers for implementing reCAPTCHA v3 with Rails 5.1 and Devise 4. The implementation can work on any form or controller however, and not just with Devise.