Many apps require some tasks to execute on schedule: cleaning up inactive user accounts, generating daily, weekly or monthly reports, sending out reminders via email, etc.
cron is a simple and trusted scheduler for Unix, and it’s used on pretty much any Unix-based system I come across.
So cron seems like a natural candidate for triggering those job executions. But it’s not always the best solution.
In our case, we’ve used the whenever gem for Rails successfully for a long while. The gem acts as a cron DSL and lets you inject and manage cron entries from your Rails app.
The problem starts, however, when you grow and your app spans more than one server. Or even if you only use one server, but want to be able to fail over, or switch from one server to another.
Why? Suddenly you have more than one cron launcher, and jobs that should execute once end up executing once on each server. This can cause some weird and unexpected lockouts, duplication and other issues.
So what’s the alternative?
There are a few options: in-memory schedulers like rufus-scheduler, or clockwork; database-driven solutions like perfectsched; and things that live on top of async job launchers like sidekiq-cron, sidekiq-scheduler, or using the built-in cron in Sidekiq Enterprise (the latter would cost you $$, but would have tons of other benefits).
Somehow all of them felt like overkill for our simple needs. We only had maybe half a dozen of those scheduled tasks, and didn’t really want another component to handle them, especially considering the tricky part of making sure jobs execute only once, at a given time, across several hosts. We “just” wanted something that would send us a webhook on a particular schedule. We can take care of the rest (courtesy of redis and sidekiq).
At that point, I was thinking: “there must be a very simple SaaS that does this”. i.e. a 3rd party service that sends you a trigger on schedule via webhooks.
I did find some services like cronitor.io. But those would typically monitor your cron jobs, rather than trigger their execution for you. I even emailed the cronitor guy and asked if there’s something like this in the pipeline, but there wasn’t. There are actually a couple of 3rd party services that do this, but they didn’t suit us (see the end of the post for more info about them, and why).
Then I bumped into an article about Lambda Cron. You can execute a lambda function using the Cloudwatch event scheduler. And the guys who wrote the post even measured how punctual and reliable those schedules are (conclusion: they are!).
So then, it was only a question of creating a very simple Lambda function that executes on schedule and then triggers a webhook to our Rails app. The Rails app would execute a task whenever the webhook endpoint is reached.
But then there was a question of security. How do I make sure some random dude isn’t pinging my webhook endpoints and triggering a billion job executions?
There are a few options there, including creating a static IP, but it felt like too much added complexity as well.
Being able to sign those webhooks with a simple HMAC and a shared key was much simpler and easier, and that’s what I ended up implementing.
Let me show you how.
Lambda function
The Lambda function is relatively simple. I’m using Python here with requests:

    import os
    import time
    import hashlib
    import hmac
    import logging

    # note: the vendored copy of requests may not be available on newer
    # botocore releases / Lambda runtimes; if this import fails, bundle the
    # requests package with the function and use "import requests" instead
    from botocore.vendored import requests

    logger = logging.getLogger()
    logger.setLevel(logging.INFO)


    # the sign method takes a message and a secret key, and returns a
    # hex-based hmac signature (using sha256), as well as a time-based nonce
    def sign(message, secret_key):
        t = int(time.time())
        return t, hmac.new(
            secret_key.encode("utf8"),
            "{}.{}".format(t, message).encode("utf8"),
            hashlib.sha256
        ).hexdigest()


    # this is the main lambda handler, it gets triggered with an event
    # the event is set on Cloudwatch
    def lambda_handler(event, _context):
        logger.info(event)
        # url is constructed from:
        # base_url: e.g. "https://example.com/webhooks"
        # event["task"]: e.g. "generate-monthly-reports"
        url = "{}/{}".format(os.environ['base_url'], event["task"])
        # secret_key is defined as an environment variable for Lambda
        # (there are even more secure ways, but it should be
        # encrypted automatically and probably good-enough for most cases)
        secret_key = os.environ['secret_key']
        # our signature protects the whole url
        t, signature = sign(url, secret_key)
        logger.info("t={},signature={}".format(t, signature))
        # POSTing a request to the url, and passing the signature and the nonce
        r = requests.post(url, {"signature": signature, "t": t})
        logger.info(
            "response status: {} body: {} headers: {}".format(
                r.status_code, r.text, r.headers
            )
        )
        return r.status_code
The signature algorithm is essentially the same one used by Stripe, so we can validate it in our Rails app using the Stripe library. If you’re not using Stripe, it’s simple enough to extract: the algorithm is very simple, yet robust in terms of security.
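To make the receiving side of the scheme concrete, here is a minimal Python sketch of the check, mirroring the sign function in the Lambda code. The helper name verify_signature, the 300-second tolerance and the now parameter are my own assumptions for illustration, not part of the original setup or of the Stripe library. It recomputes the HMAC over "t.url" with the shared key, rejects timestamps outside the tolerance window, and compares the two signatures in constant time.

```python
import hashlib
import hmac
import time


def verify_signature(url, t, signature, secret_key, tolerance=300, now=None):
    """Return True if `signature` is a valid HMAC-SHA256 of "<t>.<url>".

    Hypothetical helper: the name, the `tolerance` default (in seconds) and
    the `now` override (handy for tests) are assumptions for this sketch.
    """
    try:
        timestamp = int(t)
    except (TypeError, ValueError):
        return False

    now = int(time.time()) if now is None else now
    # reject timestamps too far from the current time, which limits how long
    # a captured request could be replayed
    if abs(now - timestamp) > tolerance:
        return False

    expected = hmac.new(
        secret_key.encode("utf8"),
        "{}.{}".format(timestamp, url).encode("utf8"),
        hashlib.sha256,
    ).hexdigest()
    # constant-time comparison, so timing doesn't leak how much matched
    return hmac.compare_digest(
        expected.encode("utf8"), str(signature).encode("utf8")
    )
```

Because the signed message includes the full URL, the receiver has to verify against exactly the URL the sender used (scheme, host and path), otherwise a valid request will be rejected.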
After setting up the Lambda function, you should provide it with two environment variables: base_url and secret_key (which should be long, random and complex).
Cloudwatch Events
The next step is to create Cloudwatch events that run on schedule, and launch your Lambda function.
You need to create a new Rule for every scheduled event you have. The rule would be triggered by an Event Source which is a Schedule (to which you provide a cron expression, or tell it to run at a fixed rate). The Target of the rule would be our Lambda function.
The important thing here is to pass the task parameter to our Lambda. This parameter defines the actual task to execute on our app, i.e. the last part of the URL. In the example above this was set to generate-monthly-reports, so your webhook URL would end up something like https://example.com/webhooks/generate-monthly-reports.
This allows us to re-use the same Lambda function for lots of different schedule events and tasks / URL endpoints.
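You would need to configure this on the Target->Configure input->Constant (JSON text), and set it as {"task": "generate-monthly-reports"} (for our example). The value has to match the last part of the URL exactly, because the signature covers the whole URL.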
Once set, the Cloudwatch event would trigger your lambda on schedule, and pass it the right task parameter which would send the webhook to your app. Easy.
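If you would rather script the rule than click through the console, the same configuration can be sketched with boto3. This is a rough sketch under assumptions: build_schedule and apply_schedule are hypothetical helper names, and the rule name, the cron expression (06:00 UTC on the 1st of each month) and the Lambda ARN used with them are placeholders. put_rule and put_targets are the boto3 CloudWatch Events client calls. The function also needs to allow events.amazonaws.com to invoke it; as far as I know the console adds that permission for you, while a script has to add it separately.

```python
import json


def build_schedule(rule_name, schedule_expression, task, lambda_arn):
    """Build keyword arguments for the put_rule and put_targets calls.

    Hypothetical helper: every value passed in is a placeholder to adapt.
    """
    rule = {
        "Name": rule_name,
        # e.g. "cron(0 6 1 * ? *)" or "rate(5 minutes)"
        "ScheduleExpression": schedule_expression,
        "State": "ENABLED",
    }
    targets = {
        "Rule": rule_name,
        "Targets": [
            {
                "Id": "1",
                "Arn": lambda_arn,
                # constant JSON input: this becomes the `event` argument
                # that lambda_handler receives
                "Input": json.dumps({"task": task}),
            }
        ],
    }
    return rule, targets


def apply_schedule(events_client, rule, targets):
    """Create (or update) the rule, then point it at the Lambda function."""
    events_client.put_rule(**rule)
    events_client.put_targets(**targets)
```

With boto3 installed and AWS credentials configured, this could be applied with something like apply_schedule(boto3.client("events"), rule, targets), where rule and targets come from build_schedule.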
Handling webhooks in Rails
Now that we have our Lambda set up to fire webhooks on schedule, we need to handle the webhook and launch a job.
We’ll use a simple webhooks Rails controller to handle our tasks:

    class WebhooksController < ApplicationController
      # webhooks don't have CSRF tokens
      skip_before_action :verify_authenticity_token
      # the authorize before action would verify the webhook signature
      before_action :authorize

      def generate_monthly_report
        # generate the report, e.g.
        MonthlyReportGenerator.perform_async
      end

      private

      def authorize
        head :forbidden unless valid_signature?
      end

      def valid_signature?
        return false if params[:signature].blank? || params[:t].blank?

        # we're (re)using the Stripe signature algorithm to keep things DRY
        # and to ensure our security was externally validated
        # see https://stripe.com/docs/webhooks/signatures
        header = "t=#{params[:t]},v1=#{params[:signature]}"
        # normally, you'd pull the secret from ENV or whichever way you manage secrets
        secret = ENV["LAMBDA_CRON_SECRET"]
        Stripe::Webhook::Signature.verify_header(request.url, header, secret)
      rescue Stripe::SignatureVerificationError => e
        logger.warn(e)
        return false
      end
    end

The controller authorizes the request by making sure the signature is valid, passing the signature and nonce together with our secret. As I mentioned before, I’m re-using the Stripe signature verification code, but it’s fairly trivial to implement it yourself or look up the Stripe implementation and inline it if necessary.
I left the routes and other small bits out, because they are fairly trivial.
Notes, trade-offs, limitations, etc
At this point, you might wonder whether a Gem or a plugin is actually much simpler, and whether AWS Lambda and Cloudwatch are not actually more complicated. In a sense, it’s easier to install a Gem and configure it. But I think the Lambda approach is a better, more robust, and more generic solution (it can work with any web app, not just Rails). I’d rather rely on AWS to execute things on time than on a thread that runs on my own system, or a plugin for Sidekiq.
In terms of cost, it’s virtually free, because the Lambda execution time is tiny. It might not be everybody’s cup of tea, but I’m happy with this solution and it’s been serving us well for a while now in production. Adding, removing or changing a cron job is pretty simple even with the AWS management interface.
A few things that this solution doesn’t do:
- Retries – if the webhook endpoint isn’t reachable or the server times out, the job won’t execute and Lambda won’t retry it. cron doesn’t retry either, so I’m ok with it.
- Automation – I defined the Lambda function and Cloudwatch events manually using the AWS management interface, rather than scripting them. It’s not ideal, but for what it does, this is also a trade-off I’m happy to make.
- Monitoring / Alerts – If you look at the Lambda function code, you’ll see that it’s logging things, and you can access those logs on AWS, but it won’t alert you if something goes wrong, and I’m not monitoring those logs in any specific way. It’s possible to do it, but I’m independently monitoring that those tasks are executed using Datadog (not covered in this blog post). You can use cronitor or a similar service though if you want.
- Parameters – other than the signature and nonce, no other parameters are passed to the webhook endpoints. It’s trivial to add params if you need them. I had no need for them, and preferred to keep things simple.
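On the retries point, if you did want the Lambda to try again, a small wrapper could do it. This is only a sketch and not something I run in production: post_with_retries is a hypothetical helper, and the attempt count and delays are arbitrary assumptions. It calls a send function until it gets a 2xx status code, backing off exponentially between attempts.

```python
import time


def post_with_retries(send, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call send() until it returns a 2xx status code or attempts run out.

    Hypothetical helper. `send` is any zero-argument callable that performs
    the POST and returns the HTTP status code; `sleep` is injectable so the
    delays can be skipped in tests.
    """
    status = None
    for attempt in range(attempts):
        try:
            status = send()
        except Exception:
            # e.g. a connection error or timeout: treat as a failed attempt
            status = None
        if status is not None and 200 <= status < 300:
            return status
        if attempt < attempts - 1:
            # exponential backoff: base_delay, 2 * base_delay, 4 * base_delay...
            sleep(base_delay * (2 ** attempt))
    # the last status seen, or None if the final attempt raised
    return status
```

Two things to keep in mind with this: the Lambda timeout has to be long enough to cover the delays, and a retry can run a job twice if the first request was processed but the response got lost, so the job itself should be safe to repeat. If the receiver enforces a timestamp tolerance, send should also re-sign the request on each attempt.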
Other alternatives?
I did come across one commercial service that essentially does what my Lambda/Cloudwatch combo does (with some extra bits), called crondog. Unfortunately it’s been in Beta for a rather long while now. Not sure if the service is active or maintained, and personally I would feel safer with my own Lambda on AWS.
There’s also cron support on hook.io, but somehow even with a simple name like hook.io and a supposedly simple value proposition (“hook.io is the leading open-source provider of microservice and webhook hosting.”), I wasn’t quite sure what to make of the entire service.
Lastly, Zapier offers some schedule support, but it didn’t seem as flexible. For example, it doesn’t look like it’s possible to trigger webhooks every 5 minutes (not sure why).