Rails Russian Doll Caching is super cool. It’s simple, effective and makes caching much easier to reason about.
There’s a dark side to it though. Not in the negative, evil sense. But rather the hidden, unknown, confusing sense.
In my previous post, I described the architecture of Gimel – an A/B testing backend using AWS Lambda and redis HyperLogLog. One of the commenters suggested looking into Google BigQuery as a potential alternative backend.
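To make the counting side concrete, here's a minimal sketch of the HyperLogLog approach in Python with redis-py. The key naming scheme and function names are my own illustration here, not Gimel's actual schema:

```python
import redis

r = redis.StrictRedis(host="localhost", port=6379)

def track(experiment, variant, client_id):
    # PFADD adds the client to a HyperLogLog; duplicates are ignored,
    # so each unique client is counted only once per variant.
    r.pfadd("counts:%s:%s" % (experiment, variant), client_id)

def results(experiment, variants):
    # PFCOUNT returns the approximate number of unique clients,
    # typically within about 1% of the true count.
    return {v: r.pfcount("counts:%s:%s" % (experiment, v)) for v in variants}
```

The appeal is the fixed memory footprint: each HyperLogLog takes around 12KB no matter how many unique clients it tracks.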
It looked quite promising, with the potential of increasing result accuracy even further. HyperLogLog is pretty awesome, but trades accuracy for space. Google BigQuery offers very affordable analytics data storage with an SQL query interface.
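Storing raw events would let you count exactly rather than approximately. As a rough illustration only (the dataset, table and schema here are hypothetical), querying per-variant unique counts with the Python BigQuery client could look something like this:

```python
from google.cloud import bigquery

client = bigquery.Client()

# Hypothetical table and columns, purely to illustrate the SQL interface.
query = """
    SELECT variant, COUNT(DISTINCT client_id) AS uniques
    FROM experiments.events
    WHERE experiment = 'button-color'
    GROUP BY variant
"""
for row in client.query(query).result():
    print(row.variant, row.uniques)
```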
There was one more thing I wanted to look into that could also improve the redis backend – batching writes. The current Gimel architecture writes every event directly to redis. Whilst redis itself is fast and offers low latency, the AWS Lambda architecture means we might have lots of active simultaneous connections to redis. As another commenter noted, this can become a bottleneck, particularly on lower-end redis hosting plans. In addition, any other backend that does not offer low-latency writes could benefit from batching. Even before trying out BigQuery, I knew I'd be looking at much higher latency and needed to queue and batch writes.
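The basic batching idea is simple: buffer events and flush them in one go. Here's a sketch, assuming a redis pipeline as the flush mechanism and deliberately ignoring where the buffer would actually live in a stateless Lambda setup:

```python
import redis

r = redis.StrictRedis()
BATCH_SIZE = 100   # assumption: flush after this many events
_buffer = []

def enqueue(experiment, variant, client_id):
    _buffer.append((experiment, variant, client_id))
    if len(_buffer) >= BATCH_SIZE:
        flush()

def flush():
    # A pipeline sends all queued commands in a single round-trip,
    # so one connection does the work of many.
    pipe = r.pipeline(transaction=False)
    for experiment, variant, client_id in _buffer:
        pipe.pfadd("counts:%s:%s" % (experiment, variant), client_id)
    pipe.execute()
    del _buffer[:]
```

Flushing on a size threshold is the simplest policy; a time-based flush on top of it would bound how stale the counts can get under low traffic.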
Before I dive into the reasons for writing Gimel in the first place, I'd like to cover what it's based on. Clearly, 100 lines of code won't get you that far on their own. There are two (or three) essential components this backend runs on, which make it scalable and also light-weight in terms of actual code:
“Russian Doll Caching” gained some popularity recently, I suspect in part due to its catchy (or cachie?) name and how easy it is to visualize the concept. Rails 4 should have this improved caching available by default; with Rails 3 you need to install the cache-digests gem. It's pretty easy to get started with, and the documentation is clear. It makes a lot of sense to start using it in your Rails app. I won't attempt to cover the basics and will assume you are already familiar with it. I want to talk about a specific aspect of fragment caching: how the cache keys are generated.
A recent comment by Martyn on my cloud performance shoot-out post prompted me to do another round of testing. As the bootstrap process I described in the last post evolved, it's always a good idea to test it anyway, so why not kill two birds with one stone? The comment suggested that the Amazon EC2 micro instance is CPU throttled, and that after a long period (in computer terms: about 20 seconds, according to the comment) you could lose up to 99% of CPU power, whereas on a small instance this shouldn't happen. So is the EC2 small instance going to perform way better than the micro? And how does each compare against an equivalent Linode or Rackspace VPS?
One of my primary aims when building a resilient cloud architecture is being able to spawn instances quickly. Many cloud providers give you tools to create images or snapshots of existing cloud instances and launch them. This is great, but not particularly portable. If I have one instance on Linode and I want to clone it to Rackspace, I can't easily do that.
That’s one of the reasons I am creating bootstrap scripts that completely automate a server (re)build process. Given an IP address and root password, the script should connect to the instance, install all necessary packages, pull the code from the repository, initialize the database, configure the web server and get the server ready for restoring user data.
I’m primarily using fabric for automating this process, and use a standard operating system across different cloud providers. This allows fairly consistent deployments across providers. It also means the architecture is not dependent on a single provider, which in my opinion is a huge benefit. Not only can my architecture run in different data centres or geographic locations, but I can also be flexible in my choice of hosting providers.
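To give a feel for what such a script looks like, here's a rough sketch using fabric 1.x as a library. The package list and repository URL are placeholders, not my actual setup:

```python
from fabric.api import env, run, cd

def bootstrap(ip, root_password):
    # Connect as root to a freshly provisioned instance.
    env.host_string = "root@%s" % ip
    env.password = root_password

    # Install the base packages (placeholder list).
    run("apt-get update && apt-get -y install nginx postgresql git")

    # Pull the application code (placeholder repository).
    with cd("/srv"):
        run("git clone https://example.com/myapp.git app")

    # ...then initialize the database, configure the web server,
    # and prepare for restoring user data.
```

Because the script only needs SSH access, the same run works unchanged against any provider that hands you an IP address and a root password.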
All that aside, however, building and refining this bootstrapping process allowed me to run it across different cloud providers, namely Rackspace, Linode, and EC2. Whilst running the bootstrapping process many times, I thought it might be a great opportunity to compare the performance of those providers side-by-side. My bootstrap process runs the same commands in the same order, and covers quite a variety of operations. This should give an interesting indication of how each of the cloud providers performs.