It’s always nice to get some feedback, or for users to make contact via a simple contact form. However, it didn’t take long before spammers started hitting those forms too. It was quite interesting to see the kind of messages we started receiving. Most of those submissions read more like stories, or snippets from an email to a friend. They didn’t have any of the expected keywords for fake watches or erectile dysfunction enhancers. Many didn’t even have any links. So what were these messages? My personal guess was that these were some kind of reconnaissance attempt. The bots were sending innocent messages to various online forms first. Then, I imagine, they would crawl the site again, trying to see if those submissions appeared elsewhere. If and when they did, they would hit those forms hard with the real spam content. In any case, these were all speculations that I didn’t really care to prove right or wrong. I just wanted to get rid of this junk. Fast.
A recent comment by Martyn on my cloud performance shoot-out post prompted me to do another round of testing. The bootstrap process I described in that post has evolved, and it’s always a good idea to re-test it anyway, so why not kill two birds with one stone? The comment suggested that the Amazon EC2 micro instance is CPU throttled, and that after a long period (in computer terms: about 20 seconds, according to the comment), you could lose up to 99% of CPU power, whereas on a small instance this shouldn’t happen. So is the EC2 small instance going to perform way better than the micro instance? And how will both perform against an equivalent Linode or Rackspace VPS?
Webfaction fail. over.
This post starts as a rant about Webfaction, but somehow turns into a rave. I recently discovered (the hard way) that I can fail over almost any site to a secondary host in a different data centre, all with a few scripts on a Webfaction shared hosting account.
fabric-graphite is a fabric script that installs Graphite, Nginx, uWSGI and all their dependencies on a Debian-based host.
Why?
I was reading a few interesting posts about Graphite. When I tried to install it, however, I couldn’t find anything that really covered all the steps. Some guides covered it well for Apache, others covered Nginx but had steps missing or assumed the reader already knew about them.
I’m a big fan of fabric, and try to do all deployments and installations using it. This way I can re-run the process, and also better document what needs to be done. So instead of writing another guide, I created this fabric script.
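To give a rough idea of the shape of such a script, here is a minimal sketch in the spirit of fabric-graphite (the task name and package list are illustrative, not the script’s actual contents):

from fabric.api import sudo, task

@task
def install_graphite():
    # Install system dependencies (illustrative subset).
    sudo('apt-get update -q')
    sudo('apt-get install -y -q python-dev python-pip nginx uwsgi')
    # Graphite itself comes from PyPI.
    sudo('pip install whisper carbon graphite-web')

You would then run it against a host with something like: fab -H root@myhost install_graphite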
One of my primary aims when building a resilient cloud architecture is being able to spawn instances quickly. Many cloud providers give you tools to create images or snapshots of existing cloud instances and launch them. This is great, but not particularly portable. If I have one instance on Linode and I want to clone it to Rackspace, I can’t easily do that.
That’s one of the reasons I am creating bootstrap scripts that completely automate a server (re)build process. Given an IP address and root password, the script should connect to the instance, install all necessary packages, pull the code from the repository, initialize the database, configure the web server and get the server ready for restore of user-data.
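As a minimal sketch of what such a bootstrap looks like in fabric (hypothetical task, repository URL and package names, not my actual script):

from fabric.api import run, sudo, task

@task
def bootstrap():
    # Assumes the host and root password were given on the command line, e.g.
    #   fab -H root@203.0.113.10 bootstrap
    sudo('apt-get update -q && apt-get upgrade -y -q')
    sudo('apt-get install -y -q nginx postgresql git')
    run('git clone https://example.com/repo.git /srv/app')  # illustrative repo
    sudo('createdb appdb', user='postgres')  # initialize the database
    sudo('service nginx restart')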
I’m primarily using fabric to automate this process, and I use a standard operating system across the different cloud providers. This allows fairly consistent deployments across providers. It also means the architecture is not dependent on a single provider, which in my opinion is a huge benefit. Not only can my architecture run in different data centres or geographic locations, I can also be flexible in the choice of hosting providers.
All that aside, building and refining this bootstrapping process allowed me to run it across different cloud providers, namely Rackspace, Linode, and EC2. Whilst running the bootstrapping process many times, I thought it might be a great opportunity to compare the performance of those providers side-by-side. My bootstrap process runs the same commands in the same order, and covers quite a variety of operations. This should give an interesting indication of how each of the cloud providers performs.
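To compare providers fairly, each bootstrap step can simply be timed; a trivial sketch (my own illustration, not the actual harness):

import time
from contextlib import contextmanager

@contextmanager
def timed(step):
    # Print how long a bootstrap step took, for side-by-side comparison.
    start = time.time()
    yield
    print('%-20s %6.1fs' % (step, time.time() - start))

# usage inside the fabric tasks:
# with timed('install packages'):
#     sudo('apt-get install -y -q nginx postgresql')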
One of the best rules of thumb I know is the 80/20 rule. I can’t think of a more practical rule in almost any situation. Combined with the law of diminishing returns, it pretty much sums up how the universe works. One case study that hopes to illustrate both, if only a little, is a short optimization experiment I carried out recently. I had been reading many posts about optimizing WordPress using Nginx, Varnish, W3 Total Cache and php-fpm. The improvements some of them reported were staggering, and I was inspired to come up with a similar setup that would really push the boundary of how fast I can serve a WordPress site.
Spoiler – Conclusion
So I know there isn’t such a thing as too much cash, but does the same apply to cache?
When talking about security, the first thing that usually comes to mind is encryption. Spies secretly coding (or de-coding) some secret message that should not be revealed to the enemy. Encryption is this mysterious thing that turns all text into a part of the matrix. Developers generally like encryption. It’s kinda cool. You pass stuff into a function, get some completely scrambled output. Nobody can tell what’s in there. You pass it back through another function – the text is clear again. Magic.
Encryption is cool. It is fundamental to doing lots of things on the Internet. How could you pay with your credit card on Amazon without encryption? How could you check your bank balance? How could MI5 pass their secret messages without Al-Qaida intercepting them?
But encryption is actually not as useful as people think. It is often used in the wrong place, and it can easily give a false sense of security. Why? People forget that encryption, by itself, is usually not sufficient. An attacker cannot read the encrypted data, but nothing stops them from changing it. In many cases, it is very easy to change encrypted data predictably, without any knowledge of the encryption key.
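To make this concrete, here is a small illustration of my own (not from the original post) using a toy XOR stream cipher, which behaves like the keystream ciphers used in practice: flipping bits in the ciphertext flips the same bits in the decrypted plaintext, and no key is required.

import os

def xor_cipher(data: bytes, keystream: bytes) -> bytes:
    # Toy stream cipher: encryption and decryption are the same XOR.
    return bytes(d ^ k for d, k in zip(data, keystream))

key = os.urandom(32)
ciphertext = xor_cipher(b'PAY   10 POUNDS TO ALICE', key)

# The attacker flips bits in the ciphertext without knowing the key:
tampered = bytearray(ciphertext)
tampered[5] ^= ord(' ') ^ ord('9')  # turn the amount '10' into '910'
print(xor_cipher(bytes(tampered), key))  # b'PAY  910 POUNDS TO ALICE'

The message still decrypts cleanly, so the recipient has no idea it was modified. This is why encryption usually needs to be paired with an integrity check (a MAC), which is a separate mechanism.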
Scoring a goal against Google is never easy. Google Analytics allows you to do some strange and wonderful things, but not without some teeth grinding. I was struggling with this for a little while, and it was a great source of frustration, since there’s hardly any info out there about it. Or maybe there is lots of info, but no solution to this particular problem. I think I finally nailed it.
Dynamic Goal Conversion Values
I was trying to get some dynamic goal conversion values into Analytics. I ended up reading about ecommerce tracking, and it seemed like the way to go. Not only would I be able to set the goal conversion value dynamically, it also gives you a breakdown of each and every transaction. Very nice. After implementing it, I was quite impressed to see each transaction, product, SKU etc. appear neatly on the ecommerce reports. So far so good. But somehow, goals – which were set on the very same page as the ecommerce tracking code – failed to pick up the transaction value. The goals themselves were tracked just fine, I could see them adding up, but not the goal value. grrrr…
I’ve come across a small nuisance that seems to appear occasionally with unicode URLs. Some websites encode/escape/quote URLs as soon as they see any special symbol (particularly the % sign). They appear to assume it needs to be encoded, and convert any such character to its URL-encoded form. For example, the percent (%) symbol converts to %25, an ampersand (&) to %26, and so on.
This is not normally a problem, unless the URL is already encoded. Since all unicode-based URLs use this encoding, they are more prone to these errors. What happens then is that a URL that looks like this:
http://www.frau-vintage.com/2011/%E3%81%95%E3%81%8F%E3%82%89 …
will be encoded again to this:
http://www.frau-vintage.com/2011/%25E3%2581%2595%25E3%25 …
So clicking on such a double-encoded link will unfortunately lead to a 404 page (don’t try it with the links above, because the workaround was already applied there).
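A quick way to see the effect for yourself (my own illustration, using Python’s urllib rather than anything from the affected sites):

from urllib.parse import quote, unquote

path = '/2011/%E3%81%95%E3%81%8F%E3%82%89'  # already percent-encoded
broken = quote(path)             # encodes the '%' signs a second time
print(broken)                    # /2011/%25E3%2581%2595%25E3%2581%258F%25E3%2582%2589
print(unquote(broken) == path)   # True: decoding once repairs the URL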
A workaround
This workaround is specific to the WordPress 404.php, but can be applied quite easily in other frameworks like Django or Drupal, and maybe even with an Apache .htaccess rule(?).
<?php
/*
 * Detecting 'double-encoded' URLs:
 * if the request URI contains %25 (the url-encoded form of the '%' symbol)
 * within the first few characters, we try to decode the URL and redirect.
 */
$pos = strpos($_SERVER['REQUEST_URI'], '%25');
if ($pos !== false && $pos < 10) :
    header("HTTP/1.1 301 Moved Permanently");
    header("Location: " . urldecode($_SERVER['REQUEST_URI']));
else :
    get_header(); ?>
<h2>Error 404 - Page Not Found</h2>
<?php get_sidebar(); ?>
<?php get_footer();
endif; ?>
This code is placed only in the 404 page. It grabs the request URI and checks whether it contains the string ‘%25’ within the first 10 characters (you can modify the check to suit your needs). If it finds it, it redirects to a urldecoded version of the same URL…
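Since Django was mentioned as another candidate, here is a rough sketch of how the same idea might look there (my own illustration; the handler name is hypothetical):

from urllib.parse import unquote

from django.http import HttpResponsePermanentRedirect
from django.views.defaults import page_not_found

def double_encoded_404(request, exception=None):
    path = request.get_full_path()
    pos = path.find('%25')
    if pos != -1 and pos < 10:
        # Double-encoded URL: decode once and redirect permanently.
        return HttpResponsePermanentRedirect(unquote(path))
    return page_not_found(request, exception)

# in urls.py: handler404 = 'myapp.views.double_encoded_404'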
In my previous post I talked about Django memory management, the little-known maxrequests parameter in particular, and how it can help ‘pop’ some balloons, i.e. kill and restart some Django processes in order to release memory. In this post I’m going to cover some of the things to do or avoid in order to keep memory usage low from within your code. In addition, I am going to show at least one method to monitor (and act automatically!) when memory usage shoots through the roof.
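As a taste of the monitoring idea, here is a minimal watchdog sketch of my own (not necessarily the method from the post; it assumes the psutil library, an arbitrary threshold, and a very naive process-name match):

import time
import psutil

LIMIT_BYTES = 300 * 1024 * 1024  # 300 MB: illustrative threshold

def watch(name='django', interval=10):
    while True:
        for proc in psutil.process_iter(['name', 'memory_info']):
            # Naive match; real workers may show up as 'python' etc.
            if name in (proc.info['name'] or ''):
                if proc.info['memory_info'].rss > LIMIT_BYTES:
                    # Let the process manager (e.g. supervisord) respawn it.
                    proc.terminate()
        time.sleep(interval)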