I’ve come across a small nuisance that seemed to appear occasionally with unicode urls. Some websites seem to encode/escape/quote urls as soon as they see any symbol (particularly % sign). They appear to assume it needs to be encoded, and convert any such character to its URL-Encoded form. For example, percent (%) symbol will convert to %25, ampersand (&) to %26 and so on.
This is not normally a problem, unless the URL is already encoded. Since all unicode-based urls use this encoding, they are more prone to these errors. What happens then is that a URL that looks like this:
http://www.frau-vintage.com/2011/%E3%81%95%E3%81%8F%E3%82%89 …
will be encoded again to this:
http://www.frau-vintage.com/2011/%25E3%2581%2595%25E3%25 …
So clicking on such a double-encoded link will unfortunately lead to a 404 page (don’t try it with the links above, because the workaround was already applied there).
A workaround
This workaround is specific to wordpress 404.php, but can be applied quite easily in other frameworks like django, drupal, and maybe even using apache htaccess rule(?).
<?php /* detecting 'double-encoded' urls * if the request uri contain %25 (the urlncoded form of '%' symbol) * within the first few characeters, we try to decode the url and redirect */ $pos = strpos($_SERVER['REQUEST_URI'],'%25'); if ($pos!==false && $pos < 10) : header("Status: 301 Moved Permanently"); header("Location:" . urldecode($_SERVER['REQUEST_URI'])); else: get_header(); ?> <h2>Error 404 - Page Not Found</h2> <?php get_sidebar(); ?> <?php get_footer(); endif; ?>
This is placed only in the 404 page. It then grabs the request URI and checks if it contains the string ‘%25’ within the first 10 characters (you can modify the check to suit your needs). If it finds it, it redirects to a urldecoded version of the same page…
2 replies on “unicode url double-encoding 404 redirect trick”
Unfortunately didn’t work for me, still get 404…
Changed 10 to 100 and i works, nice, thank you!