md5, sha1, etc. are very handy tools for programmers to generate hashed strings from random or fixed keys.
A lot of the time, we simply store these hashes directly in databases, setting the field data type to string, or use the hashes as filenames (handy for creating cache files) or as part of a URL.
While there is nothing wrong with that, it sometimes bothers me that hashes, which are hexadecimal (base-16), are stored in memory or storage that supports a much larger base. For example, a string-typed field in ASCII encoding stores one byte per character, and a byte can take 2^8 different values (essentially making it base-256); in UTF-8, where a character can take up to four bytes, it is nominally base-(2^32); and for filenames on case-insensitive, non-POSIX Win32 NTFS, it is whatever base the set of permitted UTF-16 characters amounts to.
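To make the gap concrete, here is a small sketch comparing the hex form of an MD5 hash with the same hash as raw bytes. The input string 'example' is only an assumption for illustration; md5() with its second argument set to true returns the raw 16-byte digest.

```php
<?php
// Hypothetical input; any string behaves the same way.
$hex = md5('example');        // 32 hexadecimal characters
$raw = md5('example', true);  // the same 128 bits as 16 raw bytes

// Each hex character carries 4 bits; each raw byte carries 8.
var_dump(strlen($hex));            // int(32)
var_dump(strlen($raw));            // int(16)
var_dump(bin2hex($raw) === $hex);  // bool(true)
```

The raw form is the densest, but its bytes are not all printable, which is why a text-friendly larger base is the more practical target for filenames and URLs.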
To improve on that, translate the hash, usually hexadecimal, into a number or string drawn from a larger character set. One of the easiest and handiest ways to achieve this in PHP is the base_convert function.
$nicerHash = base_convert($hash, 16, 36);
// gets us '9ou4tb14lz40cokgw4ocoscs8'
// from 'a3aca2964e72000eea4c56cb341002a4'
Unfortunately, as the largest base supported by base_convert is only 36, this still does not give you the full range of the available character set in most cases. It does, however, improve character-set utilization, shorten the resulting hash string and, at the same time, make the hash appear more human-friendly (at least to me, given that my filenames no longer consist only of 0-f).
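One caveat worth knowing: base_convert falls back to floating point for numbers this large, so the low-order digits of a 32-character hash are lost and two different hashes can end up as the same string. Below is a minimal sketch of a lossless alternative that does long division on an array of hex digits. The name hex_to_base and its default alphabet are assumptions for illustration, not anything built into PHP.

```php
<?php
/**
 * Convert a hexadecimal string to another base without going through floats.
 * Hypothetical helper (not part of PHP): repeated long division by the
 * target base, collecting remainders as output digits.
 */
function hex_to_base(string $hex, string $alphabet = '0123456789abcdefghijklmnopqrstuvwxyz'): string
{
    $base = strlen($alphabet);
    // One integer (0-15) per hex digit, most significant first.
    $digits = array_map('hexdec', str_split(strtolower($hex)));
    $out = '';

    while (count($digits) > 0) {
        $remainder = 0;
        $quotient = [];
        foreach ($digits as $d) {
            $acc = $remainder * 16 + $d;
            $q = intdiv($acc, $base);
            $remainder = $acc % $base;
            // Skip leading zeros of the quotient.
            if ($q > 0 || count($quotient) > 0) {
                $quotient[] = $q;
            }
        }
        // Each pass yields the next least-significant output digit.
        $out = $alphabet[$remainder] . $out;
        $digits = $quotient;
    }
    return $out;
}

$a = 'a3aca2964e72000eea4c56cb341002a4';
$b = 'a3aca2964e72000eea4c56cb341002a5'; // differs only in the last digit

var_dump(base_convert($a, 16, 36) === base_convert($b, 16, 36)); // bool(true): precision was lost
var_dump(hex_to_base($a) === hex_to_base($b));                   // bool(false)
var_dump(hex_to_base('ff'));                                     // string(2) "73"
```

Because the alphabet is a parameter, the same sketch can target base 62 or any other character set, provided the characters chosen are safe for wherever the hash ends up.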
For conversion to larger bases in PHP, you'll have to write your own translation function. Do take note, however, that not all UTF-8 characters are printable, and some may be corrupted when transported through certain media, say as part of a URL; for alphabetic characters, the letter case may also get lost in transit.
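As one possible shape for such a translation function, here is a sketch that packs the hash into a URL-safe base-64 alphabet using only built-in PHP functions. The helper names hash_to_urlsafe and urlsafe_to_hash are hypothetical, and the choice of '-' and '_' as the two extra symbols is an assumption borrowed from the common "base64url" convention.

```php
<?php
// Hypothetical helpers; only built-in PHP functions are used inside.
function hash_to_urlsafe(string $hex): string
{
    $raw = hex2bin($hex);                       // 32 hex chars -> 16 raw bytes
    $b64 = base64_encode($raw);                 // A-Z a-z 0-9 + / with = padding
    return rtrim(strtr($b64, '+/', '-_'), '='); // swap in URL-safe symbols, drop padding
}

function urlsafe_to_hash(string $token): string
{
    // Reverse the symbol swap; base64_decode tolerates the missing padding.
    return bin2hex(base64_decode(strtr($token, '-_', '+/')));
}

$hash  = 'a3aca2964e72000eea4c56cb341002a4';
$token = hash_to_urlsafe($hash);

var_dump(strlen($token));                    // int(22)
var_dump(urlsafe_to_hash($token) === $hash); // bool(true)
```

This alphabet is case-sensitive, so it suits URLs but not case-insensitive filesystems; for filenames there, a single-case alphabet such as base 36 is the safer choice.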
uzyn.com is a weblog by U-Zyn Chua —A web developer and cloud computing consultant of
2 Comments
Maybe someone can write a robust and secure function to translate the hashes to a 'web safe' character set. That would be great for your idea of using them as cache URLs.
Good idea. There are, however, a lot of fine-tuning options that would need to be made available if it is to be general-purpose enough: mixed case, charset, etc.