Hash as filename

md5, sha1 and similar functions are very handy tools for programmers to generate hashed strings from random or fixed keys.

A lot of the time, we simply store these hashes directly in databases as string fields, or use them as filenames (handy for creating cache files) or as part of a URL.

While there is nothing wrong with that, it sometimes bothers me that hashes, which are hexadecimal (base-16), are stored in spaces that support a much larger base. For example, a string field in a single-byte encoding can hold 2^8 different values per character (essentially making it base-256); a UTF-8 field can hold any of the more than a million Unicode code points; and filenames on case-insensitive, non-POSIX Win32 NTFS can use whatever base the allowed UTF-16 characters add up to.
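To put rough numbers on that gap, here is a small sketch of how many characters a 128-bit value such as an MD5 needs in a few bases. The helper name charsNeeded is made up for this illustration, and the figures are upper bounds for a full 128-bit value.

```php
<?php
// Hypothetical helper: characters needed to write a value of $bits bits
// in the given base, i.e. ceil(bits / log2(base)).
function charsNeeded(int $bits, int $base): int
{
    return (int) ceil($bits / log($base, 2));
}

foreach ([16, 36, 62, 256] as $base) {
    printf("base-%d: %d characters\n", $base, charsNeeded(128, $base));
}
// base-16: 32 characters
// base-36: 25 characters
// base-62: 22 characters
// base-256: 16 characters
```

So going from base-16 to base-36 saves about a fifth of the length, and a case-sensitive alphanumeric alphabet (base-62) saves about a third.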

To improve on that, translate the hash, usually hexadecimal, into a number or string that uses a larger character set. One of the easiest and handiest ways to achieve this in PHP is the base_convert function.

$hash = 'a3aca2964e72000eea4c56cb341002a4'; // e.g. the output of md5()
$nicerHash = base_convert($hash, 16, 36);

// gets us '9ou4tb14lz40cokgw4ocoscs8'
// from 'a3aca2964e72000eea4c56cb341002a4'

Unfortunately, the largest base that base_convert supports is 36, so this still does not use the full range of characters available in most cases. It does, however, improve character-set utilization, shorten the hash string and, at the same time, make the hash appear more human-friendly (at least to me, since my filenames no longer consist only of 0-f). Be aware that base_convert may lose precision on numbers this large, so distinct hashes can collide after conversion.
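A quick check of how base_convert treats numbers as large as a 128-bit hash: the PHP manual notes that it may lose precision on large numbers because of the float type used internally. In this sketch the second hash is hypothetical, made by changing only the last digit of the first.

```php
<?php
// Two different 128-bit hex strings: only the last digit differs.
$a = 'a3aca2964e72000eea4c56cb341002a4';
$b = 'a3aca2964e72000eea4c56cb341002a5';

$convertedA = base_convert($a, 16, 36);
$convertedB = base_convert($b, 16, 36);

// For numbers this large base_convert works on a float, so the low-order
// digits are rounded away and both inputs end up as the same string.
var_dump($convertedA === $convertedB); // bool(true)
```

For cache filenames this may be acceptable, since a float still keeps roughly 53 bits of the hash, but the converted string should not be treated as a reversible or collision-equivalent form of the original.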

For better base conversion in PHP, you’ll have to write your own translation function. Do take note, however, that not all UTF-8 characters are printable, and some may get corrupted when transported through certain media, say as part of a URL; for alphabetic characters, letter case may also get lost in transit.
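As a rough idea of what such a translation function could look like, here is a minimal sketch that re-encodes a hex string with any single-byte alphabet by repeated long division, so no precision is lost. The function name hexToAlphabet and the base-62 alphabet are assumptions made for this example, not part of PHP.

```php
<?php
// Hypothetical helper: re-encode a hex string using the characters of
// $alphabet as digits. Only single-byte (ASCII) alphabets are handled.
function hexToAlphabet(string $hex, string $alphabet): string
{
    $base = strlen($alphabet);
    if ($base < 2) {
        throw new InvalidArgumentException('Alphabet needs at least 2 characters.');
    }
    if ($hex === '' || !ctype_xdigit($hex)) {
        throw new InvalidArgumentException('Input must be a non-empty hex string.');
    }

    // Base-16 digits of the input, most significant first.
    $digits = array_map('hexdec', str_split(strtolower($hex)));
    $out = '';

    // Long division by $base; each remainder is the next output digit,
    // produced from least significant to most significant.
    while (count($digits) > 0) {
        $remainder = 0;
        $quotient = [];
        foreach ($digits as $d) {
            $acc = $remainder * 16 + $d;
            $q = intdiv($acc, $base);
            $remainder = $acc % $base;
            // Do not store leading zeros of the quotient.
            if ($q > 0 || count($quotient) > 0) {
                $quotient[] = $q;
            }
        }
        $out = $alphabet[$remainder] . $out;
        $digits = $quotient;
    }

    return $out;
}

// Example alphabet: digits, lowercase, uppercase (base-62).
$base62 = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';
echo hexToAlphabet('a3aca2964e72000eea4c56cb341002a4', $base62), "\n";
```

Because the hash is treated as a number, leading zeros are dropped and the output length can vary, so pad it if fixed-length names matter. On a case-insensitive file system, a mixed-case alphabet like the one above can make two different names clash; passing a lowercase-only base-36 alphabet avoids that.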

2 Comments

  1. NTT
    Posted September 13, 2011 at 17:46

    Maybe someone can write a robust and secure function to translate the hashes to a ‘web safe’ character set. That would be great for your idea of using them as cache URLs.

    • uzyn
      Posted September 14, 2011 at 11:33

      Good idea. There are, however, a lot of fine-tuning options that would need to be made available for it to be general-purpose enough: mixed case, charset, etc.
