Ask HN: Why not tweak the Base32Hex alphabet a bit?

2 points by EGreg 2 days ago

The base32hex alphabet contains 0123456789ABCDEFGHIJKLMNOPQRSTUV

But one can confuse 0 with O, B with 8, 1 with I.

However, we haven’t used WXYZ. Why not have an alphabet that omits B, I and O, and includes X, Y and Z?

Seems to me that such an alphabet would retain all the nice ASCII lexicographical ordering while also being unambiguous in print.
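A quick way to check the ordering claim is sketched below. The `PROPOSED` string is an assumption: it is one literal reading of the suggestion (drop B, I, O; add X, Y, Z), and `encode` is a hypothetical unpadded encoder written only for this check, not a standard library function.

```python
# Standard base32hex alphabet (RFC 4648) and a hypothetical variant
# that drops B, I, O and appends X, Y, Z, as suggested in the post.
BASE32HEX = "0123456789ABCDEFGHIJKLMNOPQRSTUV"
PROPOSED = "0123456789ACDEFGHJKLMNPQRSTUVXYZ"


def encode(data: bytes, alphabet: str) -> str:
    """Encode bytes as unpadded base32 using a 32-character alphabet."""
    bits = "".join(f"{b:08b}" for b in data)
    bits += "0" * (-len(bits) % 5)  # zero-pad to a multiple of 5 bits
    return "".join(alphabet[int(bits[i:i + 5], 2)] for i in range(0, len(bits), 5))


# The variant still has 32 distinct characters in ascending ASCII order.
assert len(set(PROPOSED)) == 32
assert list(PROPOSED) == sorted(PROPOSED)

# Equal-length inputs, already sorted as bytes, stay sorted once encoded.
samples = [b"\x00\x01", b"\x00\xff", b"\x10\x00", b"\xfe\x00", b"\xff\xff"]
for alphabet in (BASE32HEX, PROPOSED):
    encoded = [encode(s, alphabet) for s in samples]
    assert encoded == sorted(encoded)
```

The check only covers inputs of equal length; it says nothing about readability itself.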

jqpabc123 2 days ago

So your motivation here is to improve "human" readability?

How many "humans" actually try to read Base32Hex?

If readability is a goal, maybe use your computer to convert to a different format. Computers don't confuse 0 with O, B with 8 or 1 with I.
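As a rough sketch of that kind of conversion: keep base32hex as the stored format and translate character by character only when showing the string to a person. The `DISPLAY` alphabet and both helper names are assumptions made up for this example.

```python
BASE32HEX = "0123456789ABCDEFGHIJKLMNOPQRSTUV"
# Hypothetical display-only alphabet without B, I, O (assumption).
DISPLAY = "0123456789ACDEFGHJKLMNPQRSTUVXYZ"

# One-to-one character maps between the two alphabets.
TO_DISPLAY = str.maketrans(BASE32HEX, DISPLAY)
FROM_DISPLAY = str.maketrans(DISPLAY, BASE32HEX)


def to_display(encoded: str) -> str:
    """Translate a base32hex string into the display alphabet."""
    return encoded.upper().translate(TO_DISPLAY)


def from_display(shown: str) -> str:
    """Translate a display string back to base32hex (no validation)."""
    return shown.upper().translate(FROM_DISPLAY)


print(to_display("VS======"))    # padding characters pass through unchanged
print(from_display("ZV======"))
```

Because the mapping is one-to-one, the stored data never changes; only the rendering does.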

pwg 2 days ago

Guesses below:

1) by being "base32hex" the "hex" part resulted in inclusion of 0, 1, and 8.

2) whoever originally defined it simply did the most straightforward extension of appending "GHIJKLMNOPQRSTUV" without thought as to the fact that, in some fonts, I, O and B could be confused with the numbers of similar appearance.

The appending of the next sequential characters G through V also simplified the encoding/decoding algorithm [1] by only having one discontinuity (the gap between 9 and A) to contend with instead of four discontinuities (three of which are only one character position wide).

[1] the original algorithm was not likely implemented as a lookup table.
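To make the discontinuity point concrete, here is a sketch of a table-free mapping from a 5-bit value to a character. It is not the original implementation, just an illustration. `proposed_char` follows the literal proposal (omit B, I, O; add X, Y, Z), which also skips W, so it ends up with one more branch than the count above suggests. Both function names are hypothetical.

```python
def base32hex_char(v: int) -> str:
    """Map 0..31 to a base32hex character; the only gap is between 9 and A."""
    if not 0 <= v < 32:
        raise ValueError("value must be in 0..31")
    return chr(ord("0") + v) if v < 10 else chr(ord("A") + v - 10)


def proposed_char(v: int) -> str:
    """Same idea for the hypothetical alphabet without B, I, O."""
    if not 0 <= v < 32:
        raise ValueError("value must be in 0..31")
    if v < 10:
        return chr(ord("0") + v)       # 0-9
    if v == 10:
        return "A"                     # A, then skip B
    if v < 17:
        return chr(ord("C") + v - 11)  # C-H, then skip I
    if v < 22:
        return chr(ord("J") + v - 17)  # J-N, then skip O
    if v < 29:
        return chr(ord("P") + v - 22)  # P-V, then skip W
    return chr(ord("X") + v - 29)      # X-Z


print("".join(base32hex_char(v) for v in range(32)))
print("".join(proposed_char(v) for v in range(32)))
```

With a 32-entry lookup table both alphabets cost the same; the extra branches only matter for arithmetic mappings like this one.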

duped 2 days ago

The place where this really matters is when you're printing the string onto a physical device or surface (e.g., the silkscreen layer of a PCB), where fine details on some characters like 0/Q can be lost, or characters like D/8/B/6/G can warp over time.

Base32 isn't supposed to be human readable, and I don't think I've ever seen it printed, so I don't think it matters. The point of Base32 is to encode binary in ASCII.

runjake 2 days ago

I just use a proper programmer-friendly monospace font that emphasizes the distinctions between those characters (e.g., a slash or a dot in the zero character).

Then, I don't have to worry about undoing decades of momentum.

> Why not have an alphabet that omits B, I and O, and includes X, Y, Z?

Because that would introduce integer increment errors and require a significant overhaul, and careful checking, of existing code bases.
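One possible reading of "integer increment errors" is code that steps to the next digit by adding 1 to the character code. The sketch below counts where that shortcut breaks in each alphabet; `PROPOSED` and all function names are assumptions for illustration.

```python
BASE32HEX = "0123456789ABCDEFGHIJKLMNOPQRSTUV"
PROPOSED = "0123456789ACDEFGHJKLMNPQRSTUVXYZ"  # hypothetical variant


def naive_next(ch: str) -> str:
    """Next digit by bumping the character code (the risky shortcut)."""
    return chr(ord(ch) + 1)


def safe_next(ch: str, alphabet: str) -> str:
    """Next digit by position in the alphabet."""
    i = alphabet.index(ch)
    if i + 1 == len(alphabet):
        raise OverflowError("no digit after the last one")
    return alphabet[i + 1]


def broken_steps(alphabet: str) -> list[str]:
    """Digits for which the character-code shortcut gives the wrong successor."""
    return [c for c in alphabet[:-1] if naive_next(c) != safe_next(c, alphabet)]


print(broken_steps(BASE32HEX))
print(broken_steps(PROPOSED))
```

Code that already indexes into an alphabet string would only need the string swapped; code that relies on character arithmetic would need each gap handled.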