Chinese Diceware

I’ve been trying to come up with strong passwords since forever, and have failed to find a magic alternative to entropy. Recently, I took Diceware for a roll, but wasn’t entirely happy with passwords like “wn rare swung strop situs slept”—wn isn’t even a word, is it? I also tried the Swedish dictionary but wasn’t much happier.

How about Mandarin Chinese, written using pinyin? There are only around 400 pinyin syllables, but thousands of characters with different meanings, so I guessed that for a random sequence of syllables it should often be possible to come up with a somewhat meaningful phrase.

The kHanyuPinlu property from the Unihan database turned out to be an excellent source for character to syllable mapping, so I wrote syllables.py to reverse that mapping. The output is a list of 392 pinyin syllables with example characters in traditional and simplified Chinese.

Unfortunately, 392 is not a power of 6, so using real dice to generate the numbers is a bit complicated, albeit possible. Instead I wrote roll.py, which uses as few bytes as possible from /dev/random to roll a die with an arbitrary number of sides.

Using my list and my virtual D-392, here are the first 6-syllable pinyin phrases I generated, each with a memorable (?) Chinese phrase and a rough English translation.

  • yan kai bo ren se dai—眼開撥任色帶—eyes open, poke any ribbon
  • zui ku ba ge mei xu—最酷八個沒序—the coolest eight have no order
  • ban zhai dian die keng bao—搬宅殿爹吭抱—moving villa/palace, dad says hold (this)
  • you mu sa kang xu su—有母萨扛需速—(things) carried by mother Bodhisattva need speed

A native speaker would probably be able to come up with better phrases, but I think that I could remember any of these, with 最酷八個沒序 being the easiest. If this is a representative sample, I think the scheme works.

How about the entropy? With 392 syllables, each syllable contributes log2(392) = 8.6 bits, so these 6-syllable phrases have 51.6 bits of entropy, slightly better than a completely random 8-character alphanumeric password. English Diceware has 12.9 bits of entropy per word, so to get as much entropy as with a 6-word English phrase, a 9-syllable Chinese phrase is needed. The average word and syllable length are 4.2 and 3.2 respectively, so the average phrase lengths (including spaces) would be 30.2 for English and 36.8 for Chinese. (Removing spaces blindly will lose some entropy if the pinyin becomes ambiguous.)

Feel free to use/improve my lists and scripts, and never forget: the coolest eight have no order!

2 thoughts on “Chinese Diceware

  1. I gave up on remembering passwords a long time ago, but I don’t trust password safes (they make me too paranoid). Instead, I just memorize a single strong master password and then use http://xanthir.com/password to generate individual site passwords through hashing. This gives me unique 26-char passwords for everything that’ll accept them (that is, everything except banking websites...).

    • I actually use a very similar scheme to generate per-site passwords, but for the master password I still don’t fully trust my brain to generate entropy, hence this little toy 🙂

      By the way, if you don’t access http://www.xanthir.com/password/ over HTTPS, make sure no one is injecting keylogging JavaScript into the page!

Comments are closed.