Aself-certifying identity is a string that can be managed manually whose association to a public key can be verified mathematically without requiring the verifier to consult or trust a third party. My suggested implementation is the hash of a public key encoded into a sequence of words using mnemonic encoding.
A similar concept is used by SFS - Self-certifying File System with self-certifying URLs. These URLs, however, are not encoded into words and are not very practical for manual handling.
Please tell me if you have a reference to a similar scheme, I am quite sure that I am not the first to think about this concept.
An arbitrary sequence will always be more difficult to memorize than a meaningful name and yet we deal with such arbitrary sequences everyday in the form of phone numbers. Most people cannot memorize phone numbers without some effort and rehearsing, but we can retain them in short-term memory long enough to use a number we look up on a phone book or write down a number we hear. Phone numbers can be considered an identity reference to the owner of the phone line.
A frequently quoted "fact" is that we can retain around 7 items in short term memory. The actual number varies from person to person and also depends on the type of items stored. The amount of entropy encoded by each of these items is determined by the size of the set from which they are chosen. For decimal digits this means 3.3 bits per digit, adding up to about 23 bits of data that can be stored in short term memory. For 6 items chosen from a vocabulary of 1626 words (10.67 bits per word) the capacity is 64 bits. Although it encodes more data such a sequence of words is not much more difficult to remember than a sequence of arbitrary numbers. It occupies about the same number of "working registers" in our brain but uses each "register" more effectively.
978-6213 chance-present-twist--jessica-tibet-stadium
The words are separated into groups of three for the same reason that phone numbers are hyphenated: it makes them easier to remember. It has been shown [Miller 1956] that the 7±2 item short term memory capacity may appear to be increased by grouping the data into manageable chunks. If the items in the chunk have something in common it makes it even easier to remember. For example, a familiar area code is easier to remember than 3 random digits. But even arbitrary chunks make it easier to remember the words. A few simple tests reveal that most people prefer chunks of three words.
One way to increase the capacity is to use a larger vocabulary. It is clear that this option faces rapidly diminishing returns: adding just one bit requires doubling the number of words. For example, using 16384 words would strain anyone's vocabulary, be unusable for most non-native speakers of English and provide only 31% more capacity than 1626 words.
A method to improve protection against brute-force attacks without increasing
the vocabulary size or number of words is to make the evaluation of the hash
function intentionally expensive. A legitimate user can afford to wait a few
hundreds of milliseconds for verification but an attacker will be slowed down
by a significant factor (e.g. by 2^17). This is similar to the techniques used
for strengthening password verification systems agains offline attacks
[Provos, Mazieres 1999]. Using an expensive hash function it should be
possible to get good security with just 64 bits or less. Five words should
provide a worthy challenge even for
distributed.net, equivalent to about 70 bits RC5 (53 bits + 17 bits
equivalent for expensive hash workload).
This method can be designed to be scalable as the speed of computers follows
Moore's law. This is important since the short term memory capacity of humans
does not grow at the same rate. The hash function workload factor can be
designed to grow with the key length or be encoded as a few bits into the
string itself.
Another type of collision is an unintentional collision which could happen as the number of users approaches the square root of the keyspace. For six words the chances are probably low enough unless self-certifying identities become extremely popular and are used by hudreds of millions of people. This issue can be addressed by always publishing an identity with more words than are normally used. This will allow users to choose their level of security and in the rare case of a collision the user will be notified and use one more word to make the identity unique again.
For interactive applications the key can simply be transmitted by one side and
verified by the other using the identity string that has been transmitted via
a separate, authenticated channel such as a phone call.
Application example - PGP
PGP or GnuPG
may be modified to allow mnemonic encoding for key IDs in addition to
hexadecimal. If you use a self-certifying identity instead of email addresses
key signatures may become unnecessary. Type the identity of the recipient,
your email software will look up the key on the keyserver, verify it and use
the email address attached to the key (and signed by it) to deliver the
message. There is no need to separately fetch the key and verify the
fingerprint over the phone, everything is done in one step.
There are certain problems in this proposal, though:
When you connect to the server it can verify your key, but it needs to get the key from somewhere first. It turns out that as part of the ssh1 protocol [Ylonen 1995] the client sends the modulus of his public key to the server. Only the public exponent is required to reconstruct the full public key, verify it using the identity string and continue the authentication normally. Since ssh key generator generates a relatively small number of different exponents this can be encoded into the identity string as 4-5 bits. A separate utility would be used for key generation or calculating the id string for an existing key. Note that this does not require any change to the client or the protocol. It is up to the server implementer to decide where to get public keys and how to verify them.
SSH v1 procotol (1995), draft-ylonen-ssh-protocol-00.txt (expired)
Niels Provos and David Mazières. (1999) A future-adaptable password scheme. http://www.usenix.org/events/usenix99/provos.html