Mnemonics Unraveled
Libbitcoin support for mnemonic wallet seed encoding began with Electrum v1. Later came BIP39, driven by the fine folks behind Trezor, which we believed Electrum was adopting. When the fine folks behind Electrum decided against BIP39, we found ourselves with three implementations. We had dropped Electrum v1 in the expectation that BIP39 would become sufficient. Later we added Electrum but found it necessary to also restore Electrum v1. It is not possible to properly implement Electrum mnemonic support without also implementing Electrum v1 and BIP39 mnemonics.
An overhaul of our mnemonic implementations was well overdue. What was anticipated to require one week required over a month of full-time work. Test coverage is nearly complete and it will be merged soon. Before I forget the various lessons learned, I decided to write them down here. The information is all out there, somewhere, but ultimately finding it required digging through a lot of Python and C code. Wallet seeds are not something for a developer to take lightly, and code is always authoritative. Eventually I found myself sifting through Python internals, a deeper rabbit hole than I expected.
I will state for the record that I truly appreciate both Electrum and Trezor. Otherwise I would not have spent the time to provide comprehensive support for all three of these encodings. These observations are provided for my own record and to possibly aid others who may at some point find themselves in that same rabbit hole. When one goes this deep into implementation, interesting discoveries abound.
A universally-unique natural language.
Libbitcoin refers to a language by the IANA subtag standard.
In linguistics a token is an "individual occurrence of a linguistic unit in speech or writing".
Electrum allows seed generation from tokens (i.e. non-dictionary words).
A dictionary is a standard set of reference tokens of a single language.
There may be more than one dictionary per language.
An interpreter is a set of dictionaries of distinct languages, each identified by language.
An interpreter maps between entropy and mnemonic forms, given a specified or detected language.
There is no necessary standard for the set of dictionaries of an interpreter.
A word is a dictionary token.
A mnemonic is an ordered set of words from a single dictionary, conforming to standard size and checksum constraints.
Electrum v1 does not implement checksum constraints.
A mnemonic may be referred to as a recovery seed by some implementations.
A sentence is a mnemonic serialized as a sinistrodextral string of its words with whitespace delimiters.
Even the seemingly-trivial concept of whitespace is a potential implementation pitfall.
An encoding is a standard bidirectional map between any mnemonic and its numeric representation.
Its entropy is the numeric representation of a mnemonic.
Both a mnemonic and its entropy represent the same entropic value.
A passphrase is arbitrary text that may be combined with a mnemonic in the formation of a seed.
Electrum v1 does not implement a passphrase.
A seed is a secret number, derived using a standard one-way hash from a mnemonic.
A master private key is a secret number, derived from a seed in a standard manner, allowing spending.
Electrum and typical BIP39 wallets encode this in accordance with BIP32.
Electrum v1 represents this as a base16-encoded 32-byte number (elliptic curve private key).
A master public key is a non-secret number, derived in a standard one-way manner from the master private key, allowing receiving.
Electrum and typical BIP39 wallets encode this in accordance with BIP32.
Electrum v1 represents this as a base16-encoded 64-byte number (uncompressed elliptic curve public key without a sign prefix).
A standard
is a set of defined
The reliance of Electrum and BIP39 on Unicode word and passphrase normalization is an inherent risk. Unicode implementations are large and complex. Trivial conversions in ASCII, such as lower-casing, become mind-boggling in Unicode.
"When two applications share Unicode data, but normalize them differently, errors and data loss can result. In one specific instance, OS X normalized Unicode filenames sent from the Samba file and printer sharing software. Samba did not recognize the altered filenames as equivalent to the original, leading to data loss. Resolving such an issue is non-trivial, as normalization is not losslessly invertible."
Implementations must rely on sprawling external dependencies, and those in turn depend on an evolving standard. Changes to the Unicode "database" of code points and mappings can and do happen, which can lead to loss of a wallet.
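The normalization hazard described above is easy to reproduce. The following is a minimal sketch using Python's standard `unicodedata` module, not Libbitcoin code; the helper name `equal_under` and the sample strings are illustrative assumptions. It shows two visually identical strings that differ until both are normalized, a compatibility normalization that cannot be inverted, and case conversions that change the length of a string.

```python
import unicodedata

composed = "\u00e9"      # "é" as one precomposed code point
decomposed = "e\u0301"   # "e" followed by a combining acute accent


def equal_under(form: str, a: str, b: str) -> bool:
    """Compare two strings after normalizing both to the given form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


# Visually identical, yet unequal as code point sequences.
raw_equal = composed == decomposed

# Equal once both sides agree on a normalization form.
nfkd_equal = equal_under("NFKD", composed, decomposed)

# Compatibility normalization is lossy: the "fi" ligature (U+FB01) becomes
# the two letters "fi", and nothing records that a ligature was present.
ligature = unicodedata.normalize("NFKD", "\ufb01")

# Case conversion is not one-to-one either.
sharp_s_upper = "\u00df".upper()    # German sharp s becomes two letters
dotted_i_lower = "\u0130".lower()   # dotted capital I becomes two code points

print(raw_equal, nfkd_equal, ligature, sharp_s_upper, len(dotted_i_lower))
```

If two wallets disagree on any of these steps, the same typed passphrase can yield different seeds.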
For this reason we have implemented Libbitcoin mnemonics without a hard dependency on Unicode. The Electrum v1, Electrum, and BIP39 classes do not require Unicode unless a non-ASCII passphrase is provided. If the library is compiled with WITH_ICU undefined, all features remain available, with the exception that seed passphrases are ASCII-limited.
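To see why an ASCII passphrase needs no Unicode support, consider seed derivation as specified by BIP39: the sentence and passphrase are NFKD-normalized, then fed to PBKDF2-HMAC-SHA512 with 2048 iterations and the salt "mnemonic" followed by the passphrase. The sketch below is written in Python for brevity and is not Libbitcoin's implementation; `bip39_seed` and `is_ascii` are hypothetical helper names. Since NFKD leaves ASCII text unchanged, the normalization step can be skipped entirely when the input is ASCII.

```python
import hashlib
import unicodedata


def is_ascii(text: str) -> bool:
    """True if every code point is in the 7-bit ASCII range."""
    return all(ord(ch) < 128 for ch in text)


def nfkd(text: str) -> str:
    """NFKD-normalize text; a no-op for ASCII input."""
    return text if is_ascii(text) else unicodedata.normalize("NFKD", text)


def bip39_seed(sentence: str, passphrase: str = "") -> bytes:
    """Derive a 64-byte seed per the BIP39 key-stretching step."""
    password = nfkd(sentence).encode("utf-8")
    salt = ("mnemonic" + nfkd(passphrase)).encode("utf-8")
    return hashlib.pbkdf2_hmac("sha512", password, salt, 2048, dklen=64)


sentence = " ".join(["abandon"] * 11 + ["about"])
seed = bip39_seed(sentence, "TREZOR")
print(seed.hex())
```

This sketch does not validate the words or the checksum; it only illustrates where normalization enters the derivation.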
For the same reason Libbitcoin does not support Electrum token-based seeding. All words must correspond to a dictionary. When WITH_ICU is defined, words are Unicode normalized before comparison, to improve the chance of matching. Ideally an implementation provides a dictionary-based word selector, making this unnecessary. If WITH_ICU is undefined then word normalizations are ASCII-limited, though pre-normalized non-ASCII words will match the dictionary.
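The matching behavior described above can be sketched as follows. This is an illustration in Python, not Libbitcoin's code: the three-word dictionary, the assumption that it is stored in NFKD form, the `find_word` helper, and the `with_unicode` flag (standing in for WITH_ICU) are all hypothetical.

```python
import unicodedata
from typing import Optional

# Hypothetical dictionary, assumed to be stored in NFKD form.
DICTIONARY = [unicodedata.normalize("NFKD", w)
              for w in ("\u00e1baco", "abdomen", "abeja")]


def ascii_lower(word: str) -> str:
    """Lower-case only A-Z, leaving every other code point untouched."""
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in word)


def find_word(word: str, with_unicode: bool = True) -> Optional[int]:
    """Return the dictionary index of a word, or None if it does not match."""
    if with_unicode:
        candidate = unicodedata.normalize("NFKD", word).lower()
    else:
        candidate = ascii_lower(word)
    return DICTIONARY.index(candidate) if candidate in DICTIONARY else None


# A composed, capitalized word only matches with Unicode normalization.
print(find_word("\u00c1baco", with_unicode=True))
print(find_word("\u00c1baco", with_unicode=False))
# A pre-normalized word matches either way.
print(find_word("a\u0301baco", with_unicode=False))
```

A dictionary-based word selector avoids the question altogether, since the user never types the word.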
A mnemonic sentence must be parsed into a list of words for dictionary matching and seed generation. Similarly a mnemonic is often emitted in sentence form for portability.
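Parsing and emitting a sentence can be sketched in a few lines of Python. This is an assumption-laden illustration rather than Libbitcoin's parser; `split_sentence` and `join_words` are hypothetical names. Python's `str.split()` with no argument splits on any Unicode whitespace, including the ideographic space (U+3000) conventionally used between Japanese BIP39 words, and discards leading, trailing, and repeated delimiters.

```python
from typing import List


def split_sentence(sentence: str) -> List[str]:
    """Split a sentence into words on runs of Unicode whitespace."""
    return sentence.split()


def join_words(words: List[str], delimiter: str = " ") -> str:
    """Serialize words into a sentence using a single delimiter."""
    return delimiter.join(words)


# Mixed ASCII whitespace collapses to clean word boundaries.
words = split_sentence("  abandon\tability \n able ")

# The ideographic space is treated as a delimiter as well.
japanese = split_sentence("\u3042\u3044\u3053\u304f\u3057\u3093\u3000\u3042\u3044\u3055\u3064")

print(words, len(japanese))
print(join_words(words))
```

An implementation that splits only on the ASCII space would see the second sentence as a single unmatched word, which is the kind of whitespace pitfall noted earlier.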
Users | Developers | License | Copyright © 2011-2024 libbitcoin developers