-
Notifications
You must be signed in to change notification settings - Fork 7
Home
The old Lipika engine was designed to fully materialize the cross-product of the rule and mapping tries. This made transliteration simple as it just had to walk this large trie. However, the cross-product trie consumed more memory for scripts such as Hindi and Gurmukhi which had a large number of rules. Given that Custom Keyboards in iOS have very limited memory quotas, this method resulted in too many crashes. The current engine is designed to burn more CPU in exchange for much lower memory footprint. This is accomplished by keeping the mapping and rule tries separate and walking them in tandem as inputs flow into the engine.
The goal is to keep memory usage well under 30MB
There are many problems that arise by keeping the mapping and rule trie separate. Most of these can be captured using the following example that we will use throughout the design.
Mapping:
Type | Key | Scheme | Script |
---|---|---|---|
CONSONANT | KA | k | क |
CONSONANT | TA | t | त |
CONSONANT | RRA | R | ऱ |
VOWEL | VOCALIC RR | RRI | ॠ |
DEPENDENT | VOCALIC RR | RRI | ॄ |
SIGN | VIRAMA | .h | ् |
Rule:
Input | Output |
---|---|
{CONSONANT} | [CONSONANT][SIGN/VIRAMA] |
{CONSONANT}{DEPENDENT/A} | [CONSONANT] |
{CONSONANT}{DEPENDENT} | [CONSONANT][DEPENDENT] |
Input String:
Now we consider the input string ktRRI
. We expected the following outputs as the inputs flow in:
Input | Output |
---|---|
k | क् |
kt | क्त् |
ktR | क्त्ऱ् |
ktRR | क्त्RR |
ktRRI | क्तॄ |
The algorithm depends on the following properties of the TrieWalker:
- TrieWalker greedily consumes inputs to produce output as deep in the trie as possible
- All intermediate outputs of a traversal from the root to leaf of a trie share the same
epoch
- TrieWalker bumps
epoch
when it hits a dead-end and replays part of the inputs from the root - Output from an ongoing epoch of the TrieWalker is considered
finalized
when the epoch ends
The crux of the design rests on this question: at what point can the output of the Engine be finalized - meaning that it won't change any further?
All output up to but excluding the finalized mapping before the last finalized mapping output, both of which also finalized the rules trie can be finalized. So in the example below, when the output at the red arrow is produced, everything above the dotted line can be finalized.
This means that we have to replay all non-finalized events every time they change. The implementation is mainly concerned with tracking all events and figuring out the finalization point.
In the example below, we type kaktRRI
. Mapping epoch 3 is the last finalized mapping output that also finalized the rules trie. Even though Mapping epoch 4 finalized the rules trie, it is itself not yet finalized.
-
k
adds क् - overall we have क् -
a
changes it to क - overall we have क -
k
adds क् - overall we have कक् -
t
adds त् - overall we have कक्त् and everything up to but excluding mapping epoch 2 is finalized -
R
adds ऱ् - overall we have कक्त्ऱ् and everything up to but excluding mapping epoch 3 is finalized -
R
changes it to RR - overall we have कक्त्RR because everything from mapping epoch 3 is replayed -
I
changes it to ॄ - overall we have कक्तृ because everything from mapping epoch 3 is replayed
Essentially, the engine has to produce the following output for the example above:
k: (input: "k", output: "क्", isPreviousFinal: false)
a: (input: "k", output: "क्", isPreviousFinal: false)
(input: "ka", output: "क", isPreviousFinal: false)
k: (input: "k", output: "क्", isPreviousFinal: false)
(input: "ka", output: "क", isPreviousFinal: false)
(input: "kak", output: "कक्", isPreviousFinal: false)
t: (input: "k", output: "क्", isPreviousFinal: false)
(input: "ka", output: "क", isPreviousFinal: false)
(input: "k", output: "क्", isPreviousFinal: true)
(input: "kt", output: "क्त्", isPreviousFinal: false)
R: (input: "k", output: "क्", isPreviousFinal: false)
(input: "kt", output: "त्", isPreviousFinal: true)
(input: "ktR", output: "त्ऱ्", isPreviousFinal: false)
R: (input: "kt", output: "त्", isPreviousFinal: false)
(input: "tRR", output: "त्RR", isPreviousFinal: false)
I: (input: "t", output: "त्", isPreviousFinal: false)
(input: "tRRI", output: "तृ", isPreviousFinal: false)
Ideally, one should be able to finalize output of a Rules epoch immediately after the epoch has passed. However, there are cases such as RRI in ITRANS which produce intermediate outputs that can spuriously bump the epoch of the Rules TrieWalker and hence need to be eventually backed out. Since only one output per Mapping epoch is executed on the Rules TrieWalker, it is clear that no matter how many intermediate outputs the Mapping TrieWalker produces, it can only spuriously bump the Rules TrieWalker epoch by one. Hence, if we finalize all output beyond the last but one Rules epoch and replay all non-finalized events on new input, then there will be no need to back out of spurious epochs.