Might Qwerty be optimal on touchscreens?
July 2012
It’s a common misconception that the Qwerty keyboard is designed to slow users down to prevent typewriters jamming. In fact, it’s designed to keep commonly consecutive letter pairs apart, so that two adjacent levers won’t collide.
(A more fun, but irrelevant, Qwerty story is that it is also designed such that the word ‘typewriter’ is all on the top row, to make demonstrating it easy. This story, if true, is itself fun but sucks all the fun out of the fact that the longest word that can be typed on the top row of a typewriter is ‘typewriter’. One of these is a fun fact, but I’ve no idea which.)
Nowadays, obviously, there are no swinging arms to collide, so we want the commonly-used keys to be within easy reach and, ideally, typing to alternate between hands as much as possible. Dvorak and Coleman have each had a stab at designing a better layout, but both aimed at the computer keyboard.
But increasingly, I type on my phone, using one very mobile thumb. I can get to any point on the screen, more-or-less right away – but sometimes I miss, and usually the phone figures out what I meant and autocorrects it. So maybe the most important thing about any given keyboard layout is how likely it is that a typo will result in a real word, which the phone has no way of knowing isn’t what I meant.
I wondered if suddenly Qwerty might be optimal again: separating pairs of letters that can be swapped to make another real word, and separating pairs that appear next to each other in English words, aren’t totally different goals. So I thought I’d investigate.
So first I loaded the CSW12 Scrabble word list, and worked out a big table of how many places in the list you can replace each letter with each other letter to create a new word.
| | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q | R | S | T | U | V | W | X | Y | Z |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| A | | 176 | 681 | 305 | 5253 | 182 | 216 | 252 | 4180 | 15 | 208 | 477 | 166 | 401 | 4717 | 297 | 5 | 483 | 953 | 453 | 2898 | 52 | 201 | 54 | 585 | 30 |
| B | 176 | | 1240 | 1238 | 284 | 1157 | 1123 | 775 | 91 | 360 | 406 | 1003 | 1383 | 761 | 157 | 1579 | 14 | 1407 | 1127 | 1344 | 74 | 398 | 836 | 70 | 366 | 151 |
| C | 681 | 1240 | | 1109 | 375 | 876 | 1218 | 843 | 162 | 261 | 1102 | 1004 | 1036 | 1379 | 265 | 1418 | 24 | 1117 | 1832 | 1783 | 120 | 418 | 816 | 153 | 274 | 209 |
| D | 305 | 1238 | 1109 | | 715 | 776 | 1445 | 709 | 205 | 299 | 942 | 1600 | 1467 | 1838 | 227 | 1242 | 18 | 7549 | 10979 | 2348 | 107 | 549 | 726 | 135 | 616 | 307 |
| E | 5253 | 284 | 375 | 715 | | 186 | 553 | 454 | 4712 | 22 | 459 | 938 | 1010 | 683 | 3434 | 470 | 4 | 956 | 2123 | 1734 | 1893 | 87 | 291 | 57 | 2162 | 57 |
| F | 182 | 1157 | 876 | 776 | 186 | | 723 | 592 | 76 | 261 | 357 | 846 | 829 | 632 | 126 | 1040 | 11 | 782 | 1037 | 1121 | 57 | 387 | 620 | 30 | 203 | 89 |
| G | 216 | 1123 | 1218 | 1445 | 553 | 723 | | 556 | 186 | 333 | 747 | 812 | 810 | 1013 | 191 | 1018 | 39 | 914 | 1194 | 1514 | 103 | 373 | 669 | 130 | 371 | 180 |
| H | 252 | 775 | 843 | 709 | 454 | 592 | 556 | | 137 | 261 | 641 | 1186 | 979 | 635 | 260 | 1098 | 6 | 1079 | 1215 | 1453 | 81 | 239 | 805 | 34 | 356 | 137 |
| I | 4180 | 91 | 162 | 205 | 4712 | 76 | 186 | 137 | | 35 | 129 | 519 | 130 | 380 | 2786 | 154 | 3 | 387 | 525 | 331 | 2671 | 42 | 200 | 23 | 1118 | 28 |
| J | 15 | 360 | 261 | 299 | 22 | 261 | 333 | 261 | 35 | | 117 | 311 | 301 | 224 | 27 | 331 | 2 | 344 | 320 | 346 | 1 | 125 | 196 | 7 | 136 | 66 |
| K | 208 | 406 | 1102 | 942 | 459 | 357 | 747 | 641 | 129 | 117 | | 947 | 779 | 892 | 145 | 911 | 50 | 816 | 929 | 1340 | 71 | 379 | 517 | 119 | 273 | 180 |
| L | 477 | 1003 | 1004 | 1600 | 938 | 846 | 812 | 1186 | 519 | 311 | 947 | | 1420 | 1938 | 479 | 1363 | 15 | 3271 | 1876 | 2049 | 286 | 599 | 899 | 142 | 394 | 248 |
| M | 166 | 1383 | 1036 | 1467 | 1010 | 829 | 810 | 979 | 130 | 301 | 779 | 1420 | | 1085 | 238 | 1898 | 14 | 1321 | 1225 | 2912 | 119 | 575 | 747 | 130 | 380 | 250 |
| N | 401 | 761 | 1379 | 1838 | 683 | 632 | 1013 | 635 | 380 | 224 | 892 | 1938 | 1085 | | 286 | 1367 | 7 | 2439 | 1925 | 2091 | 301 | 529 | 711 | 259 | 440 | 250 |
| O | 4717 | 157 | 265 | 227 | 3434 | 126 | 191 | 260 | 2786 | 27 | 145 | 479 | 238 | 286 | | 296 | 6 | 574 | 531 | 389 | 2291 | 58 | 314 | 36 | 596 | 27 |
| P | 297 | 1579 | 1418 | 1242 | 470 | 1040 | 1018 | 1098 | 154 | 331 | 911 | 1363 | 1898 | 1367 | 296 | | 11 | 1255 | 1528 | 2061 | 166 | 580 | 1011 | 141 | 377 | 243 |
| Q | 5 | 14 | 24 | 18 | 4 | 11 | 39 | 6 | 3 | 2 | 50 | 15 | 14 | 7 | 6 | 11 | | 12 | 30 | 16 | 0 | 5 | 10 | 2 | 2 | 2 |
| R | 483 | 1407 | 1117 | 7549 | 956 | 782 | 914 | 1079 | 387 | 344 | 816 | 3271 | 1321 | 2439 | 574 | 1255 | 12 | | 4806 | 2173 | 447 | 591 | 885 | 205 | 613 | 250 |
| S | 953 | 1127 | 1832 | 10979 | 2123 | 1037 | 1194 | 1215 | 525 | 320 | 929 | 1876 | 1225 | 1925 | 531 | 1528 | 30 | 4806 | | 3126 | 327 | 617 | 887 | 232 | 2621 | 6540 |
| T | 453 | 1344 | 1783 | 2348 | 1734 | 1121 | 1514 | 1453 | 331 | 346 | 1340 | 2049 | 2912 | 2091 | 389 | 2061 | 16 | 2173 | 3126 | | 256 | 682 | 1187 | 215 | 602 | 429 |
| U | 2898 | 74 | 120 | 107 | 1893 | 57 | 103 | 81 | 2671 | 1 | 71 | 286 | 119 | 301 | 2291 | 166 | 0 | 447 | 327 | 256 | | 43 | 416 | 15 | 239 | 12 |
| V | 52 | 398 | 418 | 549 | 87 | 387 | 373 | 239 | 42 | 125 | 379 | 599 | 575 | 529 | 58 | 580 | 5 | 591 | 617 | 682 | 43 | | 353 | 96 | 142 | 154 |
| W | 201 | 836 | 816 | 726 | 291 | 620 | 669 | 805 | 200 | 196 | 517 | 899 | 747 | 711 | 314 | 1011 | 10 | 885 | 887 | 1187 | 416 | 353 | | 108 | 400 | 136 |
| X | 54 | 70 | 153 | 135 | 57 | 30 | 130 | 34 | 23 | 7 | 119 | 142 | 130 | 259 | 36 | 141 | 2 | 205 | 232 | 215 | 15 | 96 | 108 | | 74 | 58 |
| Y | 585 | 366 | 274 | 616 | 2162 | 203 | 371 | 356 | 1118 | 136 | 273 | 394 | 380 | 440 | 596 | 377 | 2 | 613 | 2621 | 602 | 239 | 142 | 400 | 74 | | 108 |
| Z | 30 | 151 | 209 | 307 | 57 | 89 | 180 | 137 | 28 | 66 | 180 | 248 | 250 | 250 | 27 | 243 | 2 | 250 | 6540 | 429 | 12 | 154 | 136 | 58 | 108 | |
As you can see, the letters involved in typos that are genuine words are also the most common letters – except C and P. (The frequency values are on an arbitrary scale to match the typo figures.)
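For anyone wanting to reproduce a table like this, here is a minimal sketch of one way to count the substitutions. It is not the original routine: the function name `substitution_counts` and the tiny demo word list are made up for illustration, and it assumes the word list is already loaded as upper-case strings. It counts each unordered letter pair once per place, which matches a symmetric table.

```python
from collections import Counter
from itertools import combinations


def substitution_counts(words):
    """Count, per unordered letter pair, the places where swapping one
    letter for the other turns a listed word into another listed word."""
    # Bucket words by everything except one position: words that share a
    # (prefix, suffix) key differ only in the letter at that position.
    buckets = {}
    for word in set(words):
        for i, letter in enumerate(word):
            key = (word[:i], word[i + 1:])
            buckets.setdefault(key, []).append(letter)

    counts = Counter()
    for letters in buckets.values():
        # Every pair of letters in a bucket is one substitution site.
        for a, b in combinations(sorted(letters), 2):
            counts[a, b] += 1
    return counts


# Hypothetical demo list; the post uses the full CSW12 word list.
demo = ["CAT", "COT", "CUT", "DOG", "DIG"]
for pair, n in sorted(substitution_counts(demo).items()):
    print(pair, n)
```

Bucketing by prefix and suffix avoids comparing every word with every other word, which matters on a list with a few hundred thousand entries.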
Then I wrote a Python routine to generate a ‘badness’ score for each layout, which is the total number of words you can make by replacing a letter of another word with one of the six keys adjacent to it. Running it on 10,000 random layouts, the average badness is around 83,603, with a standard deviation of 14,024.
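A rough sketch of what such a scoring routine might look like follows. Several things here are assumptions rather than details from the post: the 10/9/7 grid with each row shifted half a key (which gives interior keys six neighbours), the `reach` threshold for adjacency, the function names, and the choice to count each adjacent pair once rather than once per direction. The `counts` argument is assumed to map sorted letter pairs to substitution counts.

```python
import random
import statistics
import string

ROW_LENGTHS = (10, 9, 7)       # assumed Qwerty-style grid
ROW_OFFSETS = (0.0, 0.5, 1.0)  # assumed half-key stagger per row


def key_positions():
    """(x, y) centre of every key slot, in key widths, row by row."""
    return [(offset + col, float(row))
            for row, (n, offset) in enumerate(zip(ROW_LENGTHS, ROW_OFFSETS))
            for col in range(n)]


def adjacent_slots(positions, reach=1.2):
    """Index pairs of slots whose centres are within `reach` key widths."""
    pairs = []
    for i, (x1, y1) in enumerate(positions):
        for j in range(i + 1, len(positions)):
            x2, y2 = positions[j]
            if (x1 - x2) ** 2 + (y1 - y2) ** 2 <= reach ** 2:
                pairs.append((i, j))
    return pairs


def badness(layout, counts, slot_pairs):
    """Sum the substitution counts over every pair of adjacent keys."""
    total = 0
    for i, j in slot_pairs:
        a, b = sorted((layout[i], layout[j]))
        total += counts.get((a, b), 0)
    return total


def random_layout_stats(counts, trials=10_000, seed=0):
    """Mean and standard deviation of badness over shuffled layouts."""
    rng = random.Random(seed)
    slot_pairs = adjacent_slots(key_positions())
    letters = list(string.ascii_uppercase)
    scores = []
    for _ in range(trials):
        rng.shuffle(letters)
        scores.append(badness(letters, counts, slot_pairs))
    return statistics.mean(scores), statistics.stdev(scores)
```

A layout here is just a sequence of 26 letters read row by row, so Qwerty would be `"QWERTYUIOPASDFGHJKLZXCVBNM"`. Counting each pair once or twice only scales every score by the same factor, so it would not change how layouts rank against each other.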
Here are some other layouts I tried:
Layout | Badness | SDs above mean |
---|---|---|
Qwerty | 119,170 | 2.54 |
Dvorak | 121,458 | 2.70 |
Colemak | 112,354 | 2.05 |
Best random | 46,414 | −2.65 |
Worst random | 151,438 | 4.84 |
Alphabetic | 74,064 | −0.68 |
Best I found | 31,992 | −3.68 |
(Predictable answer to question in title: “haha, no”.) Alphabetic uses the same key layout as Qwerty: 10 on the top row, 9 on the second and 7 on the bottom. The ‘best I found’ layout was derived from a random board on that Qwerty grid (since actually Dvorak and Colemak don’t really fit on a phone), by swapping letter pairs at random and keeping the change if it seemed to work. I think I did 5,000 steps, five or six times. Here’s the layout it found:
D | W | E | B | K | R | I | T | Q | S |
O | J | V | U | Z | F | X | A | M |
C | L | G | H | N | Y | P |
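The random-swap search that produced this could be sketched roughly as below. This is an illustration, not the original code: `improve_by_swaps` is a made-up name, `score` stands in for whatever badness function is being minimised, and "keeping the change if it seemed to work" is interpreted here as "keep the swap only if the score strictly drops", which is an assumption.

```python
import random


def improve_by_swaps(layout, score, steps=5_000, seed=None):
    """Swap two random keys per step; keep the swap only if score drops."""
    rng = random.Random(seed)
    best = list(layout)
    best_score = score(best)
    for _ in range(steps):
        i, j = rng.sample(range(len(best)), 2)
        best[i], best[j] = best[j], best[i]
        new_score = score(best)
        if new_score < best_score:
            best_score = new_score               # keep the improvement
        else:
            best[i], best[j] = best[j], best[i]  # undo the swap
    return best, best_score


# Toy score for demonstration only: number of letters out of place.
def out_of_place(layout, target="ABCDEF"):
    return sum(1 for have, want in zip(layout, target) if have != want)


found, final = improve_by_swaps("FEDCBA", out_of_place, steps=2_000, seed=1)
print("".join(found), final)
```

A greedy search like this can get stuck in a local minimum, which is one reason to restart it from several random boards and keep the best result.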
The most obvious thing it’s done is put S (the most typoable letter) in a corner and shoved Q up against it. One potential improvement to the model would be to account for second-nearest neighbours – since flagging an error but correcting it to the wrong thing isn’t much better than missing it.
Another thing it’s done is put all the rarest letters in the middle where they have lots of neighbours – almost precisely the opposite of what Dvorak and Coleman did. Which makes sense, both intuitively and because all the standard layouts are in the worst 5% of all layouts (assuming normal distribution).
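The "worst 5%" claim can be checked with a few lines of arithmetic. The sketch below takes the mean and standard deviation quoted earlier, converts each badness score to standard deviations above the mean, and then uses the normal upper tail to estimate what fraction of random layouts would score worse. The function names are mine, and the normal-distribution assumption is the same one made in the text.

```python
import math

MEAN, SD = 83_603, 14_024  # badness over the 10,000 random layouts


def sds_above_mean(badness):
    """How many standard deviations a score sits above the random mean."""
    return (badness - MEAN) / SD


def upper_tail(z):
    """P(Z >= z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2))


for name, score in [("Qwerty", 119_170), ("Dvorak", 121_458), ("Colemak", 112_354)]:
    z = sds_above_mean(score)
    print(f"{name}: {z:.2f} SDs above mean, upper tail {upper_tail(z):.2%}")
```

Under a normal distribution the worst 5% starts at about 1.645 standard deviations above the mean, and all three standard layouts sit beyond 2.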
Anyway, I think we can all agree this is plainly the best possible keyboard layout for smartphones, and we should name it Taylak and petition Apple and Google to include it as the default for everything ever. I certainly can’t imagine how using the same layout on phones and computers could possibly be more desirable than this.
Here, to end on, is the worst layout I could find, with 204,290 = μ + 8.61σ possible real-word typos:
V | N | T | M | B | G | E | I | J | Q |
Z | S | D | P | C | K | A | O | X |
Y | R | L | F | H | W | U |
Nobody use that layout.