I recently found (through High Scalability) a very interesting interview with Brian Roemmele, an experienced engineer and advocate of “Voice First” – the idea that voice is going to be the primary means for people to communicate with technology in the future. The interview is fascinating and somewhat strange; most of all, it brings to mind an old article by Ryan Britt about how Star Wars seems to describe a post-literate society. That is, the society in the movies seems to rely on voice and hologram technologies to such a degree that almost no character is ever seen reading anything, and there are interesting ideas about how non-literacy (or post-literacy, in this case; i.e., a society where literacy is known and has existed, but has been willingly abandoned by at least the majority of the population) might explain some strange elements of the movies’ plot. Is it possible that Roemmele’s ideas are taking us in the same post-literate direction?
(You can listen to the interview here, or read the main points in the High Scalability article. Britt’s article seems to have been taken off the web, maybe to encourage people to buy his book instead; there is a follow-up article here. Also, while preparing to write this post, I found what appears to be another article, by David Lance Goines, suggesting the same idea as early as 1998.)
First, a few words about how I see Roemmele’s idea of Voice First. It seems extremely foreign to me, and I am obviously not in the target audience of what he describes; however, I’m open to the possibility that I am in the minority, and that what he says might be more relevant for most people. Is it true that “Anyone trying to type has to first put it in a voice in their head before typing”? I certainly type faster than I speak. I certainly don’t spend ninety percent of my time sifting and sorting Google results (and not just because I use DuckDuckGo instead), and when I hear that when he touched an iPhone for the first time, “little hairs went up on my back” – I can only conclude that he and I are very different people.
From my perspective, his advocacy for voice seems to hinge on three basic principles: efficiency of text-based lookup compared to menu-based lookup (it’s quicker to say “text Brian” than to find a texting app and choose who to text), efficiency of voice as an input device compared to a smartphone virtual keyboard, and the promise of AI. For me, the first one is basically the same command line concept we’ve had for a long time – he even mentions the command line as an example of an obsolete system, but really, what he’s suggesting is a voice-based, AI-powered command line. Which is fine, since someone like me does still use the command line occasionally – that’s why an important part of being a Windows user is learning to use the Win+R shortcut. I never use the Windows start menu – any program I have that’s more than one click away, I open by typing its name into Win+R. As for the second point – I’ve disliked smartphone virtual keyboards from the first day I saw them. We certainly need a better input system. Personally I’m skeptical about voice being this system, but whatever. And as for the third – I’m becoming more of an AI skeptic every day, and that’s too big a topic for this post. If you want to bet on the AI bubble as being the future, good luck.
But let’s return to our topic. He clearly represents more people than I do. My way of using a computer is tightly connected to my being a programmer and a gamer; I can see that people who are neither of those things do tend to have tastes more similar to his. So is it true that voice is the future, and will it bring the post-literate, Star Wars-like society? This point is not mentioned in the interview, but I think Roemmele’s vision leaves very little need for literacy. He says we’ll still be looking at screens occasionally, but it will be rare. So I think we should really stop and think about why people started reading and writing, why they still do it now, and why they should do it in the future.
The main, if not the only, incentive to read is to have access to more information. In the pre-voice-interface world, our only way of getting human-made information without that human being physically next to us and talking was to read. We would read books, newspapers, and websites, and thus get information. Books and newspapers have been gradually shifting to digital screens in the past few years, meaning that replacing screens with voice interfaces can make almost all of our reading optional. At that point, how strong will the incentive be to learn to read? We can only guess. You might think it’s exaggerated to imagine a return to illiteracy, but it’s important to realize how much of a guess that is – this voice-first future will truly be a new situation.
Because remember, we cannot think about this using ourselves as an example – we might look at ourselves and feel like we (for the sake of argument) use voice interfaces and audio books, yet still want to read occasionally. But we’ve already learned to read, and we did that when we had a strong incentive to do so. What happens with the first generation that already has audio books before they learn to read? They will not have as much of an incentive to learn to read. They will not need reading to get information. Will reading still be useful to them in the long term? I absolutely think so. But will that be enough to convince them to go through the hard work required to learn it?
One very symbolic moment in the interview is when Roemmele asks “Who uses mice anymore?”. I think the mouse, in many ways, is a small example of the same process at work here. The mouse is significantly more efficient than the touchscreen, but the touchscreen has one advantage – it’s intuitive. The mouse, when you use it for the first time, is not efficient; in the hands of an experienced user, though, it becomes significantly more efficient than the touchscreen, which stays mediocre no matter how much you use it. Sound familiar? This is exactly how writing works. It is not intuitive, and requires a lot of practice to master; but once mastered, it provides huge benefits over voice (which is more intuitive). If people really don’t use the mouse anymore (and by the way, don’t they? I haven’t really seen people like that, but I’m sure he knows the market better than I do. It might also be related to the fact that I am not located in the USA, which seems to be the early adopter for most of these tech trends) because they prefer the short-term benefit of an intuitive interface, then can we really expect them to spend difficult hours learning to read and write, when they can just listen to audio books?
So bottom line – if Roemmele’s thesis is correct, I would think it is a very real possibility that our current (or very near future) rate of global literacy is going to be a historical peak; it will only go down from there. Not that literacy will disappear from the world completely, but it will no longer be the near-universal skill it is today. I’m not a fan of “those horrible younger generations” kinds of pessimism – as far as I’m concerned, a transition to post-literacy will be fascinating. I think it will be a bad decision for those who do it, but I have no reason to complain. If that really happens, I’ll fully enjoy my ability to demonstrate reading and writing as a party trick to my future grandchildren’s friends. I doubt if they’ll be too impressed, but who knows.
And some advice for you readers – if the world is going in a post-literate direction, I strongly recommend you go against the trend. Not because literacy is some sort of magical wonder world, as some people like to describe it, but because it’s simply a useful skill, even in a world of audio books. I admit I haven’t given audio books much of a chance, because I cannot even get started with this strange experience of reading a book at someone else’s pace. Even as a computer interface, it seems absurd to me to return to the command line – Roemmele’s vision seems to imply that we abandoned the command line because we don’t want to read and write too much, but I see things very differently – we (mostly) abandoned the command line because it’s a one-dimensional medium. You can only read or write one thing at a time, even more so with voice than with text. On a screen we can take in much more information at the same time, and be more efficient.
And finally, I’ve mentioned David Krakauer’s concept of competitive and complementary cognitive artifacts before, and I think it’s relevant here as well – always prefer technologies that improve your intelligence rather than technologies that compete with it. Text, computer mice, keyboards, long division and maps – these are all technologies that not only help you, but that you can also understand and learn to internalize, without depending on some company to provide your thinking for you. With your digital assistant – I hope you’ll enjoy this life where you can suddenly forget how to turn off the lights in your house (actual story from the interview). Or to use the Star Wars analogy – where you can be completely unaware of a dark lord taking over your republic.
Or something along those lines. I admit I have very limited knowledge of the Star Wars universe.