by Matt Kelland
8 years ago

Last week, we mentioned the change in Florida law that bans texting while driving. We pointed out that hands-free operation will still be legal, so you can still send and receive text messages as long as you dictate them and have them read out to you. Most modern smartphones support some form of speech input, and many have a “driving mode” which reads out your texts.

Speech input is getting more and more common these days. The idea of talking to computers is something that’s been around in sci-fi movies for some time. The story goes that Gene Roddenberry came up with the idea of a talking computer for Star Trek because it was easier for audiences to follow than making them read what was appearing on screens. That may or may not be true, but Star Trek certainly inspired a lot of innovations we now take for granted (like the iPad, for example), so it’s quite believable. Computer scientists have been working on speech input systems for decades. Early speech software was pretty poor, but finally it seems to be getting to a point where it’s quite usable. Siri may still be a little crude by Starfleet standards (or even compared to HAL in 2001: A Space Odyssey), but it’s good enough to tell you how to get to the nearest pizza restaurant or the current score in tonight’s football game.

However, as well as voice control, speech input is useful for dictation. In many ways, the idea of everybody writing their own correspondence is actually fairly new. As recently as twenty years ago, before word processors became common, few office managers or executives wrote their own letters. It was normal to dictate letters, or even longer reports, and then have them typed up by a typist, often in huge typing pools. By the 80s, most people used dictation machines; before then, secretaries had to be there in person to “take a letter”. And, in some cases, the typists were better at spelling and punctuation than their bosses!

For most people, dictation has one main advantage in addition to offering hands-free input: they can speak faster than they can type. It can also be much easier and quicker when you’re using a small touchscreen keyboard such as the one on a smartphone or tablet. Teenagers may be able to type blisteringly fast with their thumbs (even though they risk RSI by doing so), but those of us who learned on real keyboards don’t generally attain that level of proficiency. An average (untrained) typist types at 33 words per minute (wpm) on a standard keyboard. On a touchscreen, that drops to 16wpm. We normally speak at around 150wpm, dropping to 100wpm for presentations. Dictating a quick text message such as “On my way home, stopping at the 7-11, need anything?” takes just a few seconds, compared to about 20 seconds on a standard keyboard or nearly 40 seconds on a touchscreen. Dictating a 100-word email takes under a minute at normal speaking speed, compared to 3 minutes on a standard keyboard or more than 6 minutes on a touchscreen.
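If you want to check those figures yourself, the arithmetic is easy to sketch in a few lines of Python. This is only a rough illustration: the rates are the averages quoted above, the function and variable names are made up for this example, and real typing and speaking speeds vary a lot from person to person.

```python
# Rough comparison of input times, using the average rates quoted in the post.
# All names here are illustrative; the rates are averages, not measurements.

RATES_WPM = {
    "keyboard": 33,             # average untrained typist, standard keyboard
    "touchscreen": 16,          # same typist on a touchscreen
    "speech": 150,              # normal speaking speed
    "presentation speech": 100, # slower, more deliberate speech
}


def seconds_to_enter(word_count, words_per_minute):
    """Return the time in seconds needed to produce word_count words."""
    return word_count / words_per_minute * 60


def compare(word_count):
    """Return the time in seconds for each input method, rounded to 0.1s."""
    return {
        method: round(seconds_to_enter(word_count, wpm), 1)
        for method, wpm in RATES_WPM.items()
    }


message = "On my way home, stopping at the 7-11, need anything?"
message_words = len(message.split())  # counts whitespace-separated words

text_times = compare(message_words)
email_times = compare(100)

print("Text message:", text_times)
print("100-word email:", email_times)
```

The word count here is a simple split on spaces, which is good enough for a back-of-the-envelope estimate. It also ignores the time spent correcting mistakes, which matters a great deal in practice.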

So, if it’s so much faster, why don’t we dictate everything?

There are two main reasons. First, dictation software isn’t quite there yet. It’s not as simple as turning sounds into letters – computers need to actually understand what we mean and pick the right spelling. For example, the sentence, “its all most knight thyme hear sew eye am going two bed period sea ewe inn the mourning period” makes no sense at all. You can figure out what it was supposed to say, but that’s not really good enough. The software needs to be smart enough to figure out how to turn that into proper English. It’s also complicated by the need for computers to understand our accents. For example, say the word “water” out loud. Most Americans say the letter “t” as more of a “d” sound, so the computer could easily confuse it with the word “warder”.

The software is improving fast, but it still takes time to train the computer to understand our individual voices. In the meantime, it can take as long to correct all the mistakes as it would to type the document in the first place.

The other reason dictation hasn’t taken off is that typing is essentially a silent activity. You can type a message any place without disturbing anyone, which makes it perfect for busy offices, trains, or sitting on the sofa at home. Dictation, on the other hand, is public, and it’s disruptive. Imagine being in a room full of people dictating their Facebook updates, tweets, emails, and texts. Whether it’s the office or the living room, it would be like living in a call center, and it would quickly drive you crazy! And dealing with confidential business correspondence over a latte in Starbucks – totally out of the question.

However, in the right place at the right time, dictation can be a huge time-saver. Learning to speak so that a computer can understand you takes work, but it’s no more work than learning to type on a keyboard or learning to type with your thumbs. Once you do it, you’ll find it second nature.

Try it for yourself – if you have Google Chrome, there are plenty of free apps that use the Chrome Speech system for speech input. Dictation 2.0 is a typical example. You can already search Google by voice just by clicking the microphone icon.

It takes a bit of getting used to talking to a computer or a phone, but it seems inevitable that one day soon, the sight (and sound) of someone dictating a letter will once again be common in offices up and down the country.

(This blog post was dictated on a Google Nexus 7 tablet using the inbuilt dictation software.)

