Jott
In September of 2006, the world was introduced to Jott, a web service that transcribes your voice notes into text and sends them to different places like Twitter, Facebook, or your calendar or blog. It’s a really neat service.
Here’s how it works. When you sign up for your Jott account, you’re given a phone number that you can call to leave your notes. The service answers the line and asks one simple question - “Who do you want to Jott?” After you speak a keyword indicating where you’d like to send the note, such as “email” or “Sarah,” you are asked to speak the note, and they take care of the rest.
Shortly after joining Jott, back when there were free accounts, I noticed a problem. As soon as I dialed the number and Jott answered, that question always threw me: “Who do you want to Jott?” It isn’t that it’s a particularly confusing question. In fact, I bet the team over there spent a lot of time deciding exactly what the question would be and how quickly it would need to be said in order to get to the point as quickly as possible. But every time I’d call and they’d ask, I’d stumble.
There were 2 reasons.
The first has to do with knowing exactly what to say - while there are plenty of great examples throughout the Jott site, when I’m on the move I have trouble remembering exactly what voice commands it can take. So I pause for a second. If it were a human on the other end of the line, I could say something like “Oh crap, I can never remember that.” But a computer is much more touchy. No matter what I say, Jott thinks it’s a command. So I freeze.
The second reason I’d screw up had to do with the context of the situation. You see, although the transcription service was sometimes human-aided, it was far from perfect. And I knew, from reading around, that some of the people doing the transcription were not native English speakers. So, whenever I’d call Jott, I’d be thinking 2 things - “get soup at the grocery” and “enunciate.” Unfortunately for me, 2 things is about all I can keep in my head at once, so when the question came up asking me where I wanted to put this particular note, I’d mess up.
What to do?
Other than fixing my brain to deal with more things at once, I see 2 simple answers to this issue - a) better recognition software so I don’t have to worry or b) ask what before where. I think the latter is a bit more interesting.
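To make option b) concrete, here is a rough sketch of the two prompt orders side by side. It is only an illustration and not how Jott is actually built: the `Note` type, the `ask` callback, the `scripted_caller` helper and the wording of the "what" prompt are all made up for the example. Only the "where" prompt is Jott's real question.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple

# The "where" prompt is quoted from Jott; the "what" prompt is assumed wording.
WHERE_PROMPT = "Who do you want to Jott?"
WHAT_PROMPT = "What's your note?"


@dataclass
class Note:
    destination: str
    body: str


def where_then_what(ask: Callable[[str], str]) -> Note:
    """Current order: ask for the destination first, then the note."""
    destination = ask(WHERE_PROMPT)
    body = ask(WHAT_PROMPT)
    return Note(destination, body)


def what_then_where(ask: Callable[[str], str]) -> Note:
    """Flipped order: capture the note first, then ask where it goes."""
    body = ask(WHAT_PROMPT)
    destination = ask(WHERE_PROMPT)
    return Note(destination, body)


def scripted_caller(answers: Iterable[str]) -> Tuple[Callable[[str], str], List[str]]:
    """Hypothetical stand-in for a caller: replies from a fixed script
    and records each prompt in the order it was heard."""
    heard: List[str] = []
    replies = iter(answers)

    def ask(prompt: str) -> str:
        heard.append(prompt)
        return next(replies)

    return ask, heard
```

With the flipped order, a caller who blurts out “get soup at the grocery” the moment the line picks up has already given the right answer to the first question. Either way the service ends up with the same `Note`; only the order of the questions changes.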

All the time we deal with interfaces that ask you for detailed info before you get to the meat. For example, in almost every mail client, we’re asked who the email is to and what the subject is before we get to the actual message. But the real world doesn’t work like that. How many times have you started a letter by filling out the envelope? Not many? Of course not. Given the unconscious choice, you’d rather get straight to the message, then deal with the details.
Or what about when you start a new document? One of the first things we’ve been trained to do is save it so that if the program or computer fails, we’ll at least have something. But in order to do that, we have to give it a name and put it somewhere. Yet, with a pencil and paper, I can just start writing and worry about where to put it later.
Look around. I’m sure you’ll notice more.
I think as software designers, it might be an interesting idea to look at where we can flip this around. While it may defy convention to have an email client with “To:” and “Subject:” fields at the bottom, I’d be interested to see the user reaction once they got used to it. It might just be something bothering us that we haven’t noticed yet.