A Conversation With Brian Roemmele On The Voice-First Future
A wide-ranging Q+A with tech’s voice expert
A couple of months ago, I decided I wanted to learn more about how we’ll use our voice to control our computers (think Siri, Amazon’s Echo devices, etc.). I tweeted @basche42, tech Twitterer extraordinaire, to ask who the best person to follow in this field was. He didn’t hesitate — Brian Roemmele.
Roemmele’s analysis has appeared in Newsweek, Forbes, and Business Insider, among other outlets. He has a background in payments and currently works as an advisor and consultant. He’ll also tell you that he’s a scientist and a lifelong student. Whatever his business card may say, he has a depth of knowledge in voice and emerging tech that is crystal clear to anyone who follows his Twitter feed or reads his Quora posts.
I set up an interview with Roemmele and carefully worked on a list of about ten questions. I did my homework and wanted to make sure that I could speak his language, or at least fake it well enough to get him to open up. I shouldn’t have been concerned. He talked and talked and talked. (And he warned me he would!) In one of his writings, he mentions that he has an 800-page manifesto about voice. One of my questions was going to be about whether that was real, but after our discussion, I didn’t really have to ask.
Our conversation was an hour long and spanned many topics. I’ve edited it down to the best bits. Hope you enjoy.
How do you answer people who say that it’s quicker for them to type or use their phone to complete a task than it would be to use their voice?
The computer started with punchcards. Punchcards could feed information into a computer very quickly — ten times faster than a human could type. So, during the time of the punchcard, if you and I had been hanging around the scientists at Stanford, Harvard, and MIT, they were saying they were never going to type on a typewriter. “I’m not going to lower myself by touching a keyboard. And don’t you tell me a keyboard will be faster than my punchcard.” You see, they didn’t lower themselves.
Punchcards were pre-keyboard. This kinda looks like a ballot from a confused voter.
Everyone who debates me about voice doesn’t know their history. A keyboard is unnatural. It’s unnatural for us to come up with ideas in our brains and then somehow express those ideas through keystrokes with our fingers. It’s a cognitive load. If you study the human brain when it’s trying to type versus when it’s in flow, there’s no comparison. It’s the difference between a donkey pushcart and a rocket.
So basically, humans have had to adjust to computers instead of the other way around?
Exactly, which I mentioned in my “Voice-first Revolution” article. Kids get it instinctively. My children expect to talk to a computer; they expect to touch a screen. They don’t expect to touch a keyboard. The keyboard is immaterial to them. When the touchscreen came about, the death of the keyboard began. If we look at the arc of history, people will say it started with Steve Jobs in 2007. And will screens be there? Yes. Will keyboards be there? Yes. We’ll just use them a lot less.
People assume the keyboard is a natural way to interact with computers, but that’s only because the computer wasn’t powerful enough to understand what we said, and more importantly, what we meant. We couldn’t even get speaker-independent voice recognition down until the mid-2000s. It wasn’t there yet, because the cloud wasn’t there.
How did it come together?
Voice recognition took off along the same line as [Amazon Web Services]. At the same time, artificial intelligence and machine learning didn’t make sense until the mid-2000s. Cheap hardware didn’t make sense until the mid-2000s. So if you want to see the inception point, the pieces slowly started coming together just as the iPhone was arriving in mid-2007. Then Moore’s Law took over. And the network effect took over. And AWS started getting less costly. And then we move into 2014, when Jeff Bezos had a team at Lab126 that was supposedly making a “talking Kindle,” because Kindle speech synthesis had started to take off in 2005. The scientists doing it were rebellious pirates. Unfortunately, they didn’t have a Steve Jobs character.
What would someone like Steve Jobs have done?
If Steve was there, he would’ve had the Xerox Palo Alto Research Center moment, when he discovered windowed computing, networked computing, and the mouse. Steve was a magician. He knew how to take these things and weave the story. Evan Spiegel over at Snap is a version of that. And people think that’s insane. But we’ll see in ten years.
Evan Spiegel could be a version of Steve Jobs?
I already know it, I don’t need anyone to tell me. And it’s not arrogance. If you throw a rock out a window every day of your life, after a while, you don’t need to test gravity to know it works. There are certain gravities within humans and how they present ideas. And so far, unless someone really sidesteps this guy, he’s on that track. Snap is becoming more of an Apple. I live in Southern California, and I can tell you, what they’re doing in machine learning and artificial intelligence is absolutely mind-boggling. They’re gonna be a hardware company. Well, I don’t want to go too far.
Have you consulted for them?
Do you feel that the Amazons, Apples, and Googles understand voice-first?
Here’s what the highest-ranking Google VP said:
“The days of the top three pay-per-click searches and the ten organic searches are over.”
What that means is that this is not a novelty. They have a 1,500-person army over at Amazon. Jeff is not stupid. He’s not failing on Alexa. Wait till you see the numbers when they break out how many millions of dollars in transactions came through on Echo devices. During the holiday season, some people were [using Echo devices to buy more Echo devices].
So who’s leading the voice-first race?
None of them. None of them. When Steve invented the iPhone, he invented something most people miss. Everyone saw the touchscreen, the smartphone idea. Everyone later saw apps. Remember, the iPhone wasn’t delivered with an app economy. It wasn’t well thought out. But what they didn’t really see is that he busted the entire modality of what a cellphone was about. The controlling interests in cellphones were logically the cellphone providers. Steve reversed all this. He did the same thing with the music companies. He was seeing the grand vision. He pulled the music companies out of the abyss. We’d be trading pirated music right now [if not for Steve].
None of them are in that position, because they believe a voice-first device is both an intelligent agent and an intelligent assistant. Those are two radically different things.
What about a service like Viv?
Yes and no. Because they actually get it. Dag [Kittlaus] is extremely brilliant with this stuff. He’s made two companies [Siri Inc. and Viv], and Apple has lost Dag twice. The last act Steve did as a corporate leader was to acquire Siri. He told Dag that Siri was more important than the Macintosh and the iPhone. And Dag won’t say this publicly. If Steve was around, he’d sound more crazy than me right now. Steve would be laughing.
How will this affect search?
We don’t really want to look at lists of things. 1883 — telephone directory invented. 1999 — Google. And it’s exactly the same thing. It’s just coming to you electronically instead of printed out. You’re still sifting and surfing. How does voice change that? First of all, it’s not just voice. It’s AI and machine learning behind the voice. What we really want is a yes-or-no answer most of the time, or we want the best result.
An intelligent assistant that’s on your shoulder most of the time will know more about you than any significant other will ever know. It may know more about you than you know yourself. Tell me where you want that intelligent assistant to reside. Do you want it to reside in someone’s cloud, where it’s being mined daily for tidbits? Or do you want it locally, in a box you’d grant limited access to for different things, where you know what comes in and what goes out? Look at the world right now. Does anyone understand this?
So Echo and Home, those aren’t it.
Those are calculators before the PC came out.
We don’t know how far we’re gonna go until we get there. But if you’re a student of history, you can see the arc of things.
So it may be Amazon and Apple, but it could be someone else.
The companies that will be the Apples and Googles and Amazons of this next generation — we don’t know their names.
It’s going to be in a box maybe as small as a pack of gum. It’s going to be in a box that you physically have control over, that nobody will get access to. And your intelligent agent is the mediator between your intelligent assistant and the dumb pipes.
Steve Jobs called the cellphone providers “dumb pipes” off the record. That’s where the term came from. We’re gonna use your dumb pipes — Siri, Alexa, Viv — and we’re gonna put our abstraction layers on top of your dumb pipes. Who’s gonna win the war? We don’t know the names yet.
The winners are DIY individuals. We use Watson to do intent extraction from Siri, which talks to Assistant, which will tell Alexa to do something. We don’t care about the silos they’ve made. We surf on top of the silos. And then we interact with all the different AI systems with APIs out there. But someday, we won’t even care about APIs, because the most powerful API is our voice. So the intelligent agent is a small gear on a big gear that we call our personal assistant or agent.
This is exactly what the personal computing revolution looked like in 1975.
Thanks for reading! You can find me on Twitter @adamokane. Hit the ❤ if you liked the piece.