Dragon Dictation News

What role does speech recognition have to play in virtual reality development? Sony, Oculus Rift and the future of virtual reality.


There’s a lot of buzz about virtual reality right now, driven primarily by the development of the Oculus Rift virtual reality head-mounted display (HMD). This prototype HMD was developed by Palmer Luckey – an impressive young hardware developer who is largely responsible for the sudden resurgence of interest in virtual reality (VR) technology, which had largely stalled since the 1990s. Sony is also taking a stab at a virtual reality device called Morpheus. However, it is unclear how advanced or how expensive it will be when it is released to consumers or developers.

I remember the good old days during the 1990s when the local Virtual Reality Cafe (that phrase doesn’t make sense to Gen Y, like a lot of things) took my hard-earned money from wrangling shopping trolleys and turned it into glorious low-definition, stomach-churning disorientation.

That being said, I had one hell of a good time. It was an amazing experience despite its serious limitations. I assumed at the time that within a year or two the technology would be reaching my bedroom for around the price of a PlayStation One. What a fool I was. I should’ve realised that The Lawnmower Man was full of old mouldy grass clippings and not the genesis of a brighter polygon-shaped future. This is coming from someone who thought Xevious was cutting-edge gaming technology.

I always wanted more VR and it never arrived. That’s one hell of a disappointment.

But sometimes a brief, bitter-sweet taste is better than none at all.

Consumer VR is coming. Oculus released details of the Development Kit 2 (DK2) this month at the Game Developers Conference in San Francisco. The DK2 is now available for pre-order for US$350 plus shipping, and it will be shipping in July. And at some point – probably 2015 – the consumer version of the VR HMD will be available for home use.

Being cutting-edge technophiles at voicerecognition.com.au, we immediately ordered the Oculus Rift DK2 for the Voice Recognition Australia head office. Of course, this VR device will only be used for serious business purposes and in no way be associated with playing virtual reality demos and games during lunchtimes and after 3 PM on a Friday – ever, I promise…

In light of these recent developments, I have a question for Sony, Oculus and VR software developers. This question relates to the other problem with VR. Using a VR HMD immerses the user in a virtual world, but the user then needs to interface naturally with that world without breaking this immersion. This is a huge challenge for VR developers.

My question is:

“What role will speech recognition play in improving the virtual reality experience?”

Once virtual reality becomes effective, popular and accessible for the masses – Palmer Luckey claims that this goal is his primary objective – virtual reality users will want to purchase third-party devices such as the Razer Hydra to help them navigate their amazing new three-dimensional worlds.

However, until hand and full body tracking become seamless and affordable, it will be incredibly important for people to be able to interface naturally with their new 3-D virtual environments. Even when motion tracking becomes cheap and effective, some people will still need to perform certain actions that are unsuitable for motion tracking. And some people simply do not have the ability to move easily because of injury or disability. Or they just might not want to move around at home while enjoying their virtual environment. They might be relaxing, drunk or just tired. There’s a multitude of different reasons why motion tracking is not a viable option for all VR users.

Presence is considered the most important component of virtual reality. It’s that feeling of “presence” that really convinces you that you are in a virtual world. You must be fully immersed in the program or game to be convinced that you are there. If you have to stop, take off the virtual reality HMD and start pressing buttons on the keyboard to perform an action – for example, to go through your inventory or lower your landing gear – then that breaks your link with your virtual environment and destroys presence.

VR’s not as convincing if you have to stop and press buttons to take certain actions. One minute you’re flying at 52,000 feet in an F-22, tracking a Chinese J-20 stealth fighter in a dogfight – seconds later you’re staring at the three-month-old vomit stains your dog made on your living room carpet after you fed it Cheetos and beer (don’t ask). Then you reach down to press Shift-M to switch to infrared homing missiles when your mum says, “It’s about time you got your face out of that helmet and met an actual girl, don’t you think, Junior?”

It would have been better to say “switch to heat seekers” than to deal with this humiliating real-life exchange involving dog vomit and your mum.

I accept that some button pressing is inevitable – that’s why gamepads are an acceptable input device for now. However, as the technology advances, people will expect a more immersive experience. I have personally emailed Palmer Luckey about support for my 31-year-old TAC 2 joystick, so I’m not anti-joystick or anti-gamepad. He said he’s working on it, but with an installed base of five dudes in their forties (who all coincidentally know each other and have beards), it’s not looking too good.

Speech recognition plays a very important role in bridging that gap between full immersion and a gamepad or keyboard interface.

Iron Man doesn’t reach for his gamepad when he is hurtling through the atmosphere. He merely utters some commands to Jarvis, who executes his request after processing his speech through a voice recognition application of some type. That’s how a boss rolls.

Iron Man does not fumble for the X button. He uses speech recognition.

There is a huge market for natural input devices that interface seamlessly with virtual reality. The Myo armbands are a good example, and I have already mentioned the Razer Hydra. There are a number of other new technologies that will develop in parallel with virtual reality devices over the next few years. All of these will improve the interface between the user and the VR environment – enhancing presence.

Speech is considered to be the most natural way of communicating, so it should be taken seriously by any developer of virtual reality software.

Dragon NaturallySpeaking can be provided as a Software Development Kit. This SDK could allow game developers to integrate speech recognition into the game or virtual environment.

This would allow users to interface naturally with the game by simply speaking commands. You will probably not need as many commands as you normally do when using speech recognition for general dictation. Having only a limited number of commands makes it easier to remember what they do, and it also makes the processing faster, as the speech recognition engine does not have to search through too many different actions.
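To make that a little more concrete, here is a minimal sketch in Python of a small, per-game command set. It assumes the speech engine has already turned the player's voice into text. The `COMMAND_SETS` table, the action names and the `match_command` function are all made up for illustration and are not part of the Dragon SDK or any Oculus software.

```python
from typing import Dict, Optional

# Hypothetical command sets, one per game context.
# Each maps a spoken phrase to an (invented) in-game action name.
COMMAND_SETS: Dict[str, Dict[str, str]] = {
    "flight": {
        "take off": "TAKE_OFF",
        "lower landing gear": "GEAR_DOWN",
        "switch to heat seekers": "SELECT_IR_MISSILES",
    },
    "shooter": {
        "switch to shotgun": "SELECT_SHOTGUN",
        "open inventory": "OPEN_INVENTORY",
    },
}


def normalise(phrase: str) -> str:
    """Lower-case the phrase and collapse repeated whitespace."""
    return " ".join(phrase.lower().split())


def match_command(context: str, recognised_text: str) -> Optional[str]:
    """Return the action for a phrase in the given context, or None.

    `recognised_text` stands in for whatever text the speech engine
    produced; only phrases in the active context's set are accepted.
    """
    commands = COMMAND_SETS.get(context, {})
    return commands.get(normalise(recognised_text))


if __name__ == "__main__":
    print(match_command("flight", "Switch to heat seekers"))
    print(match_command("shooter", "take off"))
```

Because only the active game's handful of phrases is checked, "take off" does nothing in the shooter, and the lookup stays tiny no matter how many games define their own sets.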

You would simply need a context-relevant command set for the game you’re playing. And it would be very cost-effective to integrate a small digital USB noise-cancelling microphone into the virtual reality head-mounted display. Then you’ll have high-quality digital voice input into the game. This would make the speech recognition very accurate.

Even better, if it was integrated into Steam by Valve, your voice profile could reside on the Steam server. That would mean that your accuracy would improve while you’re using speech recognition in any Steam game. Over time you would develop a very accurate profile of your voice – stored in the cloud – that would be accessible in any Steam VR program. That means you would not need to train the speech recognition application ever again. This would save a lot of time and increase user acceptance. This is similar to using Dragon Dictation on the iPad and iPhone, where a profile of your voice is kept on the Nuance server and recognition improves over time. Siri also works this way – your speech profile is linked to your Apple account and accuracy improves through use.

So instead of flapping your arms up and down when you’re ready to take off in your F-22 Raptor, or pointing your finger-guns in the air while making the “peow-peow” motion in Team Fortress 2, you might just be able to say “take off” or “switch to shotgun” and get on with enjoying the game.

Speech recognition has a valuable role to play in the virtual reality world. Let’s hope companies like Nuance, Oculus and Valve can work something out together to make speech a cost-effective, efficient mechanism for VR input.

Russell Bewsell is the managing director of Voice Recognition Australia, an Australian company that has been supplying voice recognition software to the Australian market for 14 years. Russell is an avid gamer and technology enthusiast, who may have started his gaming career on the Hanimex, Atari 2600, Amstrad and Commodore 64.

