Using Voice Recognition to Play Video Games

Voiced Gaming · 14 minute read
Drawing of arm with large cracks resting on video game controller.
Drawing by Voiced Gaming

Since 2014 I’ve had very bad RSI, and I’ve been experimenting with ways of gaming using voice recognition. There are many programs available, and every new game needs testing and customization, even when it is similar to previous ones. Here are several pieces of voice and non-voice software I’ve used to make voice-controlled gaming more accessible, with details of their benefits and limitations.

Collection of software icons the author uses including, Dragon Naturally Speaking, Auto Hot Key, Speech Recognition, Voice Finger, Windows Speech Recognition Macros, MouSense, Cheat Engine, Voice Bot, Voice Attack, and Vocola.

1. Dragon NaturallySpeaking

Dragon is very good for typing into non-Microsoft applications, which makes it applicable to many typing games. Using only Dragon I’ve played a few browser typing games, as well as Typefighters, The Typing Of The Dead: Overkill, and – using Dragon to dictate maths notes – Pythagoria.

But there are limitations to this typing. In these games you generally can’t correct misrecognised words, and speech interpreted as commands leads to problems; the worst example is when you’ve tried to enter the word “close”/“closed”/“clothes”, and it closes your game! Misrecognised words are equivalent to random/dropped inputs, though saying “Dictation Mode on” will prevent accidental commands (while blocking use of the phonetic alphabet).

For cursor movement, Dragon has a “MouseGrid” which divides the screen into 9 numbered sections. When you speak one of those numbers, it subdivides that section into 9 further sections, and you repeat this until the cursor is where you want it to be. For small movements, there are commands for moving or dragging the mouse in a direction, though I’ve found it doesn’t recognise those as well as it should. For very small movements, “mouse [direction] [number]” will move the cursor in increments of 3 pixels, with numbers between 1 and 10.
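To make the subdividing idea concrete, here is a minimal Python sketch of how a sequence of spoken numbers narrows down to a point on screen. It is only an illustration, not Dragon’s actual implementation: the function names are made up, and I’m assuming the cells are numbered 1 to 9 from the top-left, row by row.

```python
from typing import Iterable, Tuple

# A rectangle as (left, top, width, height), in pixels.
Rect = Tuple[float, float, float, float]


def subdivide(rect: Rect, cell: int) -> Rect:
    """Return the sub-rectangle for one of the 9 cells.

    Assumption: cells are numbered 1-9, row by row, starting top-left.
    """
    if not 1 <= cell <= 9:
        raise ValueError("cell must be between 1 and 9")
    left, top, width, height = rect
    row, col = divmod(cell - 1, 3)
    cell_w, cell_h = width / 3, height / 3
    return (left + col * cell_w, top + row * cell_h, cell_w, cell_h)


def mousegrid_target(screen: Rect, spoken: Iterable[int]) -> Tuple[float, float]:
    """Apply each spoken number in turn and return the centre of the final cell."""
    rect = screen
    for cell in spoken:
        rect = subdivide(rect, cell)
    left, top, width, height = rect
    return (left + width / 2, top + height / 2)
```

Each spoken number shrinks the target area to a ninth of its previous size, so three or four numbers are usually enough to land on a button, which is also why it feels slow compared with a single click.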

Usefully, its clicks are recognised by all programs. This means that it can be used to play most point-and-click games, though inefficiently.

It has downsides. Dragon is expensive, particularly the higher-end versions which let you make your own macros. It uses more RAM than anything else in this list. And not every version is compatible with each version of Windows, and there are occasional incompatibilities even when it does run.

Still, with a well-trained profile it can be quite consistent.

2. AutoHotKey

This is free software that lets you queue keyboard, mouse and Windows actions, and have such sequences activated by hotkeys you assign.

AutoHotKey doesn’t involve speech recognition, but its ability to script commands (including loops) is very useful, as you can assign them a hotkey and activate them with speech recognition (e.g. by saying, “Press Control-Alt-G”). I’ve used this to demonstrate rapid clicking in Cookie Clicker and turn pages of e-books and digital comics on a timer. Additionally, AutoHotKey’s outputs are recognised by every game I’ve used.

Its code is a little tricky, and it is possible to trap yourself in an infinite loop, but it’s straightforward to put in exit commands, and you don’t have to go too far into it to be competent at most applicable uses for videogames. The AutoHotKey documentation explains what each command does, though it can make dense reading, and I often need to search for multiple examples.

3. Windows Speech Recognition (WSR)

On its own, Windows Speech Recognition is fairly useless for gaming, because it really dislikes non-Microsoft applications, and seems quite keen to interpret what you’ve said as a Windows command. It’s worth going into its Options and unchecking the option “Enable dictation scratchpad”, which will allow it to directly send recognised words into an unknown program, though with no ability to correct text.

It too has a MouseGrid, which does the same subdividing as Dragon’s, and can be used for dragging things – though it is translucent. However its clicks go unrecognised in a number of games, limiting its use for point-and-click games. This can sometimes be resolved by running WSR as Administrator.

Its biggest benefit is in training its recognition system, because this system is used as the basis for each voice program listed below. The better it recognises you, the better these will perform. Also, it uses a fraction of the RAM that Dragon uses, so it’s much gentler on your computer.

4. Voice Finger ($10)

This is a lightweight add-on for Windows Speech Recognition, which adds two useful modes.

The “Keyboard” mode enables you to send one letter at a time to almost any program, and avoids trying to send words. This is useful in typing games like Words For Evil and God Of Word, which reject whole words and strings of letters from WSR/Dragon, but with Voice Finger I’m able to play them by saying one letter at a time.

The “Mouse” mode creates an impressive MouseGrid, different from the ones listed above. This one subdivides the screen into a grid of 36 x 36 coordinates, and when you speak the coordinate that you want it clicks there (it has further options for dragging etc). This is generally much more efficient than the other MouseGrids, though occasionally programs disagree with it (e.g. Tengami crashes) or ignore it because it uses WSR’s clicks – again, running as Administrator may help. I’ve used the Mouse mode to play some CBeebies games, Doctor Who Dalek Hack, Words With Friends, Hero Of The Kingdom II, and hocus. It’s generally better at point-and-click games than WSR’s MouseGrid, assuming you are comfortable looking through the translucent overlay and many letters.

I’ve used both modes to help me make VoiceAttack profiles entirely by voice command. Voice Finger is also good for efficiently pressing cursor keys a given number of times, which some games accept.

The key downside in using it is that Windows Speech Recognition is still running, so it’s very risky to narrate what you’re doing without pausing to say “stop listening”.

Overall, I’ve found it well worth the $10 lifetime registration fee, and strongly recommend it.

5. Windows Speech Recognition Macros

This is an official add-on for Windows Speech Recognition which allows you to add custom commands to WSR, including sequences of actions and executing files.

Through this guide I learned how to give it custom voice commands which activate AutoHotKey files representing basic clicking actions (left, right, double, holding down, releasing). This enhances WSR, enabling efficient voice-controlled clicking in almost any program, and I’ve made tutorials about setting this up.

It’s still running Windows Speech Recognition, so you need to make sure you only speak your commands. It’s also very clunky, and it’s better to use Vocola 3 (which I discovered later, listed below), but it was effective once set up.

Illustration of frustrated stick-person wearing gaming headset looking at computer monitor.

6. Cheat Engine

This is free software that lets you manipulate game variables and game speeds.

Cheat Engine does not involve speech recognition, but is helpful for making games more accessible. Its Speedhack is very useful for slowing down games such that the reaction speed of voice recognition is enough to play them. And changing game variables means you can give yourself as much ammo/health/money/population as you need, or set those variables to inactive so that they don’t change; imagine a health bar or ammo clip or wallet that never depletes.

Usefully, its Speedhack can have preset speeds activated with hotkeys you prescribe. You can then trigger those hotkeys with speech recognition, changing the speed of a game while playing it. So if you need half speed for one challenge but can manage everything before and after it at full speed, you can play most of the game at full speed, rather than playing it all at half speed or repeatedly switching window to Cheat Engine. Its other hotkeys relate to its navigation, which can make it easier to find game variables using voice recognition alone, though you can also find them with Dragon and its MouseGrid.


Note that I’ve linked to an archived version of the website. This is because the main website has been hacked for some time, but the 2014 version is safe and still works. Note also that you mustn’t use this on an online multiplayer game, because those have cheat detection software which will find you and ban you. Here is my tutorial about it.

7. VoiceBot (Also on Steam, £10.79/$13)

This is a customisable voice recognition program, letting you make commands and assign actions to them.

VoiceBot has a fairly intuitive interface, a menu for downloading pre-made community profiles, and even a login function so you can sync your profiles across multiple devices. Its mouse movement, keyboard presses and clicks are recognised by all programs. I’ve used this to play many games (e.g. Squarecells, Golf With Your Friends, Windows Movie Maker), and it runs without Windows Speech Recognition, giving you freedom to narrate and react and not worry about coughing and other talking. It’s also really easy to implement everything previously done in WSR Macros.

Unfortunately it is a bit limited. While it can move the cursor to any given coordinate, these coordinates are always relative to the screen or the current mouse position, never the application you’re using; this makes it tricky to save the locations of menu options as voice commands, and I’ve even had to make a tweak to account for that. Complex actions need to be coded in C# or Visual Basic, and it’s difficult to make and handle variables and loops; it was far easier to make loops in AutoHotKey, and then commands in VoiceBot which pressed those hotkeys.
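The arithmetic behind the coordinate problem is simple, and a small Python sketch may help show why screen-relative coordinates break when a window moves. This is not VoiceBot code and not the tweak I used; the `Window` class and `to_screen` function are hypothetical names, and how you find the window’s position is left outside the sketch.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Window:
    """Top-left corner of an application window, in screen pixels (hypothetical helper)."""
    left: int
    top: int


def to_screen(window: Window, x: int, y: int) -> Tuple[int, int]:
    """Convert a point measured from the window's top-left corner into screen coordinates."""
    return (window.left + x, window.top + y)


# A menu option saved as 20 pixels across and 30 down inside the window.
menu_option = (20, 30)

# The same saved offset gives different screen positions when the window moves.
print(to_screen(Window(left=100, top=50), *menu_option))
print(to_screen(Window(left=400, top=200), *menu_option))
```

A command that stores only the screen position is correct for one window placement; storing the offset from the window corner and adding the window’s current position keeps it correct wherever the window sits.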

One big downside occurs when I’ve needed an action repeated many times (e.g. a key press or cursor movement), as VoiceBot requires that I make an individual command for each action and number combination. So when I wanted to say “up/down/left/right” followed by a number from 1 to 30, I needed to make 120 individual commands, all slightly different from each other; though I developed a not-too-slow way to generate these, it remains frustrating.
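To show where the 120 comes from, here is a short Python sketch that lists every direction and number combination. It is not the generation method I used and it does not produce VoiceBot’s profile format; the dictionary keys are made up purely for illustration.

```python
from itertools import product
from typing import Dict, List, Union

DIRECTIONS = ("up", "down", "left", "right")


def generate_commands(max_count: int = 30) -> List[Dict[str, Union[str, int]]]:
    """Build one entry per spoken phrase, e.g. "up 7" means press the up key 7 times.

    The keys "phrase", "key" and "presses" are hypothetical, not a real profile format.
    """
    return [
        {"phrase": f"{direction} {count}", "key": direction, "presses": count}
        for direction, count in product(DIRECTIONS, range(1, max_count + 1))
    ]


commands = generate_commands()
print(len(commands))           # 4 directions x 30 numbers
print(commands[0]["phrase"])   # first phrase in the list
print(commands[-1]["phrase"])  # last phrase in the list
```

Four directions multiplied by thirty numbers gives 120 separate entries, each differing only in its phrase and its repeat count.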

So while VoiceBot is good for basic-to-moderate profiles, more advanced ones are best done in VoiceAttack. Broadly put, VoiceAttack can do everything VoiceBot can, but not vice versa.

8. VoiceAttack (Also on Steam, £9/$12)

This is another customisable voice recognition program, but it’s been developed for longer than VoiceBot and so can do much more for profile making.

It allows you to make and convert variables (including integers and text strings), run loops, execute and terminate its own commands (useful for loops) and many more things. The variable control makes it easy to group together commands with a common prefix/suffix, instead of having to make individual commands for every outcome; the earlier example of 120 commands can be made into just one command by using If statements and Loops. Its mouse movements can be relative to the screen, the cursor, or any corner of the current application, and its clicks and key presses always work.
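As a rough illustration of the “one command” idea, this Python sketch handles every direction and number with a single function, using If checks and a loop. It is not VoiceAttack’s own syntax, and the function name and phrase format are assumptions for the example.

```python
from typing import List, Optional

DIRECTIONS = ("up", "down", "left", "right")


def parse_move(phrase: str, max_count: int = 30) -> Optional[List[str]]:
    """Turn a phrase like "left 3" into a list of key presses, or None if it isn't valid."""
    parts = phrase.lower().split()
    if len(parts) != 2:
        return None
    direction, count_text = parts
    if direction not in DIRECTIONS:
        return None
    try:
        count = int(count_text)
    except ValueError:
        return None
    if not 1 <= count <= max_count:
        return None

    presses = []
    for _ in range(count):  # the loop that replaces 30 near-identical commands
        presses.append(direction)
    return presses


print(parse_move("left 3"))
print(parse_move("jump 3"))
```

The direction and the number are treated as two variables rather than one fixed phrase, so adding a fifth direction or raising the maximum number means changing one value instead of writing dozens of new commands.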

With VoiceAttack I’ve been able to “drive” and play Celeste fairly efficiently. It also runs without WSR, so you are free to talk between commands.

Its interface is less intuitive than VoiceBot’s, and I’ve had to look up commands and text tokens plenty of times. Loading community profiles is trickier, and you can’t sync your profiles across devices. Furthermore, relative mouse movements don’t work in all games, sometimes requiring a tweak.

I’d strongly recommend VoiceAttack, though if you are not familiar with coding or macro design, VoiceBot is a good starter program. Once you are more comfortable with macro design and structure, it’s worth moving to VoiceAttack.

9. Vocola 3

This is a free add-on for adding custom commands to Windows Speech Recognition.

I’ve only explored this a little, and found it’s functionally much like Windows Speech Recognition Macros, but without the clunkiness. It’s easy to add local/global commands, move the cursor (discretely, not continuously), input text, use wildcards, run programs, and add new phrases to be recognised (e.g. “new address” could output a multi-line formatted address). Its clicks are the ones from Windows Speech Recognition, so they aren’t always recognised, but as with WSR Macros you can have it run AutoHotKey files to send various clicks (Admin mode might also work).

Unfortunately the code is a bit limited, as you have very few options for manipulating variables. And it needs Windows Speech Recognition running at the same time, so you’ll still have to make sure you only speak your commands.

However it can enable some free implementations of basic things I’ve done in VoiceBot/VoiceAttack. There are also function libraries which can give you extra commands with very specific uses, and the limited scope of the code means that you can learn what you’ll need to in a shorter time (assuming that what you need to do is actually possible!).

I’m aware of Vocola 2, which applies the same code to Dragon and enables a free way to give Dragon macros, but I haven’t investigated it yet.

Combinations

If you are careful with your spoken commands, you can run multiple voice recognition systems at once. VoiceBot and VoiceAttack will only respond to the commands you have given them, so if your profile phrases are specific enough that you won’t say them by accident, you can have Dragon or Windows Speech Recognition active at the same time. This can give you the speed and specificity of custom voice recognition alongside a typing system, by saying things like “stop listening” when you want to use the custom commands.

I’ve used this sort of setup to play Epistory: Typing Chronicles, Letter Quest: Grimm’s Journey Remastered, and more generally in Windows to switch between dictation and more-efficient profiles for e-reading and video editing.

Honourable mention: MouSense (This link – Select “MouSense PC”)

This is a face-tracking mouse, which was available for free, though its website is down and I’ve had to link to an archived version of the site.

MouSense doesn’t involve speech recognition, but can be combined with it. By default, MouSense moves the cursor as you turn your head, with the option to have it click when you’ve stayed still for a bit. I have tutorials about it, and I’ve used this to play some of Hero Of The Kingdom I & II. However its clicking doesn’t always work in a given game (running as Administrator sometimes helps), and when it’s being unreliable with the dwell clicking I end up hurting my neck from holding it still.

But when using this in conjunction with Windows Speech Recognition Macros, the clicking became more predictable and diverse; I was able to play various point-and-click games like Hexcells, Lexica, Puzzle Agent, and Attractor, without my neck aching as much. This principle would apply to any face-/eye-tracking mouse (e.g. Enable Viacam, Camera Mouse, Tobii), and with other voice-activated clicking options.
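For anyone curious how dwell clicking of the kind mentioned earlier might work in principle, here is a minimal Python sketch. It is not MouSense’s code: the radius and dwell time are assumed values, and the cursor samples are supplied as plain data rather than read from a camera.

```python
import math
from typing import Iterable, List, Tuple

Sample = Tuple[float, float, float]  # (time in seconds, x, y)


def dwell_clicks(
    samples: Iterable[Sample],
    radius: float = 10.0,        # assumed: how far the cursor may drift and still count as "still"
    dwell_seconds: float = 1.0,  # assumed: how long it must stay still before clicking
) -> List[Sample]:
    """Return one click per rest: the time it fired and the position where the rest began."""
    clicks: List[Sample] = []
    anchor = None   # sample where the current rest started
    fired = False   # whether this rest has already produced a click
    for t, x, y in samples:
        if anchor is None or math.hypot(x - anchor[1], y - anchor[2]) > radius:
            # The cursor moved away, so start timing a new rest from here.
            anchor = (t, x, y)
            fired = False
        elif not fired and t - anchor[0] >= dwell_seconds:
            clicks.append((t, anchor[1], anchor[2]))
            fired = True
    return clicks
```

The sketch also shows why dwell clicking is hard on the neck: any drift beyond the radius restarts the timer, so you have to hold still for the whole dwell period. Triggering the click by voice removes that waiting entirely.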

I’ll be going into the specific logistics of particular videogames in later articles.

Voiced Gaming is an RSI sufferer who researches applications of voice recognition to video games. He makes YouTube videos demonstrating gameplay, tutorials about the software he uses, and some tips about living with chronic wrist pain. You can follow him on Twitter @VoicedGaming.

