The structure of an application follows the type of user interface it uses. The first interactive apps in PC-DOS days were text-based console apps. An application would ask the user a set of questions, one at a time, ending with the ubiquitous “Are you sure? Y/N”. A poor user who mistyped an item would press “N” and have to fill in the list all over again.
GUIs gave the initiative to the users, who could fill in (or not) fields in any order. Validation could occur on each item as it was entered. A big step forward, if you had a PC handy.
VUIs (voice user interfaces) are a whole different animal. In a VUI you must activate the grammar before you ask the question, and that grammar must contain all the possible answers. This complicates the user interface because some data values are open-ended. Consider getting a mailing address from the user. The state is easy; there is a fixed set of them. The zip code is more open-ended, but there is still an underlying pattern (a five-digit number in the US, AlphaNumAlpha-NumAlphaNum for postal codes in Canada) that can be used to create a grammar.
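To make the zip code pattern concrete, here is a minimal sketch. It uses Python regular expressions as a stand-in for a real speech grammar (an actual VUI platform would have its own grammar format), and the function name `classify_postal_code` is made up for illustration.

```python
import re

# US zip code: exactly five digits.
US_ZIP = re.compile(r"\d{5}")
# Canadian postal code: letter-digit-letter, an optional space or hyphen,
# then digit-letter-digit.
CA_POSTAL = re.compile(r"[A-Za-z]\d[A-Za-z][ -]?\d[A-Za-z]\d")

def classify_postal_code(text):
    """Return 'US', 'CA', or None for a postal code given as text."""
    cleaned = text.strip()
    if US_ZIP.fullmatch(cleaned):
        return "US"
    if CA_POSTAL.fullmatch(cleaned):
        return "CA"
    return None
```

The point is that a short pattern covers every valid answer, so the grammar can be written once and activated before the question is asked.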
Street addresses are completely open-ended. They have, if you’ll pardon the pun, a large address space. Here’s a sampling:
- Dr. Martin Luther King Jr. Ave is the longest street in Albuquerque
- Ho Road in Carefree, AZ meets Hum Road at the corner of Ho and Hum
- Akaaka Street is on Oahu
- not to mention the dreaded Welsh names like Gwernymynydd
The only feasible approach for getting a street address is divide-and-conquer. Ask for the zip code first and then, using census data, have a grammar of street names for every zip code. Suddenly, your simple feature of getting the caller’s address requires determining every street name in the country! As discussed before, this is a perfect job for third-party speech objects.
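The divide-and-conquer idea can be sketched in a few lines. This is only an illustration: the zip codes below are made-up placeholders, the table `STREETS_BY_ZIP` stands in for real census or postal data, and plain string matching stands in for the speech recognizer.

```python
# Placeholder data: the zip codes are made up, and a real system would load
# the street names for each zip code from census or postal data.
STREETS_BY_ZIP = {
    "00001": ["Ho Road", "Hum Road"],
    "00002": ["Akaaka Street"],
}

def street_grammar_for(zip_code):
    """Return the street names the recognizer should listen for in this zip."""
    return sorted(STREETS_BY_ZIP.get(zip_code, []))

def match_street(zip_code, heard):
    """Return the grammar entry matching what was heard, or None if it is
    not in the grammar for this zip code."""
    wanted = heard.strip().lower()
    for street in street_grammar_for(zip_code):
        if street.lower() == wanted:
            return street
    return None
```

Once the zip code is known, the list of possible answers shrinks from every street in the country to the handful in that zip code.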
The structure of speech application code reflects this issue. Much of the ‘validation’ code of GUIs now becomes grammar generation code that runs at the start of the dialog. When the speech dialog ends, there’s not much validation to do since the user was picking from lists that we generated. Of course, dynamic grammar generation creates problems of its own: caching and avoiding unnecessary grammar reloads…
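One way to avoid rebuilding the same grammar over and over is to cache it by zip code. The sketch below uses Python’s standard `functools.lru_cache`. The `_STREETS` table and the `build_calls` list are hypothetical, there only to stand in for a slow lookup and to make cache hits visible.

```python
from functools import lru_cache

# Hypothetical placeholder data standing in for a slow census or database lookup.
_STREETS = {"00001": ["Hum Road", "Ho Road"]}

build_calls = []  # records each real build, so cache hits are visible

@lru_cache(maxsize=1024)
def street_grammar(zip_code):
    """Build the street grammar for a zip code once and reuse it afterwards."""
    build_calls.append(zip_code)
    # A tuple is immutable, so callers cannot corrupt the cached grammar.
    return tuple(sorted(_STREETS.get(zip_code, [])))

first = street_grammar("00001")   # built on the first request
second = street_grammar("00001")  # served from the cache, no rebuild
```

A real deployment would also need some way to expire entries when the underlying street data changes, which this sketch leaves out.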
Apps that do well in an ‘everything is a listbox’ world are ones that already know about the user. Existing customers call in, enter an account number, and the app already knows their phone numbers, address, and GPS coordinates.
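As a rough sketch of why this helps: when the account is known, the open-ended address question collapses into a yes/no confirmation, which needs only a trivial grammar. Everything here (the `Customer` record, the `ACCOUNTS` table, the prompt wording) is invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class Customer:
    name: str
    street: str
    zip_code: str

# Hypothetical account store; a real app would query its customer database.
ACCOUNTS = {"1001": Customer("Pat Example", "12 Ho Road", "00001")}

def address_prompt(account_number):
    """Pick the next prompt: confirm a known address, or start collecting one."""
    customer = ACCOUNTS.get(account_number)
    if customer is None:
        # Unknown caller: fall back to the divide-and-conquer dialog.
        return "What is your zip code?"
    return (f"I have your address as {customer.street}, "
            f"zip code {customer.zip_code}. Is that still correct?")
```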