The Computer Telephony Portal

3/22/2005

Two steps forward..

Filed under: — Ian @ 10:52 am

Seen in the news this week.

  1. Northumbria switches to speech recognition. To reduce turnaround time on the one million pieces of patient correspondence, a dictation system is being piloted. It works either as real-time dictation or in batch mode (record an audio file that is ASR’d later).
  2. Software as a career-saver. An economics professor in Canada who lost the ability to type now uses his voice for text input.
  3. Dutch integration exams under fire. This is an odd one. The Netherlands is set to have immigrants take a Dutch language and culture test. The test will be done using voice recognition technology. Does this mean that NoRec errors will be used as an indication of a poor Dutch accent? One certainly hopes not. Definitely one story to watch.

3/14/2005

Password reset using speech

Filed under: — Ian @ 3:02 pm

“Password resets are the second most common reason workers call help desks, accounting for about one in four help desk requests”

Hmmm, I wonder what the first most common reason is. Anyway, password reset is another task that can be well handled by speech — simple, focussed, and repetitive. There’s a demo at this link that you can call into.

It was mentioned in W2KNews this week, a popular newsletter for sys-admins. “Pretty kewl” indeed.

2/14/2005

Everything is a ListBox

Filed under: — Ian @ 2:53 pm

The structure of applications follows the type of user interface used. The first interactive apps in PC-DOS days were text-based console apps. An application would ask the user a set of questions, one at a time, ending with the ubiquitous “Are you sure? Y/N”. A poor user who mis-typed an item would press “N”, and have to fill in the list all over again.

GUIs gave the initiative to the users, who could fill in (or not) fields in any order. Validation could occur on each item as it was entered. A big step forward, if you had a PC handy.

VUIs (voice user interfaces) are a whole different animal. In a VUI you must activate the grammar before you ask the question, and that grammar must contain all the possible answers. This complicates the user interface because some data values are open-ended. Consider getting a mailing address from the user. The State is easy; there is a fixed set of them. Zip code is more open-ended but there is still an underlying pattern (5-digit number in the US, AlphaNumAlpha-NumAlphaNum in Canada) that can be used to create a grammar.
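To make the "underlying pattern" idea concrete, here is a rough sketch in Python. It uses regular expressions as a stand-in for a real speech grammar, which is an assumption made purely for illustration; the names `US_ZIP`, `CA_POSTAL` and `classify_postal_code` are made up for this example, and the Canadian pattern is simplified (it accepts any letter in any position).

```python
import re

# Stand-ins for speech grammars (an assumption for illustration): each
# pattern describes every answer we would be prepared to accept.
US_ZIP = re.compile(r"[0-9]{5}")                           # 5-digit number
CA_POSTAL = re.compile(r"[A-Z][0-9][A-Z] ?[0-9][A-Z][0-9]")  # AlphaNumAlpha NumAlphaNum


def classify_postal_code(text):
    """Return "US", "CA", or None for text that fits neither pattern."""
    # Normalize case and treat a hyphen like the usual space separator.
    normalized = text.strip().upper().replace("-", " ")
    if US_ZIP.fullmatch(normalized):
        return "US"
    if CA_POSTAL.fullmatch(normalized):
        return "CA"
    return None
```

Anything that fits neither pattern comes back as `None`, which is roughly the text equivalent of a NoRec.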

Street addresses are completely open-ended. They have, if you’ll pardon the pun, a large address space. Here’s a sampling:

  1. Dr. Martin Luther King Jr. Ave is the longest street in Albuquerque
  2. Ho Road in Carefree, AZ meets Hum Rd at the corner of Ho and Hum
  3. Akaaka Street is in Oahu
  4. not to mention the dreaded Welsh names like Gwernymynydd

The only feasible approach for getting a street address is divide-and-conquer. Ask for the zip code first and then, using census data, have a street-name grammar for every zip code. Suddenly, your simple feature of getting the caller’s address requires determining every street name in the country! As discussed before, this is a perfect job for third-party speech objects.
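The divide-and-conquer step might look something like the sketch below. Everything in it is hypothetical: the zip codes are placeholders rather than real ones, the tiny `STREETS_BY_ZIP` table stands in for census data, and plain string matching stands in for the recognizer.

```python
# Placeholder data: the zip codes are made up, and a real system would
# load the street names for each zip code from census data.
STREETS_BY_ZIP = {
    "11111": ["Ho Road", "Hum Road"],
    "22222": ["Akaaka Street"],
}


def street_grammar_for(zip_code):
    """Return the street names the recognizer should listen for."""
    streets = STREETS_BY_ZIP.get(zip_code)
    if streets is None:
        raise KeyError(f"no street data for zip code {zip_code}")
    return sorted(streets)


def match_street(zip_code, heard):
    """Return the grammar entry matching what was heard, or None."""
    wanted = heard.strip().lower()
    for street in street_grammar_for(zip_code):
        if street.lower() == wanted:
            return street
    return None
```

The point of the sketch is that the list of acceptable answers is narrowed by the zip code before the street question is ever asked.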

The structure of speech application code reflects this issue. Much of the ‘validation’ code of GUIs now becomes grammar generation code that runs at the start of the dialog. When the speech dialog ends, there’s not much validation to do since the user was picking from lists that we generated. Of course, dynamic grammar generation creates problems of its own: caching and avoiding unnecessary grammar reloads…
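One simple way to avoid rebuilding the same grammar over and over is to memoize the generator. The sketch below uses Python’s `functools.lru_cache`; the `load_streets` helper and its data are invented for the example, and a real speech platform may well have its own grammar caching that works differently.

```python
from functools import lru_cache


def load_streets(zip_code):
    """Hypothetical slow lookup; imagine a database or census file here."""
    sample = {"11111": ["Hum Road", "Ho Road"]}  # placeholder data
    return sample.get(zip_code, [])


@lru_cache(maxsize=256)
def build_grammar(zip_code):
    """Build the street grammar for a zip code, at most once per cache entry."""
    # Return a tuple so the cached value cannot be modified by callers.
    return tuple(sorted(load_streets(zip_code)))
```

Repeat calls for the same zip code are served from the cache, and `build_grammar.cache_info()` reports the hits and misses. The `maxsize` of 256 is an arbitrary choice for the sketch.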

Apps that do well in an ‘everything is a listbox’ world are ones that already know about the user. Existing customers call in, enter an account number, and the app already knows their phone numbers, address, GPS co-ordinates.

2/8/2005

“Play ‘The Wall’ please”

Filed under: — Sonia @ 3:15 pm

Back in October I pointed to the increasing use of speech-enabled navigational systems in cars as evidence that speech recognition is already a mainstream technology.

Here is another piece of evidence: speech will be used in media players as well.

According to this article, users will be able to use voice commands to instruct their digital media players which song to play next. Here is a short excerpt:

“Music library firm Gracenote has teamed up with Scansoft to offer a control system that hopes to give people hands-free access to their digital music collection on the move and make the need for thumbs a thing of the past.
……
Targeted products include car entertainment, portable media players and home entertainment devices such as media servers. The companies estimate that fully-integrated solutions for hardware and software platforms will be available in the fourth quarter of 2005.”

How many of those devices are out there? According to IDC, iPod sales alone will rise to 25.5 million units in 2008. Not to mention the number of cars on the road, entertainment units in homes… you get the point. Speech is everywhere.

1/28/2005

Microsoft Speech Server Wins Analyst Award

Filed under: — Chris @ 10:02 am

Congrats to the Microsoft Speech Server Team!

Their quality work has been recognized not only by a growing number of customers, but also by independent market observers. Frost & Sullivan, a leading industry analyst firm, chose Microsoft Speech Server as its 2005 Enterprise Infrastructure Product of the Year.

“The Frost & Sullivan Award for Product of the Year is presented each year to the company that has demonstrated excellence in new products and technologies within its industry. The recipient company has shown innovation by launching a broad line of emerging products and technologies. This product’s launch has been recognized by the entire speech industry as one of the most significant events of the year. This is a tremendous recognition given the strong competition and highly innovative spirit in this emerging market.”

Frost & Sullivan concluded their analysis by saying “…that Microsoft is perfectly positioned to have the greatest impact on the speech technology and solution marketplace since the introduction of speech technologies. With its substantial marketing and sales resources, large customer base, and strong technology expertise, it will be able to successfully promote its product, increase customer awareness, and drive the growth of the entire market”.

I couldn’t have said it better myself ;->

You can read the full press release here.

I have posted the analyst report on Microsoft Speech Server here as well.

Chris
