Use Case - Voice Design Technique

Conversational AI is one of the latest tech innovations being applied to business. It can take the form of chatbots and/or voice agents. Both forms use a chat handle as a base for structuring conversational dialog. However, voice agents require a completely different design approach.


One of the primary challenges that designers face when designing voice agents is speech transcription. Designing around your platform's speech-to-text (STT) transcription accuracy involves a number of complexities, especially when it comes to transcribing unique first and last names, email addresses, and street addresses. In this use case, we will examine a way to improve voice transcription when collecting street addresses.


We can all agree that customers' home or business addresses are among the primary data sets in our databases. If you are deploying voice in your business strategy, it would be wise to figure out how to capture this important data set accurately.


The Challenge

Transcribing a street address via a voice bot is a challenge that most conversational designers won't think about until they actually face it. Whether the caller pronounces a street name or spells it out, the voice agent will struggle to transcribe it gracefully. The more complicated the street name, the more fragmented the experience becomes for the user. This hurts overall accuracy and the user experience, and it increases dependency on human intervention to perform a low-value task.


Great for such constraints:


  • A small percentage of confirmed customer emails
  • Internal legal department uncertain about text
  • Simplify and reduce steps

The Goal

Start by asking the caller for their zip code as the very first item, rather than the last. This may feel unusual, and I will explain a little later in the use case how it makes a difference. Second, combine an address verification program with a search anticipation function, and then develop a transcription matching algorithm to tie it all together.

The Outcome

This technique has improved the accuracy and capture rate for customer addresses better than any other technique that I’m aware of. Note that this is a technique, not proprietary software, which means it’s independent of any particular STT technology or chat platform. It verifies the address and confirms the correct spelling while processing quickly behind the scenes. This simplifies the user experience and spares the caller unnecessary dialog turns to confirm accuracy.

Demo Voice Transcript

Traditional process


The Voice Bot

"What's your address?"


The Caller

"3861 North Camino de Oeste Tucson Arizona 85745"


The Voice Bot

(Actual Transcriptions from a top-rated conversational bot platform)


1st attempt: "I heard 3811 north Camino day O. S. day 2 sun era zona 85745"
2nd attempt: "I heard 3811 north Camino del Este Tucson Arizona 85745"
3rd attempt: "I heard 3811 north Camino day S. day Tucson Arizona 85745"

Let's Apply a Technique to Repair this Issue

Step 1. Change the voice bot's ordering of the request

Traditional Address Collection

Avoid this format when designing for voice

We traditionally ask for the address in this order:

1. Street number

2. Street name

3. Apt number 

4. City

5. State

6. Zip code (why is this asked last?  This variable replaces the need to request a city and state)

Revised Approach to Address Collection

Use this format when designing for voice

Start by collecting the zip code. Then follow up by asking for their street number and street name. Finally, ask if there's an apartment number.

1. Zip code  (This single variable equates to both the City & State)

2. Street number

3. Street name

4. Apt number
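
The revised order can be expressed as a small, ordered slot list that a dialog flow walks through. This is only a minimal sketch: the slot names, the prompt wording, and the `ask` hook (a stand-in for whatever prompt-and-listen call your platform provides) are assumptions for illustration, not part of any specific product.

```python
from typing import Callable, Dict, List, Tuple

# Slot order for voice: zip code first, apartment number last.
# Slot names and prompt wording are illustrative assumptions.
ADDRESS_SLOTS: List[Tuple[str, str]] = [
    ("zip_code", "What's your zip code?"),
    ("street_number", "What's your street number?"),
    ("street_name", "And the street name?"),
    ("apt_number", "Is there an apartment number? Say 'no' if not."),
]


def collect_address(ask: Callable[[str], str]) -> Dict[str, str]:
    """Ask each prompt in order and return the raw replies keyed by slot.

    `ask` is a hypothetical hook: it takes a prompt string and returns
    the transcript of the caller's reply.
    """
    return {slot: ask(prompt).strip() for slot, prompt in ADDRESS_SLOTS}
```

Keeping the order in data rather than in branching logic makes it easy to compare the traditional and revised sequences in testing.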

Reason why you start with the zip code

There are 2 simple reasons why I do this. First, asking for the zip code reduces the number of data variables the caller has to provide, because a zip code identifies the city and state. So in this case, ‘less is truly more’.


Second, the zip code narrows down the number of possible combinations of street numbers and street names to a very specific zone of the world. This lets the anticipated search function look through only a limited number of possible matches.


For example, 85745 narrows the search down to a very specific zone on the globe. In this case, it’s an area within Tucson, Arizona. This simplifies processing behind the scenes because it reduces the time required to filter through the street addresses listed within a zip code. In other words, there are only so many street numbers and street names that could potentially match within that particular zip code.
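
To make the narrowing concrete, here is a minimal sketch of a zip-code index that returns the city, state, and candidate street names for one zip code. The `ZipRecord` type, the `ZIP_INDEX` table, and the `lookup_zip` function are hypothetical names, and the data is a hand-written sample: only "North Camino de Oeste" comes from the demo transcript, while the other street names are placeholders. A real deployment would presumably pull this from an address verification service instead.

```python
from typing import Dict, List, NamedTuple, Optional


class ZipRecord(NamedTuple):
    city: str
    state: str
    streets: List[str]


# Hypothetical sample data. Only "North Camino de Oeste" appears in the
# demo transcript; the remaining street names are placeholders.
ZIP_INDEX: Dict[str, ZipRecord] = {
    "85745": ZipRecord(
        city="Tucson",
        state="Arizona",
        streets=[
            "North Camino de Oeste",
            "West Placeholder Street",
            "North Sample Avenue",
        ],
    ),
}


def lookup_zip(zip_code: str) -> Optional[ZipRecord]:
    """Return the city, state, and candidate streets for a zip code, if known."""
    return ZIP_INDEX.get(zip_code.strip())
```

Once the zip code is known, every later step only has to consider the streets in that one record, and the bot never needs to ask for the city or state.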

Step 2. Change up your request for the caller's address

The second step to this solution happens behind the scenes in processing. It’s about leveraging an address confirmation program and an anticipated search function with an algorithm that matches the final result with the caller’s voice transcription of the address.


First, you will need a program that will verify the accuracy of an address. Organizations like the Multiple Listing Service (MLS) in real estate and shipping companies like FedEx use these programs to confirm addresses. Second, you apply an anticipation search function similar to what Google and Amazon use in their keyword search bars. This function helps to narrow down the address as seen below in the animated image. Finally, you must develop an algorithm that enables your bot to match the anticipated search result with the caller's voice transcription.
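
The matching step can be sketched with Python's standard `difflib` module, which scores how similar two strings are. This is one possible approach under stated assumptions, not the specific algorithm used in this use case: the `normalize` and `best_match` helpers, the 0.6 threshold, and the candidate list are all illustrative, and the candidates stand in for the verified addresses returned for the caller's zip code. The transcript is the street portion of the first failed attempt from the demo above.

```python
import re
from difflib import SequenceMatcher
from typing import List, Optional, Tuple


def normalize(text: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9 ]", " ", text.lower())
    return " ".join(cleaned.split())


def best_match(
    transcript: str, candidates: List[str], threshold: float = 0.6
) -> Optional[Tuple[str, float]]:
    """Return the candidate most similar to the transcript with its score.

    Returns None when nothing scores at or above the threshold, which a
    dialog flow could treat as a signal to re-prompt the caller.
    """
    heard = normalize(transcript)
    scored = [
        (SequenceMatcher(None, heard, normalize(c)).ratio(), c)
        for c in candidates
    ]
    if not scored:
        return None
    score, address = max(scored)
    return (address, score) if score >= threshold else None


# Hypothetical verified addresses for zip code 85745; only the first one
# comes from the demo transcript.
CANDIDATES = [
    "3861 North Camino de Oeste",
    "2200 West Placeholder Street",
    "1400 North Sample Avenue",
]

match = best_match("3811 north Camino day O. S. day", CANDIDATES)
print(match[0] if match else "No confident match, re-prompt the caller")
```

Because the zip code has already limited the candidates, even a garbled transcript with a misheard street number can land on the verified address, and the bot can confirm the clean spelling in a single turn.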


This also works well with Chatbots

Another Example