Smart Speaking Across Languages

Are there any issues with translating my voice skill into another language?

Creating voice integrations for large companies with diverse user groups who speak different languages usually means having a conversation about translation. It’s so tempting to take a design created in one language and directly translate it into another language for deployment. Often, well-intentioned arguments about creating consistency for users regardless of their language come into play. To create that consistency, though, we actually can’t do a direct translation. But why not? Why can’t we simply translate one conversation into another?


The most obvious reason we can’t just run a VUI design through something like Google Translate is that the specific words you use, and the order in which you use them, may not translate. A direct one-to-one translation ends up sounding like a foreign tourist asking how to ride the bus to a popular sightseeing destination. It just sounds…off. The whole point of our latest voice platforms and designs is to create natural-sounding conversations that easily engage people without asking them to do mental gymnastics to figure out how to get their tasks completed. An out-of-the-ordinary sentence structure or phrasing creates a heavier cognitive load, and people’s brains have to work harder. (Think: “Where to find the library of the city of New York?”) Users already have to work harder in a voice interface than in a screen interface, since they have to remember what’s being said as the device speaks to them. Don’t create an interaction that becomes a brain task and a memory game. People will abandon it, or get very frustrated if that interface is their only option.


So let’s say you address the semantics issue by hiring a translation and interpretation service. Well done, but you may still need to consider culture. Even among people who speak the same language, cultures can differ enough that a certain phrase (or even a certain voice) is awkward or inappropriate for delivering specific messaging. For example, European Portuguese tends to be more formal, while Brazilian Portuguese is far more colloquial and casual. If you use a translator or interpreter from Portugal, it will be hard to capture the wordplay native to Brazilian Portuguese. If your voice application is meant to be playful, this may prove detrimental.

Likewise, if you are delivering sensitive or personal information (like health information) in a culturally conservative country, you may have to record the information either in a gender-neutral voice or in both male- and female-gendered voices to help users feel comfortable hearing it. Otherwise, you may run into people getting offended or shutting off the voice interface because it feels invasive or uncomfortable to them.


Even if you don’t have to translate from one language to another, you may still need to take localization into account. Language is a reflection of the people within the community you’re speaking to, and inclusivity is part of what makes users continue a conversation. That means you have to contextualize the word choices your VUI speaks and understands to accommodate your users. Whether that means regional dialects or phrasings, or using “lift” instead of “elevator” in a UK-based app, it’s important to capture the way your users most commonly speak to make the conversation (and your app) as natural and comfortable as possible. Many companies are launching these conversational applications in order to create an easier interface for their users and build up a rapport they can’t create in a standalone GUI (graphical user interface). Don’t work against that by excluding people’s word choices.
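To make the “lift” versus “elevator” point concrete, here is a minimal sketch of locale-aware word choice. It assumes your prompts are stored as templates with named concepts; the table names, function names, locale tags, and word pairs are all hypothetical and not tied to any particular voice platform.

```python
# Hypothetical vocabulary tables; the locale tags and word pairs are illustrative only.
SPOKEN = {
    "en": {"elevator": "elevator", "apartment": "apartment"},
    "en-GB": {"elevator": "lift", "apartment": "flat"},
}

# Words the VUI should understand, mapped to the concept they refer to.
UNDERSTOOD = {
    "elevator": "elevator",
    "lift": "elevator",
    "apartment": "apartment",
    "flat": "apartment",
}


def spoken_term(concept, locale, default_locale="en"):
    """Pick the word to speak, falling back from region (en-GB) to
    language (en) to the default locale, then to the concept name itself."""
    language = locale.split("-")[0]
    for candidate in (locale, language, default_locale):
        words = SPOKEN.get(candidate, {})
        if concept in words:
            return words[concept]
    return concept


def render_prompt(template, locale):
    """Fill a prompt template with the locale's preferred words."""
    terms = {concept: spoken_term(concept, locale) for concept in SPOKEN["en"]}
    return template.format(**terms)


def understood_concept(word):
    """Map a word the user said to its concept, or None if it is unknown."""
    return UNDERSTOOD.get(word.lower())
```

With these tables, render_prompt("Take the {elevator} to the third floor.", "en-GB") produces “Take the lift to the third floor.”, while a US user who says “elevator” and a UK user who says “lift” are both understood as the same concept. A lookup table like this only covers vocabulary; it does not replace having a native speaker review the full conversation.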

One additional thought about localization and inclusion: much like racial and gender bias in machine learning, we cannot script North American-centric conversations and assume those apply across all cultures and peoples. Not only is that inaccurate and likely to hurt adoption in particular markets, it’s also harmful to the overall adoption of voice interfaces and people’s enjoyment of them. People use the stuff they like. They talk to people they like. If we’re going to combine the talking and the stuff, it follows that we should make it something they like so they keep using it. Assuming that people either think like you or aren’t worth speaking to is not a good way to get them to like your stuff or your product.

Help from non-VUI team members

By this point, I can imagine you may be thinking, “Sure, great, but I don’t have an arsenal of resources at my disposal to do this the ‘right’ way, so I’ll have to do it the realistic way instead.” I get it. It’s not always possible to have people on your team who are native speakers of both English and the language you’re targeting for your application.

But even finding people outside your design and development team to help you test your VUIs is helpful. Internal prototype and QA testing in those languages becomes much easier when you have someone who understands the nuances of both the language and the social scenarios of conversation. They can move through the conversation and make sure it feels right and stays accurate to the original intent. There are even some online tools that let you run usability tests cheaply with people who speak the language you’re targeting.

What if you have to release a less-than-ideal translation?

We’ve all been there. The timing, the resources, something happens that means a less-than-ideal translation is going to market. In some cases, it may be better than forcing someone who doesn’t speak English to struggle through an interface in a non-native language. But consider the blowback that may come from providing an excellent experience in one language and a subpar experience in another. You may get away with it for a little bit, or you may not.

One band-aid you can try, if you have to release an imperfect translation, is to acknowledge the imperfection with a line in your greeting. You can try something like, “I’m not the best Tagalog speaker, so bear with me.” Or perhaps you can connect the user to a human agent to help through crucial moments; IVRs often use this trick. Though, if all your human agents speak only one language, absolutely make sure you let the user know the language will change before handing them off. (I can’t tell you how jarring it is to go through a Spanish-language IVR and be passed off to an English-speaking representative without any advance notice.)
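The handoff warning can be enforced in the dialog logic rather than left to memory. Below is a minimal sketch, assuming you track the user’s language and the languages your agents speak as simple language codes; the function name, the notice table, and the Spanish wording are hypothetical, and any real notice should be reviewed by a native speaker.

```python
# Hypothetical notices, keyed by (user language, agent language) and written
# in the user's language. The wording here is illustrative only.
HANDOFF_NOTICES = {
    ("es", "en"): "Le voy a transferir con un representante que habla inglés.",
}


def plan_handoff(user_language, agent_languages):
    """Return the prompts to speak before transferring the user to a human.

    The list is empty when an agent speaks the user's language. Otherwise it
    holds a notice, in the user's language, that the language will change.
    """
    if not agent_languages:
        raise ValueError("no agent languages configured")
    if user_language in agent_languages:
        return []
    agent_language = agent_languages[0]
    notice = HANDOFF_NOTICES.get((user_language, agent_language))
    if notice is None:
        # Failing loudly is deliberate: a silent language switch is the
        # jarring experience this check exists to prevent.
        raise LookupError(
            f"no handoff notice for {user_language} -> {agent_language}"
        )
    return [notice]
```

For a Spanish-language session with English-only agents, plan_handoff("es", ["en"]) returns the Spanish notice to play before the transfer; when the languages match, it returns an empty list and the handoff proceeds without an extra prompt.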

Point being…

Whatever you do, know that conversation is a reflection of the people you’re speaking with, and the same detail and care you pay to craft the conversation in one language should be translated to the next.

Want to learn how Grand Studio can help your next VUI design project and build clarity out of complexity?

We’re here to help!