You may read about "speech recognition," "voice output," or something else that leads you to think that you speak into the device and out comes the spoken translation. This isn't exactly how it works. But we can come awfully close.

With the LUX translator, the device uses cloud computing via Wi-Fi to handle the massive computing requirements for approximating speech to speech translation. The TLX translator has this feature too, and it can also reach the cloud through a data plan, because the TLX is a phone with SIM cards.

The TLX and LUX speech to speech feature uses your speech as a voice command to look up words in the cloud translation dictionary. On those fast cloud computers, the system transliterates the string of words you just spoke, and then it applies rules of syntax to produce a spoken translation on your TLX or LUX device. It all happens very fast, and what comes out is often, but not always, an exact translation.
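To make the idea concrete, here is a toy sketch of the two steps just described: look up each word in the order spoken, then apply a rule of syntax. It is only an illustration, not the software that runs in the cloud. The tiny dictionary, the word classes, and the function names are all assumptions made up for this example.

```python
# Toy English-to-Spanish dictionary; the entries are illustrative only.
TOY_DICTIONARY = {
    "the": "el",
    "red": "rojo",
    "car": "coche",
    "is": "es",
    "new": "nuevo",
}

# Hypothetical word classes needed by the single syntax rule below.
ADJECTIVES = {"red"}
NOUNS = {"car"}


def word_for_word(sentence):
    """Step 1: look up each word in the order spoken.

    Returns (source_word, translated_word) pairs; unknown words pass through.
    """
    return [(w, TOY_DICTIONARY.get(w, w)) for w in sentence.lower().split()]


def apply_syntax_rule(pairs):
    """Step 2: one sample rule. Spanish usually puts the adjective after the noun."""
    result = list(pairs)
    i = 0
    while i < len(result) - 1:
        if result[i][0] in ADJECTIVES and result[i + 1][0] in NOUNS:
            # Swap "adjective noun" into "noun adjective".
            result[i], result[i + 1] = result[i + 1], result[i]
            i += 2
        else:
            i += 1
    return result


def translate(sentence):
    """Run both steps and join the translated words back into a sentence."""
    return " ".join(target for _, target in apply_syntax_rule(word_for_word(sentence)))


print(translate("The red car is new"))  # el coche rojo es nuevo
```

A real system needs thousands of rules and a far larger dictionary, which is why the heavy lifting happens on cloud computers rather than in your pocket.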

Where you can't get a signal, offline translation tools are also available. Those are in the TLX, LUX, 900-series, and iTravl translators; they work very well and are actually quite fast and easy to use. These devices also have other uses for speaking into them, such as showing you how well you pronounce things when you use the Language Teacher program. We look at those other uses below.

"...the natural ways in which humans use language and the variable contexts of speech are extremely complex, so much so that programming a computer to truly understand us has been, and will for a long time remain one of the greatest challenges in all science."
-- David Wolman, Righting the Mother Tongue, 2008

Speech Recognition in Language Teacher

These translators use speech recognition in very clever, useful ways in the Language Teacher program.

For example, you are in Language Teacher learning words in the other language. It will show you the word and pronounce it. You then pronounce the word yourself. Language Teacher will grade your pronunciation, telling you things like, "Try again," "Not quite," "Not bad," or "Excellent." You'll see a graphical comparison, as well. Way, way cool.

But that's not all. After you learn a few words, you'll be presented with an image. Above it will be the words you just learned. Now you have to pronounce the word that describes the image. And this is just the tip of the proverbial iceberg in describing what these translators do with speech recognition.

Speech Recognition for Translating

In short, there is no such thing as a "voice to voice" or "speech to speech" translator on the market. The technology for such a thing in a pocket-sized device that doesn't run on a pair of car batteries or super-long extension cord just doesn't exist. In fact, the technology to do it at all just does not exist. We do have the technology, via the cloud, to transliterate (translate each word in the order spoken) your spoken words and then apply rules of syntax to approximate a translation in the other language. It works extremely well, especially if you think before speaking! But it can't accommodate idiomatic speech or mispronunciations.

You may have seen a video demonstration and gotten the impression that the iTRAVL does "speech to speech" translation. That's not actually what you saw. What the video shows is the ability to select a phrase that you are already looking right at, by speaking it into the device. It's faster to just tap it with your fingernail.

So, why are people talking into these devices in the videos and live demonstrations? Well, we have sold these devices to police officers, social workers, and others who have taken the time to become familiar with the phrasebook and have a fair idea of what's in it. They can quickly tap to the category they want and select a phrase by speaking it. In short, they know what's there and how to quickly get to it. With a little practice, so can anyone else.

It doesn't take long to have this ability, primarily because of how the phrases are organized. You can look through the corresponding phrasebook section before going out with a device in a particular situation (e.g. to the bank). Knowing the kind of phrases (e.g. "Where is the nearest bank?") will allow you to say the ones that are in the device.

To many people, this is "speech-to-speech translation." But, technically, it's not. It's using the audio phrasebook feature (provided by the Ectaco 800, 900, and iTRAVL pocket translators) to use pre-translated speech for specific situations. The kicker here is you have 14,000 phrases covering a wide range of typical situations (7,000 per language, meaning that on a 19-language device you have an amazing 133,000 phrases) that you can, even more amazingly, get to quickly.
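The phrase counts above are simple multiplication. The short sketch below checks them, assuming 7,000 phrases for each language on the device; the constant and function names are made up for this example.

```python
# Assumption for this sketch: every language on the device carries 7,000 phrases.
PHRASES_PER_LANGUAGE = 7_000


def total_phrases(language_count):
    """Total phrases stored on a device with the given number of languages."""
    return PHRASES_PER_LANGUAGE * language_count


print(total_phrases(2))   # 14000 on a two-language device
print(total_phrases(19))  # 133000 on a 19-language device
```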

The situations include Basics, Traveling, Hotel, Local Transport, Sightseeing, Bank, Communication Means, In the Restaurant, Food/Drinks, Shopping, Repairs/Laundry, Sport/Leisure, Health/Drugstore, and Beauty Care.

If we're talking about being able to pull up canned phrases that fit specific situations, then, yes, there is "speech to speech translation." But it isn't simply talking into the device and then whatever you say gets translated.

Translation is a complex endeavor, and doing it on the fly in a portable machine is beyond current capabilities. There are many reasons for this, and to understand them you merely have to be present when a human interpreter goes back and forth between two parties. Now imagine that with a machine that can't make eye contact, while two or three people talk at the same time.

So, yes, you can speak specific phrases into a device and get a translation out. The unit will try to match what you said to what's in its internal database. In every implementation today, that's the phrasebook. This is significant, because it means you have to exactly match what's in the phrasebook. It actually does work, but only in this way.

The device does not transcribe what you said to the screen. Instead, it looks up what its program says matches what you said. Therein lies one of the big problems: the device can be way off the mark. Using text entry solves that problem (text entry is very easy with the current generation of iTravl and 900-series translators).
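A rough sketch of that lookup follows. It is not the device's actual matching method: the real unit compares sound, while this example compares text using Python's standard `difflib` module as a stand-in. The three-entry phrasebook, the `match_phrase` name, and the 0.8 cutoff are all assumptions chosen for illustration.

```python
import difflib

# Hypothetical mini phrasebook: stored English phrase -> pre-translated Spanish.
PHRASEBOOK = {
    "where is the nearest bank": "¿Dónde está el banco más cercano?",
    "how much does it cost": "¿Cuánto cuesta?",
    "i would like to change money": "Quisiera cambiar dinero.",
}


def normalize(text):
    """Lowercase the text and drop punctuation so only the words are compared."""
    kept = "".join(ch for ch in text.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


def match_phrase(heard, cutoff=0.8):
    """Return (stored_phrase, translation) for the closest phrase, or None.

    The cutoff is an assumed similarity threshold: set it lower and the lookup
    can land way off the mark; set it higher and you must match almost exactly.
    """
    candidates = difflib.get_close_matches(
        normalize(heard), list(PHRASEBOOK), n=1, cutoff=cutoff
    )
    if not candidates:
        return None
    best = candidates[0]
    return best, PHRASEBOOK[best]


print(match_phrase("Where is the nearest bank?"))
print(match_phrase("Can you recommend a good book about sailing?"))  # None
```

The first call finds the stored phrase because the words match what's in the phrasebook. The second returns nothing because no stored phrase is close enough, which is the free-form case the phrasebook cannot handle.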

All of the units with this feature accept speech input in English. Many also recognize speech in the target language: Spanish, French, German, Russian, Chinese, Italian, Portuguese, English, Polish, etc. A subsequent step is having the unit pronounce it, if you so desire.

Voice output for Text Translation

You don't have to study foreign language grammar, conjugate verbs or search for coherent words anymore.

Now you can just type any sentence or full text into a handheld device and get its translation (transliterated, with rules of syntax applied) instantly. From English to Spanish, Farsi, French, Russian, German, Polish, Arabic, Chinese, Italian, or Portuguese, and back to English!

Moreover, by pressing one button, you can hear the translated text pronounced aloud correctly in the target language (with a synthesized text-to-speech voice).



Voice output for Translating Dictionaries and Phrasebooks

Most translation devices today have voice output. This feature is normally redundant for communication, because the other person is reading the translation on the screen anyhow. But it proves useful in many situations.

Many people assume the presence of voice output with speech input means all you have to do is talk into the device and out comes a perfect translation. Then the other person talks into the device. But this assumption is wrong.

You can use voice output for purposes such as:

  • When you want minimal distractions with the screen.
  • When you don't want to hand the other person your device.
  • When the sunlight or other conditions make the screen hard to read.
  • When you want to learn pronunciation (use the headset or earbuds).

Of course, you would not want to use the voice output when, for example,

  • You want privacy.
  • It's too noisy to hear clearly.
  • You are both using the screen anyhow.

Fact #1: You cannot simply speak free-form into a translator and have an exact translation come out. But with the TLX and LUX translators, we can come awfully close to that.

Fact #2: Speech recognition is not appropriate for high-noise environments, because the background noise will create problems. It works just fine in environments where the background noise doesn't require a person to speak loudly to be heard (meaning a non-earplug zone, if you're talking about a factory). If you understand that before buying one of these devices, then you will be a very happy owner.

Yes, it sounds like we're underselling this feature. In a sense, we are. The point here is to let you know not to assume the device will always do a perfect job of speech-based translation. It has to contend with all kinds of nuances, such as diction errors, background noise, varying rates of speech and pitch, accents, and other obstacles. It does this well, but not perfectly. It's worth noting the same can be said of human interpreters!

Before these electronic devices were available, people used paper pocket dictionaries (successfully, for decades). These typically had a couple hundred words and a few dozen phrases. The electronic ones were doing 20,000 words and 2,000 phrases about a decade ago. With today's massive dictionary sizes, you never need to fear being lost in translation.

What we now have on the market works very well. You will be happy with your purchase.