MeetDot: Videoconferencing with Live Translation Captions

The recent pandemic made videoconferencing an indispensable part of our working life.

To help people who speak different languages communicate effectively, a recent paper proposes a videoconferencing solution with live translation captions.

Image credit: Mbrickn via Wikimedia (CC BY 4.0)

In this system, participants see an overlaid translation of other participants’ speech in their preferred language. The incoming speech signal is processed in a streaming fashion, transcribed in the speaker’s language, and used as input to a machine translation system. The researchers add several features to improve the user experience, such as smooth pixel-wise scrolling of the captions and fading of text that is likely to change.
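
The idea of fading text that is likely to change can be illustrated with a small sketch. It assumes a simple heuristic that is not taken from the paper: while an utterance is still in progress, the last k tokens of the caption are treated as unstable and would be rendered faded. The function name `split_stable_unstable` and the parameter `k` are hypothetical.

```python
from typing import List, Tuple


def split_stable_unstable(
    tokens: List[str], k: int, is_final: bool
) -> Tuple[List[str], List[str]]:
    """Split a caption into (stable, unstable) token lists.

    Assumption (not from the paper): the last k tokens of a caption that is
    still being updated are the ones most likely to change, so a UI could
    render them faded. A final caption is entirely stable.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    if is_final or k == 0:
        return list(tokens), []
    # Index where the unstable tail begins; clamp at 0 for short captions.
    cut = max(len(tokens) - k, 0)
    return tokens[:cut], tokens[cut:]


stable, faded = split_stable_unstable(["we", "present", "a", "system"], 2, False)
print(stable, faded)
```

With k = 2, the in-progress caption above is split into a stable part ("we present") and a faded tail ("a system"). Once the utterance is final, the whole caption is shown as stable.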

A comprehensive evaluation suite is implemented to accurately compute metrics such as latency, caption flicker, and accuracy, and to drive rapid development guided by these metrics.
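
As a rough illustration of how caption flicker might be quantified, the sketch below computes a normalized erasure score: for each caption update, it counts how many trailing tokens of the previous caption must be deleted before the new caption can be written, then divides the total by the length of the final caption. This is one common definition from the re-translation literature and is an assumption here; the paper's evaluation suite may define the metric differently. All function names are hypothetical.

```python
from typing import List, Sequence


def common_prefix_len(a: Sequence[str], b: Sequence[str]) -> int:
    """Number of leading tokens shared by a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def erasure(prev: Sequence[str], curr: Sequence[str]) -> int:
    """Tokens of prev that must be deleted before curr can be displayed."""
    return len(prev) - common_prefix_len(prev, curr)


def normalized_erasure(updates: List[List[str]]) -> float:
    """Total erasure over all caption updates, divided by final caption length."""
    if not updates or not updates[-1]:
        return 0.0
    total = sum(erasure(p, c) for p, c in zip(updates, updates[1:]))
    return total / len(updates[-1])


updates = [
    ["I"],
    ["I", "am"],
    ["I", "was", "going"],  # "am" is erased here
    ["I", "was", "going", "home"],
]
print(normalized_erasure(updates))
```

In this example one token ("am") is erased over a final caption of four tokens, giving a score of 0.25. A score of 0 would mean the caption only ever grew by appending.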

We present MeetDot, a videoconferencing system with live translation captions overlaid on screen. The system aims to facilitate conversation between people who speak different languages, thereby reducing communication barriers between multilingual participants. Currently, our system supports speech and captions in 4 languages and combines automatic speech recognition (ASR) and machine translation (MT) in a cascade. We use the re-translation strategy to translate the streamed speech, resulting in caption flicker. Additionally, our system has very strict latency requirements to have acceptable call quality. We implement several features to enhance user experience and reduce their cognitive load, such as smooth scrolling captions and reducing caption flicker. The modular architecture allows us to integrate different ASR and MT services in our backend. Our system provides an integrated evaluation suite to optimize key intrinsic evaluation metrics such as accuracy, latency and erasure. Finally, we present an innovative cross-lingual word-guessing game as an extrinsic evaluation metric to measure end-to-end system performance. We plan to make our system open-source for research purposes.
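
The cascade with re-translation described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not MeetDot's implementation: the ASR stage is represented by a list of growing (and occasionally revised) partial transcripts, and the MT stage by a hypothetical toy lexicon lookup, `toy_translate`. Every time the transcript changes, the whole transcript is translated again from scratch, which is why already displayed caption text can change.

```python
from typing import Callable, Iterable, Iterator

# Hypothetical toy lexicon standing in for a real MT service.
TOY_LEXICON = {"hola": "hello", "mundo": "world"}


def toy_translate(text: str) -> str:
    """Word-by-word lookup; unknown words are passed through unchanged."""
    return " ".join(TOY_LEXICON.get(word, word) for word in text.lower().split())


def retranslate_stream(
    partial_transcripts: Iterable[str], translate: Callable[[str], str]
) -> Iterator[str]:
    """Re-translation: translate the full current transcript on every ASR update."""
    for transcript in partial_transcripts:
        yield translate(transcript)


# Simulated ASR output: the first hypothesis is later revised by the recognizer.
partials = ["ola", "hola", "hola mundo"]
captions = list(retranslate_stream(partials, toy_translate))
print(captions)
```

Here the first caption ("ola") is replaced once the recognizer revises its hypothesis, a small instance of the flicker that the abstract attributes to re-translation. Because `translate` is passed in as a callable, a different MT backend could be swapped in without changing the loop, which loosely mirrors the modular design the abstract mentions.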

