Cloud Consulting Companies

  • BIG Data & Analytics
  • CLOUD
  • Data Center
  • IOT
  • Machine Learning & AI
  • SECURITY
  • Server
  • BlockChain
  • Virtualization
You are here: Home / CLOUD / Multi-language identification and transcription in Video Indexer

Multi-language identification and transcription in Video Indexer

November 25, 2019 by cbn Leave a Comment

Multi-language speech transcription was recently introduced into Microsoft Video Indexer at the International Broadcasters Conference (IBC). It is available as a preview capability and customers can already start experiencing it in our portal. More details on all our IBC2019 enhancements can be found here.

Multi-language videos are common media assets in the globalization context, global political summits, economic forums, and sport press conferences are examples of venues where speakers use their native language to convey their own statements. Those videos pose a unique challenge for companies that need to provide automatic transcription for video archives of large volumes. Automatic transcription technologies expect users to explicitly determine the video language in advance to convert speech to text. This manual step becomes a scalability obstacle when transcribing multi-language content as one would have to manually tag audio segments with the appropriate language.

Microsoft Video Indexer provides a unique capability of automatic spoken language identification for multi-language content. This solution allows users to easily transcribe multi-language content without going through tedious manual preparation steps before triggering it. By that, it can save anyone with large archive of videos both time and money, and enable discoverability and accessibility scenarios.

Multi-language audio transcription in Video Indexer

The multi-language transcription capability is available as part of the Video Indexer portal. Currently, it supports four languages including English, French, German and Spanish, while expecting up to three different languages in an input media asset. While uploading a new media asset you can select the “Auto-detect multi-language” option as shown below.

1.	A new multi-language option available in the upload page of Video Indexer portal

Our application programming interface (API) supports this capability as well by enabling users to specify 'multi' as the language in the upload API. Once the indexing process is completed, the index JavaScript object notation (JSON) will include the underlying languages. Refer to our documentation for more details.

Additionally, each instance in the transcription section will include the language in which it was transcribed.

2.	A transcription snippet from Video Indexer timeline presenting different language segments

Customers can view the transcript and identified languages by time, jump to the specific places in the video for each language, and even see the multi-language transcription as video captions. The result transcription is also available as closed caption files (VTT, TTML, SRT, TXT, and CSV).

two languages

Methodology

Language identification from an audio signal is a complex task. Acoustic environment, speaker gender, and speaker age are among a variety of factors that affect this process. We represent audio signal using a visual representation, such as spectrograms, assuming that, different languages induce unique visual patterns which can be learned using deep neural networks.

Our solution has two main stages to determine the languages used in multi-language media content. First, it employs a deep neural network to classify audio segments with very high granularity, in other words, very few seconds. While a good model will successfully identify the underlying language, it can still miss-identify some segments due to similarities between languages. Therefore, we apply a second stage for examining these misses and smooth the results accordingly.

3.	A new insight pane showing the detected spoken languages and their exact occurrences on the timeline

Next steps

We introduced a differentiated capability for multi-language speech transcription. With this unique capability in Video Indexer, you can become more effective about the content of your videos as it allows you to immediately start searching across videos for different language segments. During the coming few months, we will be improving this capability by adding support for more languages and improving the model’s accuracy.

For more information, visit Video Indexer’s portal or the Video Indexer developer portal, and try this new capability. Read more about the new multi-language option and how to use it in our documentation.

Please use our UserVoice to share feedback and help us prioritize features or email visupport@microsoft.com with any questions.

Share on FacebookShare on TwitterShare on LinkedinShare on Pinterest

Filed Under: CLOUD

Leave a Reply Cancel reply

Your email address will not be published. Required fields are marked *

Archives

  • March 2023
  • February 2023
  • January 2023
  • December 2022
  • November 2022
  • October 2022
  • September 2022
  • August 2022
  • July 2022
  • June 2022
  • May 2022
  • April 2022
  • March 2022
  • February 2022
  • January 2022
  • December 2021
  • November 2021
  • October 2021
  • September 2021
  • August 2021
  • July 2021
  • June 2021
  • May 2021
  • April 2021
  • March 2021
  • February 2021
  • January 2021
  • December 2020
  • November 2020
  • October 2020
  • September 2020
  • August 2020
  • July 2020
  • June 2020
  • May 2020
  • April 2020
  • March 2020
  • February 2020
  • January 2020
  • December 2019
  • November 2019
  • October 2019
  • September 2019
  • August 2019
  • July 2019
  • June 2019
  • May 2019
  • April 2019
  • March 2019
  • February 2019
  • January 2019
  • December 2018
  • November 2018
  • October 2018
  • September 2018
  • August 2018
  • July 2018
  • June 2018
  • May 2018
  • April 2018
  • March 2018
  • February 2018
  • January 2018
  • December 2017
  • November 2017
  • October 2017
  • September 2017
  • August 2017
  • July 2017
  • May 2017
  • April 2017
  • March 2017
  • February 2017
  • March 2016
  • October 2014

Recent Posts

  • Will ChatGPT help retire me as Software Engineer anytime soon?
  • Modernize your apps and accelerate business growth with AI
  • Connect, secure, and simplify your network resources with Azure Virtual Network Manager
  • Introducing GPT-4 in Azure OpenAI Service
  • Azure Data Manager for Energy: Achieve interoperability with Petrel

Recent Comments

  • Purefit Keto Reviews on Are PDUs Your Best Platform for DCIM Instrumentation?
  • https://gemcr.org/ on 10 Things You Should Know About Deep Learning

Categories

  • BIG Data & Analytics
  • BlockChain
  • CLOUD
  • Data Center
  • IOT
  • Machine Learning & AI
  • SECURITY
  • Server
  • Uncategorized
  • Virtualization

Categories

  • BIG Data & Analytics (2,187)
  • BlockChain (490)
  • CLOUD (3,182)
  • Data Center (655)
  • IOT (2,251)
  • Machine Learning & AI (87)
  • SECURITY (1,593)
  • Server (4)
  • Uncategorized (2,017)
  • Virtualization (331)

Subscribe Our Newsletter

 Subscribing I accept the privacy rules of this site

Copyright © 2023 · News Pro Theme on Genesis Framework · WordPress · Log in