Nvidia releases massive AI-ready European language dataset and tools

Source: siliconangle.com

From the post:

>Only a tiny fraction of the more than 7,000 languages on Earth are supported by artificial intelligence models, so today Nvidia Corp. announced a massive new AI-ready dataset and models to support the development of high-quality AI translation for European languages. The new dataset, named Granary, is a massive open-source corpus of multilingual audio, including more than a million hours of audio, plus 650,000 hours of speech recognition and 350,000 hours of speech translation. Nvidia’s speech AI team collaborated with researchers from Carnegie Mellon University and Fondazione Bruno Kessler to process unlabeled audio and public speech data into information usable for AI training. The dataset is available openly and for free on GitHub.

Archive: https://archive.today/NS8vv
