Complete Introductory Guide to Speech to Text with Transformers

Learn how to get started with audio Machine Learning.

Oct 09, 2023

Introduction

We all deal with audio data much more than we realize. The world is full of audio data and related problems that beg to be solved, and Machine Learning can solve many of them. You are probably familiar with image, text, and tabular data being used to train Machine Learning models, and with Machine Learning being used to solve problems in those domains. With the advent of Transformer architectures, it has become possible to solve audio-related problems with much better accuracy than earlier methods achieved. In this guide, we will learn the basics of audio ML through speech-to-text with Transformers, and learn to use the Hugging Face libraries to solve audio-related problems with Machine Learning.

Learning Objectives

Learn about the basics of audio Machine Learning and gain related background knowledge.
Learn how audio data is collected, stored, and processed for Machine Learning.
Learn about a common and valuable task: speech-to-text using Machine Learning.
Learn how to use Hugging Face tools and libraries for your audio tasks, from finding datasets and trained models to applying them to audio problems with the Hugging Face Python library.
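As a small illustration of how audio is stored for Machine Learning, here is a minimal sketch that uses only Python's standard library (`wave`, `struct`, `math`, `io`), not any Hugging Face API. It synthesizes a tone at a 16 kHz sample rate (the rate speech models such as Wav2Vec2 and Whisper expect), stores it as a mono 16-bit PCM WAV file in memory, and decodes it back into the floating-point samples in [-1, 1] that models typically consume. The 440 Hz tone, the 0.5 s duration, and the helper names `synthesize_tone`, `to_wav_bytes`, and `read_wav` are hypothetical choices made for illustration.

```python
import io
import math
import struct
import wave

SAMPLE_RATE = 16_000  # 16 kHz, the rate many speech models expect
DURATION_S = 0.5      # hypothetical clip length
FREQ_HZ = 440.0       # hypothetical pure tone standing in for real speech


def synthesize_tone(sample_rate: int, duration_s: float, freq_hz: float) -> list[float]:
    """Return a sine wave as float samples in [-0.5, 0.5]."""
    n = int(sample_rate * duration_s)  # number of samples = rate x seconds
    return [0.5 * math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]


def to_wav_bytes(samples: list[float], sample_rate: int) -> bytes:
    """Encode float samples as an in-memory mono, 16-bit PCM WAV file."""
    # Scale floats to signed 16-bit integers, clipping to the valid range
    ints = [max(-32768, min(32767, round(s * 32767))) for s in samples]
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(1)   # mono
        wf.setsampwidth(2)   # 2 bytes = 16 bits per sample
        wf.setframerate(sample_rate)
        # Native byte order; the wave module produces WAV's little-endian layout
        wf.writeframes(struct.pack(f"{len(ints)}h", *ints))
    return buf.getvalue()


def read_wav(data: bytes) -> tuple[list[float], int]:
    """Decode a mono 16-bit PCM WAV into floats in [-1, 1] plus its sample rate."""
    with wave.open(io.BytesIO(data), "rb") as wf:
        if wf.getnchannels() != 1 or wf.getsampwidth() != 2:
            raise ValueError("expected mono 16-bit PCM audio")
        rate = wf.getframerate()
        raw = wf.readframes(wf.getnframes())
    ints = struct.unpack(f"{len(raw) // 2}h", raw)
    return [i / 32768 for i in ints], rate


if __name__ == "__main__":
    samples = synthesize_tone(SAMPLE_RATE, DURATION_S, FREQ_HZ)
    decoded, rate = read_wav(to_wav_bytes(samples, SAMPLE_RATE))
    print(rate, len(decoded))
```

In a real speech-to-text workflow, the WAV file would come from a recording rather than a synthesized tone. A decoded array like this (resampled to 16 kHz if necessary) is the kind of input typically passed to an automatic-speech-recognition pipeline in the Hugging Face `transformers` library.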

This article was published as a part of the Data Science Blogathon.

This post was originally published on Analytics Vidhya, which does not allow articles to be republished, so please read the full article on their blog. Analytics Vidhya pays each of its writers; if this subscription reaches INR 5k/month, I will write here exclusively.

ritoLAB
