Wednesday, January 22, 2025

OpenAI’s Whisper API for Transcription and Translation


https://www.kdnuggets.com/wp-content/uploads/anello_openai_whisper_api_transcription_translation_1.gif
Illustration by Author | Source: flaticon

 

Have you accumulated a lot of recordings, but you have no energy to start listening to and transcribing them? When I was still a student, I remember that I had to struggle every day with listening to hours and hours of recorded lessons, and most of my time was taken up by transcription. Moreover, it wasn't my native language, and I had to paste each sentence into Google Translate to convert it into Italian.

Now, manual transcription and translation are only a memory. OpenAI, the famous research company behind ChatGPT, released the Whisper API for speech-to-text conversion! With a few lines of Python code, you can call this powerful speech recognition model, get the task off your mind, and focus on other activities, like practicing with data science projects and improving your portfolio. Let's get started!

 

 

Whisper is a neural network model developed by OpenAI to solve speech-to-text tasks. It is an encoder-decoder Transformer trained on a large amount of multilingual audio, and it has become very popular for its ability to transcribe audio into text with very high accuracy.

It isn't limited to English: its ability extends to more than 50 languages. If you want to know whether your language is included, check the list of supported languages in OpenAI's documentation. Moreover, it can translate audio from any of these languages into English.

As with other OpenAI products, there is an API to access these speech recognition services, allowing developers and data scientists to integrate Whisper into their platforms and apps.

 

 

GIF by Author

 

Before going further, you need to complete a few steps to get access to the Whisper API. First, go to the OpenAI API website and log in. If you don't have an account yet, you need to create one. Once you are logged in, click on your username and select the option "View API keys". Then, click the button "Create new API key" and copy the newly created API key into your Python code.
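Pasting the key directly into a script makes it easy to leak it by accident, for example when sharing a notebook. A common alternative is to store it in an environment variable; the openai library reads `OPENAI_API_KEY` by default. The helper below is a minimal sketch of that idea, and the name `load_api_key` is chosen here for illustration only:

```python
import os


def load_api_key(var_name: str = "OPENAI_API_KEY") -> str:
    """Return the API key stored in an environment variable, or fail loudly."""
    # Strip whitespace so a variable containing only spaces counts as missing.
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"Environment variable {var_name} is not set; "
            "create a key on the OpenAI website and export it first."
        )
    return key
```

After running `export OPENAI_API_KEY=...` in your shell, `API_KEY = load_api_key()` can replace the hard-coded string used later in the tutorial.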

 

 

First, let's download a YouTube video by Kevin Stratvert, a very popular YouTuber who helps students from all over the world master technology and improve their skills by learning tools like Power BI, video editing, and AI products. For example, let's suppose that we want to transcribe the video "3 Mind-blowing AI Tools".

We can download this video directly using the pytube library. To install it, run the following commands:

pip install pytube
# the openai.Audio interface used below was removed in openai 1.0
pip install "openai<1.0"

 

We also install the openai library, since it will be used later in the tutorial. Once all the Python libraries are installed, we just need to pass the URL of the video to the YouTube object. Then, we get the highest-resolution video stream and download the video.

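from pytube import YouTube

video_url = "https://www.youtube.com/watch?v=v6OB80Vt1Dk&t=1s&ab_channel=KevinStratvert"

yt = YouTube(video_url)
stream = yt.streams.get_highest_resolution()
# save the file where the transcription code below expects to find it
stream.download(output_path="audio", filename="5_tools_audio.mp4")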
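The highest-resolution stream contains video as well as audio, so the file is larger than Whisper needs and can exceed the API's upload limit (25 MB according to OpenAI's documentation). A minimal sketch, assuming pytube's `StreamQuery.get_audio_only()`, which picks the highest-bitrate mp4 audio stream, downloads only the audio track instead; the helper name `download_audio_only` is hypothetical:

```python
from pathlib import Path


def download_audio_only(yt, output_dir: str, filename: str) -> str:
    """Download only the audio track of a pytube YouTube object.

    `yt` is expected to behave like pytube.YouTube. Returns the saved file path.
    """
    stream = yt.streams.get_audio_only()  # highest-bitrate mp4 audio stream, or None
    if stream is None:
        raise ValueError("No audio-only stream is available for this video")
    Path(output_dir).mkdir(parents=True, exist_ok=True)  # make sure the folder exists
    return stream.download(output_path=output_dir, filename=filename)
```

Calling `download_audio_only(YouTube(video_url), "audio", "5_tools_audio.mp4")` would save the audio under the same path that the transcription step opens; mp4 is among the formats the Whisper API accepts.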

 

Once the file is downloaded, it's time to start the fun part!

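import openai  # pre-1.0 interface (openai.Audio)

API_KEY = 'your_api_key'  # paste the key created on the OpenAI website
model_id = 'whisper-1'
language = "en"  # ISO-639-1 code of the spoken language
audio_file_path = "audio/5_tools_audio.mp4"
audio_file = open(audio_file_path, 'rb')  # the API expects the file opened in binary mode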
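Before sending the file, it can help to check its size, because requests above the upload limit are rejected. This is a minimal sketch: the 25 MB figure comes from OpenAI's documentation, treating 1 MB as 1,000,000 bytes (the stricter reading) is an assumption, and `check_upload_size` is a hypothetical helper name:

```python
import os

# Assumption: 25 MB upload limit for the audio endpoints, with 1 MB = 1,000,000 bytes.
MAX_UPLOAD_MB = 25
BYTES_PER_MB = 1_000_000


def check_upload_size(path: str, limit_mb: float = MAX_UPLOAD_MB) -> float:
    """Return the file size in MB, raising ValueError if it exceeds limit_mb."""
    size_mb = os.path.getsize(path) / BYTES_PER_MB
    if size_mb > limit_mb:
        raise ValueError(
            f"{path} is {size_mb:.1f} MB, above the {limit_mb} MB upload limit; "
            "download the audio only or split the file first."
        )
    return size_mb
```

Calling `check_upload_size(audio_file_path)` right before the API request fails fast with a readable message instead of a rejected upload.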

 

After setting up the parameters and opening the audio file, we can transcribe the audio and save it to a .txt file.

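response = openai.Audio.transcribe(
    api_key=API_KEY,
    model=model_id,
    file=audio_file,
    language=language
)
audio_file.close()  # the upload is done, release the file handle
transcription_text = response.text
print(transcription_text)

# save the transcription to a text file (the file name is only an example)
with open("transcription.txt", "w", encoding="utf-8") as f:
    f.write(transcription_text)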

 

Output:

Hi everyone, Kevin here. Today, we're going to look at 5 different tools that leverage artificial intelligence in some truly incredible ways. Here, for instance, I can change my voice in real time. I can also highlight an area of a photo and I can make that just automatically disappear. Uh, where'd my son go? I can also give the computer instructions, like, I don't know, write a song for the Kevin cookie company....

 

As expected, the output is very accurate. Even the punctuation is precise; I'm very impressed!

 

 

This time, we'll translate audio from Italian into English. As before, we download the audio file. In my example, I'm using this YouTube video by Piero Savastano, a popular Italian YouTuber who teaches machine learning in a very simple and funny way. You just need to copy the previous code and change only the URL. Once the file is downloaded, we open it as before:

audio_file_path = "audio/ml_in_python.mp4"
audio_file = open(audio_file_path, 'rb')

 

Then, we can generate the English translation of the Italian audio.

response = openai.Audio.translate(
    api_key=API_KEY,
    model=model_id,
    file=audio_file
)
audio_file.close()  # release the file handle after the upload
translation_text = response.text
print(translation_text)
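The module-level `openai.Audio` class used above only exists in openai releases before 1.0. In newer releases the same endpoints are reached through a client object, via `client.audio.transcriptions.create` and `client.audio.translations.create`. The function below is a sketch of both calls under that newer interface; the name `whisper_v1` and its parameters are illustrative, not part of the library:

```python
from typing import Any, Optional


def whisper_v1(client: Any, audio_path: str, translate: bool = False,
               language: Optional[str] = None, model: str = "whisper-1") -> str:
    """Transcribe, or translate into English, an audio file with the openai>=1.0 client.

    `client` is expected to be an openai.OpenAI instance, or any object exposing
    the same audio.transcriptions / audio.translations attributes.
    """
    with open(audio_path, "rb") as audio_file:  # closed automatically after the call
        if translate:
            # The translation endpoint always returns English text.
            result = client.audio.translations.create(model=model, file=audio_file)
        else:
            # Pass the spoken language only when it is known.
            extra = {"language": language} if language else {}
            result = client.audio.transcriptions.create(model=model, file=audio_file, **extra)
    return result.text
```

With `from openai import OpenAI`, a call such as `whisper_v1(OpenAI(), "audio/ml_in_python.mp4", translate=True)` would mirror the translation step above, and the client picks up `OPENAI_API_KEY` from the environment.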

 

Output:

We also see some graphs in a statistical style, so we should also understand how to read them. One is the box plot, which allows to see the distribution in terms of median, first quarter and third quarter. Now I'll tell you what it means. We always take the data from the data frame. X is the season. On Y we put the count of the bikes that are rented. And then I want to distinguish these box plots based on whether it's a holiday or not. This graph comes out. How do you read this? Here on the X there is the season, coded in numerical terms. In blue we have the non-holiday days, in orange the holidays. And here is the count of the bikes. What are these rectangles? Take this box here. I'm turning it around with the mouse....

 

 

That's it! I hope this tutorial has helped you get started with the Whisper API. In this case study, it was applied to YouTube videos, but you can also try it on podcasts, Zoom calls, and meetings. I found the outputs of both the transcription and the translation very impressive! This AI tool is certainly helping a lot of people right now. The only limitation is that it can only translate into English, not the other way around, but I'm sure OpenAI will provide that soon. Thanks for reading! Have a nice day!

 

 

 
 
Eugenia Anello is currently a research fellow at the Department of Information Engineering of the University of Padova, Italy. Her research project focuses on Continual Learning combined with Anomaly Detection.
 
