OpenAI’s newly released “Whisper” speech recognition model is reported to provide accurate transcriptions in multiple languages and even translate them to English. As Deepgram CEO Scott Stephenson recently tweeted, “OpenAI + Deepgram is all good — rising tide lifts all boats.” We’re stoked to see others are buying into what we’ve been preaching for nearly a decade: end-to-end deep learning is the answer to speech-to-text.
As our team played with Whisper last week, we wanted to make sure as many people as possible could try it with minimal effort. And since we already offer some of the most accurate and performant speech recognition models in the world, why not add another? 😁
Announcing Whisper Multilingual AI Speech Recognition on Deepgram
Last week, we released the Whisper speech recognition model via the Deepgram API. All accounts now have access to the Whisper model for free. But we wanted to make it even easier to try, so we made it available to people without a Deepgram account. That’s right: you can send files to the API without needing an API key. Try the shell commands below to see how Whisper performs on your local files or files hosted elsewhere.
Use cURL to Transcribe Local Files with Whisper
You can start testing the Whisper model now by running the snippet below in your terminal.
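A request along the following lines should work. The exact query parameter (`model=whisper`) follows Deepgram’s usual API conventions, and we’ve sketched the request without an `Authorization` header per the keyless trial described above; ordinarily a Deepgram request carries an `Authorization: Token YOUR_API_KEY` header. The snippet first downloads one of our demo files so it’s self-contained; swap in your own recording if you have one.

```shell
# Download a sample file so the snippet is self-contained;
# replace epi.wav with your own recording if you have one.
curl --silent -O https://static.deepgram.com/examples/epi.wav

# Send the local file to the Whisper model on the Deepgram API.
curl \
  --request POST \
  --url 'https://api.deepgram.com/v1/listen?model=whisper' \
  --header 'Content-Type: audio/wav' \
  --data-binary @epi.wav
```

The `--data-binary` flag sends the raw audio bytes as the request body, and the `Content-Type` header tells the API what encoding to expect.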
Use cURL to Transcribe Remote Files with Whisper
Don’t have an audio file to test? You can also send the URL of a hosted file by changing your request to the code snippet below. Replace the https://static.deepgram.com/examples/epi.wav URL with a file that you’d like to test against.
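For a hosted file, the request body is a small JSON object pointing at the URL instead of raw audio bytes (same assumed keyless endpoint as above):

```shell
# Ask the Deepgram API to fetch and transcribe a hosted file with Whisper.
curl \
  --request POST \
  --url 'https://api.deepgram.com/v1/listen?model=whisper' \
  --header 'Content-Type: application/json' \
  --data '{"url": "https://static.deepgram.com/examples/epi.wav"}'
```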
We even provide several demo files that you can use:
- https://static.deepgram.com/examples/dragons.wav
- https://static.deepgram.com/examples/epi.wav
- https://static.deepgram.com/examples/interview_speech-analytics.wav
- https://static.deepgram.com/examples/koreanSampleFile.mp3
- https://static.deepgram.com/examples/sofiavergaraspanish.clip.wav
- https://static.deepgram.com/examples/timotheefrench.clip.wav
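If you’d like to hear how Whisper handles all of them at once, a simple loop over the demo URLs works (using the same assumed keyless endpoint as the snippets above):

```shell
# Transcribe each demo file in turn with the Whisper model.
for url in \
  'https://static.deepgram.com/examples/dragons.wav' \
  'https://static.deepgram.com/examples/epi.wav' \
  'https://static.deepgram.com/examples/interview_speech-analytics.wav' \
  'https://static.deepgram.com/examples/koreanSampleFile.mp3' \
  'https://static.deepgram.com/examples/sofiavergaraspanish.clip.wav' \
  'https://static.deepgram.com/examples/timotheefrench.clip.wav'
do
  echo "=== $url ==="
  curl --silent \
    --request POST \
    --url 'https://api.deepgram.com/v1/listen?model=whisper' \
    --header 'Content-Type: application/json' \
    --data "{\"url\": \"$url\"}"
  echo
done
```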
Try Whisper in Your Browser
You can also test the Whisper model in your browser when you sign up for a free Deepgram account. Our getting started missions allow you to compare the Whisper model to Deepgram models using your own files and/or sample files that we provide.
The Final Result
Below is the result of a NASA phone call transcribed with the Whisper model.
There are a few empty data points that stand out, namely confidence, start, and end. The Whisper model doesn’t provide that level of detail, so we were forced to fill in zero values for them. For comparison, below is the response using Deepgram’s enhanced English model.
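You can check those zeroed fields yourself by pulling the word-level data out of a Whisper response with jq (assuming jq is installed, the keyless endpoint described above, and Deepgram’s usual JSON response structure):

```shell
# Fetch a Whisper transcription and inspect the first word object;
# its confidence/start/end fields come back zeroed for Whisper.
curl --silent \
  --request POST \
  --url 'https://api.deepgram.com/v1/listen?model=whisper' \
  --header 'Content-Type: application/json' \
  --data '{"url": "https://static.deepgram.com/examples/epi.wav"}' \
  | jq '.results.channels[0].alternatives[0].words[0]'
```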
Is Whisper Right for Me?
Are you an AI researcher? Sure! Where else can you get your hands on a modern, fully implemented end-to-end deep learning architecture to play with? As long as you don’t need real-time transcription, Whisper can be used for prototyping and experimenting. However, if you need real-time transcription, speed, and/or scalability, Whisper is not ready for production use today.
Testing the OpenAI Whisper Models
Have you tried using any of the Whisper models since their release? Tell the community about your experience in our GitHub Discussions.