
Whisper is an open-source AI model that has just been released by OpenAI. OpenAI has a history of open-sourcing great AI projects. For example, GPT-2 was developed by OpenAI a couple of years ago. At the time it was the best generative natural language processing model ever created, and it paved the way for much more advanced models like GPT-3, GPT-J, OPT, and Bloom. More recently, they also released a nice GPU programming framework called Triton. Not all of OpenAI's models have been open-sourced, though. Their two most exciting models, GPT-3 and DALL-E, are still private models that can only be used through their paid API. Whisper is taking the speech-to-text ecosystem by storm: it can automatically detect the input language, transcribe speech in around 100 languages, automatically punctuate the result, and even translate the result into English if needed.
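To make the features above concrete, here is a minimal sketch of how they map onto the open-source `openai-whisper` Python package. It assumes the package and `ffmpeg` are installed. The helper names `whisper_options` and `transcribe_file`, the `"base"` model size, and the file name `support_call.mp3` are illustrative choices, not something taken from the article.

```python
from typing import Optional


def whisper_options(translate: bool = False, language: Optional[str] = None) -> dict:
    """Build keyword arguments for Whisper's transcribe() call (hypothetical helper)."""
    # task="transcribe" keeps the spoken language; task="translate" outputs English.
    options = {"task": "translate" if translate else "transcribe"}
    # Leaving the language out lets Whisper detect it automatically.
    if language is not None:
        options["language"] = language
    return options


def transcribe_file(audio_path: str, model_name: str = "base", translate: bool = False) -> dict:
    """Transcribe (or translate) one audio file and return language and text."""
    # Imported lazily so the helper above works without the package installed.
    import whisper  # pip install openai-whisper

    model = whisper.load_model(model_name)
    result = model.transcribe(audio_path, **whisper_options(translate))
    return {"language": result["language"], "text": result["text"].strip()}


# Example usage (assumes a local file named support_call.mp3 exists):
# print(transcribe_file("support_call.mp3"))
# print(transcribe_file("support_call.mp3", translate=True))
```

Larger model sizes are generally more accurate but slower, so the right choice depends on your hardware and latency needs.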

Whisper: The Best Alternative To Google Speech-To-Text

Google's pricing is basically $0.006 / 15 seconds for basic speech-to-text, and $0.009 / 15 seconds for specific use cases like video transcription or phone transcription. Let's say you want to automatically analyze phone calls made to your support team (in order to later perform sentiment analysis or entity extraction on them, for example). If you have 5 support agents each spending 4 hours per day on the phone with customers, Google's speech-to-text API will cost you roughly $1,300 per month at the phone transcription rate (20 hours of audio per day over a 30-day month). If you are concerned about costs or privacy, you might want to switch to an open-source alternative: OpenAI Whisper.
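The support-team estimate can be reproduced with a few lines of arithmetic. The sketch below uses the two per-15-second prices quoted above. It assumes a 30-day month and ignores any per-request rounding Google may apply, and the function name `monthly_cost` is only illustrative. Under these assumptions the phone rate lands at roughly $1,300 per month, in the same range as the monthly figure quoted above.

```python
def monthly_cost(agents: int, hours_per_agent_per_day: float,
                 price_per_15s: float, days_per_month: int = 30) -> float:
    """Estimate the monthly speech-to-text bill for a support team."""
    seconds_per_day = agents * hours_per_agent_per_day * 3600
    billed_units_per_day = seconds_per_day / 15  # pricing is per 15 seconds of audio
    return billed_units_per_day * price_per_15s * days_per_month


basic = monthly_cost(agents=5, hours_per_agent_per_day=4, price_per_15s=0.006)
phone = monthly_cost(agents=5, hours_per_agent_per_day=4, price_per_15s=0.009)

print(f"basic rate:       ${basic:,.0f}")  # basic rate:       $864
print(f"phone/video rate: ${phone:,.0f}")  # phone/video rate: $1,296
```

Changing `days_per_month` to the number of days your team actually takes calls gives a tighter estimate.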

Google's automatic speech recognition (speech-to-text) API is very popular. This API is able to transcribe audio and video files in 125 languages, and it offers specific AI models for phone call transcription, medical transcription, and more. This API also has nice additional features like content filtering, automatic punctuation (in beta only for the moment), and speaker diarization (also in beta). Last of all, their API can be installed on premises. But it's important to note that the on-prem AI model will keep sending data to Google in order to report API usage, which might be a concern from a privacy standpoint.
