development recently with large language models (LLMs). A lot of the focus is on the question-answering you can do with either pure text-based models or vision-language models (VLMs), where you can also input images.
However, there is another dimension that has developed a ton over the past few years: audio. There are now models that can transcribe (speech -> text), synthesize speech (text -> speech), and also do speech-to-speech, where you can have a whole conversation with a language model, with audio going both in and out.

In this article, I’ll discuss how I’m using the development within the audio model space to my advantage, becoming an even more efficient programmer.

Motivation
My main motivation for writing this article is that I’m continually looking for ways to become a more efficient programmer. After using the ChatGPT mobile app for a while, I discovered their transcription option (the microphone icon to the right in the user input field). I used the transcription and quickly realized how much better it is compared to others I’ve used before, such as Apple’s built-in iPhone transcription.
OpenAI’s transcription almost always captures all of my words, with very few errors. Even when I use less common words, for example, acronyms related to computer science, it’s still able to pick up what I’m saying.

This transcription was only available in the ChatGPT app. However, I know that OpenAI has an API endpoint for their Whisper model, which is (presumably) the same model they’re using to transcribe speech in the app. I thus wanted to set this model up on my Mac to be available via a shortcut.
(I know there are apps such as MacWhisper available, but I wanted to develop a completely free solution, apart from the costs of the API calls themselves.)
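To make the API call concrete, here is a minimal sketch of sending an audio file to the Whisper endpoint through the official `openai` Python package. It is not taken from my repository: the function name `transcribe_file` and the optional `client` parameter are hypothetical, and it assumes that `pip install openai` has been run and that the `OPENAI_API_KEY` environment variable is set.

```python
from pathlib import Path


def transcribe_file(audio_path, client=None, model="whisper-1"):
    """Send one audio file to the transcription endpoint and return the text.

    `client` is a hypothetical parameter that lets you pass in an existing
    (or fake) client; when omitted, a real OpenAI client is created.
    """
    if client is None:
        # Imported lazily so the function can be exercised without the package.
        from openai import OpenAI

        client = OpenAI()  # reads OPENAI_API_KEY from the environment

    with Path(audio_path).open("rb") as audio_file:
        result = client.audio.transcriptions.create(model=model, file=audio_file)
    return result.text
```

Calling `transcribe_file("recording.wav")` would then return the transcribed text as a plain string, where `recording.wav` stands for whatever file your recorder produced.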
Prerequisites
- Alfred (I will be using Alfred on the Mac to trigger some scripts. However, alternatives to this also exist. In general, you need a way to trigger scripts on your Mac / PC from a hotkey.)
Pros
The main advantage of using this transcription is that you can input words into your computer more quickly. When I type as quickly as I can on my computer, I can’t even reach 100 words per minute, and to type at that speed, I really have to focus. However, the average speaking speed is at least 110 words per minute, according to this article.
This means you can be much more effective if you speak your words with transcription instead of typing them out on the keyboard.
I think this is especially relevant after the rise of large language models such as ChatGPT. You spend more time prompting the language models, for example, asking questions to ChatGPT, or prompting Cursor to implement a feature or fix a bug. Thus, the use of the English language is much more prevalent now than before, compared to the direct use of programming languages such as Python.
Note: Of course, you’ll still be writing a lot of code, but from experience, I spend much more time prompting Cursor, for example, with extensive English prompts, in which case using this transcription saves me a lot of time.
Cons
There can, however, be some downsides to using the transcription as well. One of the main ones is that a lot of the time, you don’t want to speak out loud when programming. You might be sitting in the airport (as I am when writing this article), or even in your office. In these scenarios, you probably don’t want to disturb those around you by speaking out loud. However, if you’re sitting in a home office, this is naturally not a problem.
Another downside is that smaller prompts might not be that much faster. Consider this: if you just want to write a prompt of a single sentence, it will, in many scenarios, be faster just to type the prompt out by hand. This is because of the delay in starting, stopping, and transcribing audio into text. Sending the API call takes a little bit of time, and the shorter your prompt, the larger the fraction of the time you have to spend waiting for the response.
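The trade-off can be put into numbers with a rough model: typing costs time per word, while dictation costs a fixed overhead (starting, stopping, waiting for the API) plus a smaller time per word. The sketch below is only illustrative. The 110 words per minute comes from the figure cited above, while the 60 words per minute typing speed and the 5 second overhead are assumptions I picked for the example, not measurements.

```python
def typing_seconds(words, typing_wpm):
    """Time to type a prompt of `words` words."""
    return words / typing_wpm * 60


def dictation_seconds(words, speaking_wpm, overhead_seconds):
    """Time to dictate the same prompt, including the fixed overhead."""
    return overhead_seconds + words / speaking_wpm * 60


def break_even_words(typing_wpm, speaking_wpm, overhead_seconds):
    """Prompt length (in words) above which dictation becomes faster."""
    seconds_saved_per_word = 60 / typing_wpm - 60 / speaking_wpm
    if seconds_saved_per_word <= 0:
        raise ValueError("dictation only pays off when speaking is faster than typing")
    return overhead_seconds / seconds_saved_per_word


if __name__ == "__main__":
    # Assumed values: 60 wpm typing, 5 s overhead; 110 wpm speaking from the text.
    words = break_even_words(typing_wpm=60, speaking_wpm=110, overhead_seconds=5)
    print(f"Dictation is faster for prompts longer than about {words:.0f} words")
```

Under these assumed values, prompts shorter than roughly a sentence are quicker to type, which matches my experience. Plug in your own typing speed and observed delay to get your own break-even point.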
How to implement
You can see the code I used in this article on my GitHub. However, you also need to add hotkeys to run the scripts.
First, you have to:
- Clone the GitHub repository:
git clone https://github.com/EivindKjosbakken/whisper-shortcut.git
- Create a virtual environment called .venv and install the required packages:
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
- Get an OpenAI API key. You can do that by:
  - Going to the OpenAI API Overview and logging in or creating a profile
  - Going to your profile, and then to API Keys
  - Creating a new key. Remember to copy the key, as you will not be able to see it again
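Once you have the key, the scripts need a way to read it. One common option, shown here as a sketch, is the `OPENAI_API_KEY` environment variable, which the official `openai` Python package picks up by default. The helper name `get_api_key` is hypothetical and is not part of the repository, which may load the key differently, so check its README.

```python
import os


def get_api_key(env=os.environ):
    """Return the OpenAI API key from the environment, or fail with a clear message."""
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; export it in your shell before running the scripts"
        )
    return key
```

Failing early like this gives a readable error when the hotkey is pressed in a shell where the key was never exported, instead of an authentication error from the API.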
The scripts from the GitHub repository work as follows:
- start_recording.sh: starts recording your voice. The first time you use this, it will ask you for permission to use the microphone
- stop_recording.sh: sends a stop signal to the recording script, then sends the recorded audio to OpenAI for transcription. It also adds the transcribed text to your clipboard and pastes it if you have a text field selected on your PC
The entire repository is available under an MIT license.
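The start/stop flow described above can be sketched in Python as follows. This is an illustration of the general pattern and not the repository's actual code: the function names, the `read_chunk` and `transcribe` callables, and the choice of `SIGTERM` as the stop signal are all assumptions. The clipboard step uses the macOS `pbcopy` command.

```python
import signal
import subprocess
import threading

# Set by the signal handler when the stop script asks the recorder to finish.
stop_requested = threading.Event()


def handle_stop(signum, frame):
    """Signal handler: mark that recording should end."""
    stop_requested.set()


def record_until_stopped(read_chunk, stop_event=stop_requested):
    """Collect audio chunks from `read_chunk()` until the stop event is set."""
    chunks = []
    while not stop_event.is_set():
        chunks.append(read_chunk())
    return b"".join(chunks)


def copy_to_clipboard(text):
    """Put text on the macOS clipboard via the `pbcopy` command."""
    subprocess.run(["pbcopy"], input=text.encode("utf-8"), check=True)


def run(read_chunk, transcribe):
    """Record until a stop signal arrives, transcribe, and copy the result."""
    signal.signal(signal.SIGTERM, handle_stop)  # assumed stop signal
    audio = record_until_stopped(read_chunk)
    text = transcribe(audio)
    copy_to_clipboard(text)
    return text
```

In this pattern, the start script would launch a process that calls `run(...)` with a real microphone reader, and the stop script would send the signal to that process, for example with `kill` and the recorder's process ID.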
Alfred
You can find the Alfred workflow in the GitHub repository here: Transcribe.alfredworkflow.
This is how I set up the Alfred workflow:

You can simply download it and add it to your Alfred.
Also, remember to have a terminal window open whenever you want to run this script, as you activate the Python script from the terminal. I had to do it this way because I got permission issues when the script was activated directly from Alfred. The first time you run the script, you should be prompted to give your terminal access to the microphone, which you should approve.
Cost
An important consideration when using APIs such as OpenAI Whisper is the cost of the API usage. I would consider the cost of using OpenAI’s Whisper model relatively high. As always, the cost depends entirely on how much you use the model. I would say I use the model up to 25 times a day, for up to 150 words each time, and the cost is less than 1 dollar per day.
This means, however, that if you use the model a lot, you can see costs of up to 30 dollars per month, which is definitely a substantial cost. However, I think it’s important to be aware of the time savings you get from the model. If each model usage saves you 30 seconds, and you use it 20 times per day, you have just saved ten minutes of your day. Personally, I’m willing to pay one dollar to save ten minutes of my day, performing a task (writing on my keyboard) that doesn’t really grant me any other benefit. If anything, using your keyboard may contribute to a higher risk of injuries such as carpal tunnel syndrome. Using the model is thus definitely worth it for me.
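The arithmetic above is easy to adapt to your own usage. The sketch below uses the figures from this section (20 uses per day, 30 seconds saved per use, an upper bound of one dollar per day) and treats a month as 30 days. The function names are just for illustration.

```python
def daily_minutes_saved(uses_per_day, seconds_saved_per_use):
    """Minutes saved per day from dictating instead of typing."""
    return uses_per_day * seconds_saved_per_use / 60


def monthly_cost(daily_cost, days_per_month=30):
    """Upper bound on monthly spend for a given daily spend."""
    return daily_cost * days_per_month


def cost_per_hour_saved(daily_cost, minutes_saved_per_day):
    """What you effectively pay for each hour of typing you avoid."""
    return daily_cost * 60 / minutes_saved_per_day


if __name__ == "__main__":
    minutes = daily_minutes_saved(uses_per_day=20, seconds_saved_per_use=30)
    print(f"Minutes saved per day: {minutes:.0f}")
    print(f"Monthly cost at $1/day: ${monthly_cost(1.0):.0f}")
    print(f"Cost per hour saved: ${cost_per_hour_saved(1.0, minutes):.2f}")
```

Seen this way, the question becomes whether an hour of your time is worth more than the resulting price per hour saved, which for these figures is a few dollars.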
Conclusion
In this article, I started off by discussing the immense advances within language models in the last few years. These have helped us create powerful chatbots, saving us vast amounts of time. However, alongside the advances in language models, we’ve also seen advances in voice models. Transcription using OpenAI Whisper is now near perfect (from personal experience), which makes it a powerful tool you can use to input words on your computer more effectively. I discussed the pros and cons of using OpenAI Whisper on your PC, and I also went step by step through how you can implement it on your own computer.