HOW TO TRANSCRIBE AUDIO SPEECH INTO TEXT USING PYTHON
Recently I was asked for a simple, free way to transcribe recorded speech into plain text.
The problem was solved with a Python library called [Vosk].
It is a speech recognition toolkit that works offline.
To set it up, do the following:
- Install Vosk:
pip install vosk
- [Download speech model] for the required language and unpack it to a folder (the examples below use C:\vosk). Larger models bring only a marginal improvement over the smaller ones, even though they are over 40 times bigger.
- PyAudio should be installed as well:
pip install PyAudio
Note that old versions of Vosk worked only with 8 kHz audio files. This has since been fixed, and ordinary 44.1 kHz recordings can be used.
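Before feeding a file to the recognizer it can help to check what the WAV file actually contains. The sketch below uses only the standard `wave` module. The helper names `describe_wav` and `is_mono_16bit` and the generated demo file are illustrative and not part of Vosk, and the mono 16-bit requirement is an assumption based on what the Vosk example scripts check for.

```python
import os
import tempfile
import wave


def describe_wav(path):
    """Return the basic format parameters of a WAV file."""
    with wave.open(path, "rb") as wf:
        return {
            "channels": wf.getnchannels(),
            "sample_width_bytes": wf.getsampwidth(),
            "sample_rate_hz": wf.getframerate(),
            "duration_s": wf.getnframes() / wf.getframerate(),
        }


def is_mono_16bit(info):
    """True for mono, 16-bit input (assumed to be what the recognizer expects)."""
    return info["channels"] == 1 and info["sample_width_bytes"] == 2


# Demonstration on a generated file: one second of silence, 44.1 kHz mono
demo_path = os.path.join(tempfile.mkdtemp(), "silence.wav")
with wave.open(demo_path, "wb") as out:
    out.setnchannels(1)
    out.setsampwidth(2)
    out.setframerate(44100)
    out.writeframes(b"\x00\x00" * 44100)

info = describe_wav(demo_path)
print(info)
print("usable as-is:", is_mono_16bit(info))
```

The `sample_rate_hz` value is the number to pass to the recognizer, so a file that is not 44.1 kHz does not need a hard-coded rate.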
EXAMPLE CODE TO RECOGNIZE FROM WAV FILE
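import json
import wave

from vosk import Model, KaldiRecognizer

# Folder with the unpacked speech model
model = Model(r"C:\vosk")
wf = wave.open(r'C:\vosk\demo.wav', "rb")
# Use the file's real sample rate instead of hard-coding 44100
rec = KaldiRecognizer(model, wf.getframerate())

result = ''
last_n = False  # True if the last thing appended was a line break

while True:
    # Read about one second of audio at a time
    data = wf.readframes(wf.getframerate())
    if len(data) == 0:
        break
    # AcceptWaveform returns True once a complete utterance is recognized
    if rec.AcceptWaveform(data):
        res = json.loads(rec.Result())
        if res['text'] != '':
            result += f" {res['text']}"
            last_n = False
        elif not last_n:
            # An empty utterance (silence) becomes a single line break
            result += '\n'
            last_n = True

# Collect whatever is still buffered in the recognizer
res = json.loads(rec.FinalResult())
result += f" {res['text']}"
wf.close()
print(result)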
The resulting text is printed to the Python console.
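The `last_n` flag in the loop above is easy to misread, so here is the same joining logic isolated as a plain function that runs without a model. The function name `join_utterances` and the sample JSON strings are made up for illustration. The only thing taken from the listing is that each result is a JSON object with a `text` field.

```python
import json


def join_utterances(raw_results):
    """Join recognizer results; runs of empty results become one line break."""
    result = ""
    last_n = False  # True if a line break was just appended
    for raw in raw_results:
        text = json.loads(raw)["text"]
        if text != "":
            result += f" {text}"
            last_n = False
        elif not last_n:
            result += "\n"
            last_n = True
    return result


# Hypothetical recognizer output: two phrases, two silent gaps, one more phrase
sample = [
    '{"text": "hello world"}',
    '{"text": "this is a test"}',
    '{"text": ""}',
    '{"text": ""}',
    '{"text": "second paragraph"}',
]
joined = join_utterances(sample)
print(joined)
```

Two consecutive empty results produce only one line break, so long pauses do not pile up blank lines in the transcript.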
CODE TO RECOGNIZE FROM AUDIO LINE IN
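import json
import wave

from vosk import Model, KaldiRecognizer

# Folder with the unpacked speech model
model = Model(r"C:\vosk")
# Note: this listing reads test.wav from the current working directory
wf = wave.open(r'test.wav', "rb")
# Use the file's real sample rate instead of hard-coding 44100
rec = KaldiRecognizer(model, wf.getframerate())

result = ''
last_n = False  # True if the last thing appended was a line break

while True:
    # Read about one second of audio at a time
    data = wf.readframes(wf.getframerate())
    if len(data) == 0:
        break
    # AcceptWaveform returns True once a complete utterance is recognized
    if rec.AcceptWaveform(data):
        res = json.loads(rec.Result())
        if res['text'] != '':
            result += f" {res['text']}"
            last_n = False
        elif not last_n:
            # An empty utterance (silence) becomes a single line break
            result += '\n'
            last_n = True

# Collect whatever is still buffered in the recognizer
res = json.loads(rec.FinalResult())
result += f" {res['text']}"
wf.close()
print(result)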
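The listing above still reads from a WAV file (`test.wav`). For live input, a minimal sketch along the following lines may work, assuming PyAudio is installed, the default input device is the line in, and the model lives in `C:\vosk` as before. The 16 kHz sample rate, the buffer sizes, and the helper names `transcribe_stream` and `listen` are assumptions chosen for illustration, not values from the original post.

```python
import json


def transcribe_stream(rec, read_chunk, max_chunks=None):
    """Feed audio chunks to a recognizer and yield each non-empty utterance.

    rec: object with AcceptWaveform() and Result(), e.g. a KaldiRecognizer
    read_chunk: callable returning the next block of raw 16-bit PCM bytes
    max_chunks: optional limit so the loop can end without an interrupt
    """
    count = 0
    while max_chunks is None or count < max_chunks:
        data = read_chunk()
        if len(data) == 0:
            break
        count += 1
        if rec.AcceptWaveform(data):
            text = json.loads(rec.Result())["text"]
            if text:
                yield text


def listen(model_path=r"C:\vosk", rate=16000):
    """Print utterances from the default input device until interrupted."""
    # Imported here so the helper above stays usable without audio hardware
    import pyaudio
    from vosk import Model, KaldiRecognizer

    rec = KaldiRecognizer(Model(model_path), rate)
    audio = pyaudio.PyAudio()
    # Assumed capture settings: mono, 16-bit samples
    stream = audio.open(format=pyaudio.paInt16, channels=1, rate=rate,
                        input=True, frames_per_buffer=8000)
    try:
        for text in transcribe_stream(rec, lambda: stream.read(4000)):
            print(text)
    except KeyboardInterrupt:
        pass
    finally:
        stream.stop_stream()
        stream.close()
        audio.terminate()
```

Calling `listen()` starts the capture, and Ctrl+C stops it. Keeping the loop in `transcribe_stream` separate from the device setup means the same loop can be tried with any source of audio chunks.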