-
I do see similar artifacts at the end with OpenAI Whisper, which uses PyTorch as a back-end. E.g., running:

```python
import whisper

model = whisper.load_model("turbo")

# load the audio and keep only the last 1,000,000 samples
# (about 62.5 seconds at Whisper's 16 kHz sample rate)
data = whisper.load_audio("conal.mp3")
data = data[-1_000_000:]

result = model.transcribe(data)
print(result["text"])
```

gives the following, where the last few sentences are just gibberish:
One thing that helps is setting `condition_on_previous_text=False`:

```python
import mlx_whisper

result = mlx_whisper.transcribe(
    "conal.mp3",
    path_or_hf_repo="mlx-community/whisper-large-v3-turbo",
    condition_on_previous_text=False,
)
```

With that, the ending is much better:
This is anecdotally in line with what I've heard before, which is that conditioning on previous text can sometimes make the transcription more accurate but can also cause repetitions in the output.
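For a closer apples-to-apples check, the reference `openai-whisper` package accepts the same option in `transcribe`, so the conditioning can be disabled on the PyTorch side as well. A minimal sketch, reusing the snippet above:

```python
import whisper

model = whisper.load_model("turbo")
data = whisper.load_audio("conal.mp3")[-1_000_000:]

# decode each 30-second window independently instead of conditioning
# on the previously decoded text, which can reduce repetition loops
result = model.transcribe(data, condition_on_previous_text=False)
print(result["text"])
```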
-
My transcription of this audio file with this MLX conversion of Whisper-v3-turbo via mlx-whisper ended inaccurately, with the word "Yeah" repeated 27 times.
I wondered where this token repetition came from, and @awni suggested I start a discussion to "verify the result by running the same audio through the transformers PyTorch implementation as a test".
Here is the code I used:
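A minimal sketch of that kind of call, assuming the plain `mlx_whisper.transcribe` API with the model repo named in the reply above; `audio.mp3` stands in for the actual file, and the exact options used may have differed:

```python
import mlx_whisper

# transcribe the audio with the MLX conversion of Whisper-v3-turbo
result = mlx_whisper.transcribe(
    "audio.mp3",
    path_or_hf_repo="mlx-community/whisper-large-v3-turbo",
)
print(result["text"])
```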
Here is the tail end of the transcription for the audio, which concludes with "oh yeah":