Description
Good day everyone!
I'm thinking about bindings for Python.
So far, I'm interested in 4 functionalities:
- Encoder processing
- Decoder processing
- Transcription of audio (feed audio bytes, get text)
- Timestamps of all words (feed audio bytes, get text + times of each word). Of course, it's too early to think about word timestamps, since even in the Python implementation they are still not well done.
Perhaps in the near future I will try to take up this task. But I have no experience with Python bindings. So, if there are craftsmen who can do it quickly (if it can be done quickly... 😃), that would be cool!
ArtyomZemlyak commented on Oct 1, 2022
A workaround: build the project, then run main. And it works!
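A minimal sketch of that kind of workaround, assuming main was built in the repository root and accepts the usual -m (model) and -f (input file) flags; all paths below are placeholders:

```python
import subprocess

# Placeholder paths - adjust to where you built whisper.cpp and keep your model/audio.
MAIN_BIN = "./main"
MODEL = "models/ggml-base.en.bin"
AUDIO = "samples/jfk.wav"

# Run the compiled main binary and capture its stdout as the transcription output.
result = subprocess.run(
    [MAIN_BIN, "-m", MODEL, "-f", AUDIO],
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout)
```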
ArtyomZemlyak commented on Oct 1, 2022
But with specific functions it is already more difficult. It might be worth considering running Python and C++ in different threads/processes and sharing information between them when it's needed.
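A stdlib-only sketch of that idea: keep the C++ side as a separate process and read its stdout back into Python incrementally; the binary name and flags are the same assumptions as above:

```python
import subprocess

# Run the C++ side as its own process (placeholder binary, model, and audio paths).
proc = subprocess.Popen(
    ["./main", "-m", "models/ggml-base.en.bin", "-f", "samples/jfk.wav"],
    stdout=subprocess.PIPE,
    text=True,
)

# Consume the C++ process output line by line instead of waiting for it to finish.
for line in proc.stdout:
    print("from C++:", line.rstrip())

proc.wait()
```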
ggerganov commented on Oct 1, 2022
Thank you very much for your interest in the project!
I think we first need a proper C-style wrapper of the model loading / encode and decode functionality / sampling strategies. After that we will easily create python and other language bindings. I've done similar work in my 'ggwave' project.
I agree that the encode and decode functionality should be exposed through the API as you suggested. It would give more flexibility to the users of the library/bindings.
aichr commented on Oct 4, 2022
@ArtyomZemlyak First you reinvent the PyTorch functions in C, then you want Python bindings around them. Isn't the end result the same as what we already have in PyTorch?
ggerganov commented on Oct 4, 2022
The initial API is now available on master: https://github.com/ggerganov/whisper.cpp/blob/master/whisper.h
The first part allows more fine-grained control over the inference and also allows the user to implement their own sampling strategy using the predicted probabilities for each token.
The second part of the API includes methods for full inference - you simply provide the audio samples and choose the sampling parameters.
Most likely the API will change with time, but this is a good starting point.
ref #9 : add API documentation in whisper.h
richardburleigh commented on Oct 9, 2022
This is as far as I got trying to get the API working in Python.
It loads the model successfully, but gets a segmentation fault on whisper_full.
Any ideas?
Edit - Got some debugging info from gdb but it didn't help much:
0x00007ffff67916c6 in log_mel_spectrogram(float const*, int, int, int, int, int, int, whisper_filters const&, whisper_mel&)
ggerganov commented on Oct 9, 2022
Here is one way to achieve this:
# build shared libwhisper.so
gcc -O3 -std=c11 -pthread -mavx -mavx2 -mfma -mf16c -fPIC -c ggml.c
g++ -O3 -std=c++11 -pthread --shared -fPIC -static-libstdc++ whisper.cpp ggml.o -o libwhisper.so
Use it from Python like this:
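A minimal ctypes sketch of this approach (not necessarily the exact script from the comment), assuming the whisper.h of that era, which exported whisper_init, whisper_print_system_info, and whisper_free; whisper_full is left out because mirroring the whisper_full_params struct in ctypes depends on the exact library version:

```python
import ctypes

# Load the shared library built with the commands above.
lib = ctypes.CDLL("./libwhisper.so")

# Declare signatures explicitly. Leaving restype at the default (c_int) truncates
# the 64-bit context pointer, which is a classic cause of segfaults later on
# (compare the ctx issue discussed below).
lib.whisper_init.restype = ctypes.c_void_p
lib.whisper_init.argtypes = [ctypes.c_char_p]
lib.whisper_print_system_info.restype = ctypes.c_char_p
lib.whisper_free.argtypes = [ctypes.c_void_p]

ctx = lib.whisper_init(b"models/ggml-base.en.bin")  # model path is a placeholder
if not ctx:
    raise RuntimeError("failed to load model")

print(lib.whisper_print_system_info().decode())

# Calling whisper_full would additionally require mirroring the whisper_full_params
# struct in ctypes; its layout changes between versions, so it is omitted here.
lib.whisper_free(ctx)
```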
richardburleigh commented on Oct 9, 2022
Thank you @ggerganov - really appreciate your work!
Still getting a segfault with your code, but I'll assume it's a me problem.
richardburleigh commented on Oct 10, 2022
Got a segfault in the same place on an Intel 12th-gen CPU and an M1 MacBook with no changes to the above Python script. Has anyone else tried it?
Were you using the same codebase as master, @ggerganov?
ggerganov commented on Oct 10, 2022
Yeah, the ctx pointer wasn't being passed properly. I've updated the Python script above. Give it another try - I think it should work now.
egfthomas commented on Jan 17, 2024
Is there a streaming function in the original Python/PyTorch implementation?
SeeknnDestroy commented on Apr 26, 2024
Can I use faster_whisper for real-time transcription tasks?
chrisspen commented on Apr 26, 2024
@SeeknnDestroy
Probably not. faster_whisper is a lot faster than the pure Python implementation, but a lot slower than this C++ version.
I'd only recommend faster_whisper when you want good performance but don't have a GPU needed to run whisper.cpp.
hboehmer868 commented on May 25, 2024
After some struggle with the Python bindings documented in the README and also trying whisper-cpp-python without success, I landed on pywhispercpp. Might be worth adding to the list in the README @ggerganov.
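For reference, a minimal usage sketch based on pywhispercpp's documented interface (model name and audio path are placeholders; check its README for the current API):

```python
from pywhispercpp.model import Model

# Loads a ggml model by name (downloaded if needed); a local model path also works.
model = Model("base.en", n_threads=4)

# Transcribe a local audio file and print the recognized segments.
segments = model.transcribe("samples/jfk.wav")
for segment in segments:
    print(segment.text)
```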
BBC-Esq commented on Jun 6, 2024
I agree. I tested it out and it works alright, but it doesn't have GPU acceleration yet. The maintainer said it's just a time-commitment thing, which I can understand. Would love to get Python bindings from somewhere that also support GPU so I can do some more benchmarking.
hboehmer868 commented on Jun 12, 2024
@BBC-Esq I have gotten pywhispercpp to run with GPU support. You can clone it from source and build it with CUDA support enabled, just like you do with whisper.cpp itself. I should warn you, though, that there are some issues with installing directly from source, as you can read in my issue over there.
Here is how I currently do it:
Add pywhispercpp to the Pybind11 Python wrapper list
readme : remove invalid flag from Python example (#2396)
readme : add cython bindings (ggml-org#9)
fann1993814 commented on May 12, 2025
Hi there. I developed a lightweight wrapper for libwhisper, namely whisper.cpy. Moreover, I also ported some whisper-streaming features to it, and it supports async streaming-chunk processing with threading.
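This is not whisper.cpy's actual API, but the general pattern described there (audio chunks pushed to a worker thread through a queue so transcription runs asynchronously) can be sketched like this; transcribe_chunk is a stand-in for whatever the wrapper exposes:

```python
import queue
import threading

def transcribe_chunk(chunk: bytes) -> str:
    # Stand-in for a real call into a whisper wrapper.
    return f"<transcript of {len(chunk)} bytes>"

audio_queue = queue.Queue()

def worker() -> None:
    # Pull chunks until the sentinel (None) arrives, transcribing each one.
    while True:
        chunk = audio_queue.get()
        if chunk is None:
            break
        print(transcribe_chunk(chunk))

t = threading.Thread(target=worker, daemon=True)
t.start()

# Producer side: push streaming audio chunks as they arrive.
for fake_chunk in (b"\x00" * 3200, b"\x00" * 3200):
    audio_queue.put(fake_chunk)

audio_queue.put(None)  # signal the worker to stop
t.join()
```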