
Python bindings (C-style API) #9

@ArtyomZemlyak

Description


Good day everyone!
I'm thinking about bindings for Python.

So far, I'm interested in 4 functionalities:

  1. Encoder processing
  2. Decoder processing
  3. Transcription of audio (feed audio bytes, get text)
  4. Same as 3, plus word timestamps (feed audio bytes, get text plus the time of each word). Of course, it is probably too early to think about word timestamps, since even the Python implementation does not handle them well yet.

Perhaps in the near future I will try to take on this task, but I have no experience with Python bindings. So if there are craftsmen who can do it quickly (if it can be done quickly... 😃), that would be cool!
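To pin down what those four functionalities could look like from Python, here is a hypothetical binding surface. All names here are made up for illustration; nothing like this exists in the repo yet:

```python
from dataclasses import dataclass

@dataclass
class Word:
    text: str
    start: float  # seconds
    end: float    # seconds

class WhisperBinding:
    """Hypothetical Python surface for the four functionalities above."""

    def encode(self, mel):
        """1. Run the encoder on a mel spectrogram."""
        raise NotImplementedError

    def decode(self, tokens, encoder_out):
        """2. Run one decoder step over the encoder output."""
        raise NotImplementedError

    def transcribe(self, audio: bytes) -> str:
        """3. Feed audio bytes, get text."""
        raise NotImplementedError

    def transcribe_with_times(self, audio: bytes) -> "list[Word]":
        """4. Feed audio bytes, get text plus per-word times."""
        raise NotImplementedError
```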

Activity

ArtyomZemlyak (Author) commented on Oct 1, 2022

Here is a workaround:

Building

main: ggml.o main.o
	# link the objects into a shared library loadable from Python via ctypes
	g++ -shared -Wl,-soname,main.so -o main.so main.o ggml.o
	# also build the regular executable
	g++ -pthread -o main ggml.o main.o
	./main -h

ggml.o: ggml.c ggml.h
	gcc -O3 -mavx -mavx2 -mfma -mf16c -c -fPIC ggml.c -o ggml.o

main.o: main.cpp ggml.h
	g++ -pthread -O3 -std=c++11 -c -fPIC main.cpp -o main.o

Run main from Python:

import ctypes
import pathlib


if __name__ == "__main__":
    # Load the shared library into ctypes
    libname = pathlib.Path().absolute() / "main.so"
    whisper = ctypes.CDLL(libname)

    whisper.main.restype = ctypes.c_int  # main() returns an int exit code
    whisper.main.argtypes = ctypes.c_int, ctypes.POINTER(ctypes.c_char_p)

    # argv[0] is conventionally the program name; option parsing
    # starts at argv[1], so it has to be included here
    args = (ctypes.c_char_p * 10)(
        b"main",
        b"-nt",
        b"--language", b"ru",
        b"-t", b"8",
        b"-m", b"../models/ggml-model-tiny.bin",
        b"-f", b"../audio/cuker1.wav"
    )
    whisper.main(len(args), args)

And it works!

ArtyomZemlyak (Author) commented on Oct 1, 2022

Calling specific functions is more difficult, though:

  • The model needs to be loaded at the C++ level
  • We need access to its encode/decode methods
  • The whole process with the loaded model should run in parallel with the Python side

It might be worth considering running Python and C++ in different threads/processes and sharing information between them when needed.
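The thread idea can be sketched in pure Python. The worker below is only a stand-in for the C++ side; in a real binding, ctypes calls into a shared library release the GIL, so a long model call could run in a background thread without blocking Python:

```python
import queue
import threading

def transcribe_worker(jobs: queue.Queue, results: queue.Queue) -> None:
    """Stand-in for the C++ side: pull audio jobs, push transcripts back."""
    while True:
        audio = jobs.get()
        if audio is None:       # sentinel: shut the worker down
            break
        # a real binding would call the loaded model here
        results.put(f"transcript of {len(audio)} samples")

jobs, results = queue.Queue(), queue.Queue()
worker = threading.Thread(target=transcribe_worker, args=(jobs, results))
worker.start()

jobs.put([0.0] * 16000)         # hand audio to the "C++" side
out = results.get()             # block until the transcript comes back
print(out)

jobs.put(None)                  # shut down and wait for the worker
worker.join()
```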

ggerganov (Member) commented on Oct 1, 2022

Thank you very much for your interest in the project!

I think we first need a proper C-style wrapper around the model loading, the encode/decode functionality, and the sampling strategies. After that we can easily create Python and other language bindings. I've done similar work in my 'ggwave' project.

I agree that the encode and decode functionality should be exposed through the API as you suggested. It would give more flexibility to the users of the library/bindings.

aichr commented on Oct 4, 2022

@ArtyomZemlyak First you reinvent the pytorch functions in c, then you want python bindings around them. Isn't the end result the same as what we have in pytorch?

ggerganov (Member) commented on Oct 4, 2022

The initial API is now available on master:

https://github.com/ggerganov/whisper.cpp/blob/master/whisper.h

The first part gives more fine-grained control over the inference and also lets the user implement their own sampling strategy using the predicted probabilities for each token.

The second part of the API includes methods for full inference - you simply provide the audio samples and choose the sampling parameters.

Most likely the API will change with time, but this is a good starting point.
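As a sketch, the low-level functions in whisper.h can be wrapped from Python with ctypes roughly like this. The function names follow the header at the time; the exact signatures are assumptions, and since the API is expected to change, this is illustrative rather than authoritative:

```python
import ctypes

class Whisper:
    """Thin ctypes wrapper over the C-style API in whisper.h (sketch;
    signatures are assumptions and may drift from the header)."""

    def __init__(self, libpath: str, model_path: str):
        self.lib = ctypes.CDLL(libpath)
        # declare pointer/string return types, or ctypes assumes a C int
        self.lib.whisper_init.restype = ctypes.c_void_p
        self.lib.whisper_full_get_segment_text.restype = ctypes.c_char_p
        self.ctx = self.lib.whisper_init(model_path.encode("utf-8"))

    def transcribe(self, samples, params):
        """Run full inference over float PCM samples; collect segment texts."""
        arr = (ctypes.c_float * len(samples))(*samples)
        ret = self.lib.whisper_full(ctypes.c_void_p(self.ctx), params,
                                    arr, len(samples))
        if ret != 0:
            raise RuntimeError(f"whisper_full failed with code {ret}")
        n = self.lib.whisper_full_n_segments(ctypes.c_void_p(self.ctx))
        return [self.lib.whisper_full_get_segment_text(
                    ctypes.c_void_p(self.ctx), i).decode("utf-8")
                for i in range(n)]

    def close(self):
        self.lib.whisper_free(ctypes.c_void_p(self.ctx))
```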

richardburleigh commented on Oct 9, 2022

This is as far as I got trying to get the API working in Python.

It loads the model successfully, but gets a segmentation fault on whisper_full.

Any ideas?

import ctypes
import pathlib

if __name__ == "__main__":
    libname = pathlib.Path().absolute() / "whisper.so"
    whisper = ctypes.CDLL(libname)
    modelpath = b"models/ggml-medium.bin"
    # restype is not declared, so the returned context pointer is truncated to a C int
    model = whisper.whisper_init(modelpath)
    # the strategy argument is an enum (int), not a byte string
    params = whisper.whisper_full_default_params(b"WHISPER_DECODE_GREEDY")
    w = open('samples/jfk.wav', "rb").read()
    # whisper_full expects a float* of PCM samples and an int count,
    # not raw WAV bytes and a byte string
    result = whisper.whisper_full(model, params, w, b"16000")
    # Segmentation fault
    

Edit - Got some debugging info from gdb but it didn't help much:
0x00007ffff67916c6 in log_mel_spectrogram(float const*, int, int, int, int, int, int, whisper_filters const&, whisper_mel&)

ggerganov (Member) commented on Oct 9, 2022

Here is one way to achieve this:

# build shared libwhisper.so
gcc -O3 -std=c11   -pthread -mavx -mavx2 -mfma -mf16c -fPIC -c ggml.c
g++ -O3 -std=c++11 -pthread --shared -fPIC -static-libstdc++ whisper.cpp ggml.o -o libwhisper.so

Use it from Python like this:

import ctypes
import pathlib

# this is needed to read the WAV file properly
from scipy.io import wavfile

libname     = "libwhisper.so"
fname_model = "models/ggml-tiny.en.bin"
fname_wav   = "samples/jfk.wav"

# this needs to match the C struct in whisper.h
class WhisperFullParams(ctypes.Structure):
    _fields_ = [
        ("strategy",             ctypes.c_int),
        ("n_threads",            ctypes.c_int),
        ("offset_ms",            ctypes.c_int),
        ("translate",            ctypes.c_bool),
        ("no_context",           ctypes.c_bool),
        ("print_special_tokens", ctypes.c_bool),
        ("print_progress",       ctypes.c_bool),
        ("print_realtime",       ctypes.c_bool),
        ("print_timestamps",     ctypes.c_bool),
        ("language",             ctypes.c_char_p),
        ("greedy",               ctypes.c_int * 1),
    ]

if __name__ == "__main__":
    # load library and model
    libname = pathlib.Path().absolute() / libname
    whisper = ctypes.CDLL(libname)

    # tell Python what are the return types of the functions
    whisper.whisper_init.restype                  = ctypes.c_void_p
    whisper.whisper_full_default_params.restype   = WhisperFullParams
    whisper.whisper_full_get_segment_text.restype = ctypes.c_char_p

    # initialize whisper.cpp context
    ctx = whisper.whisper_init(fname_model.encode("utf-8"))

    # get default whisper parameters and adjust as needed
    params = whisper.whisper_full_default_params(0)
    params.print_realtime = True
    params.print_progress = False

    # load WAV file
    samplerate, data = wavfile.read(fname_wav)

    # convert to 32-bit float
    data = data.astype('float32')/32768.0

    # run the inference
    result = whisper.whisper_full(ctypes.c_void_p(ctx), params, data.ctypes.data_as(ctypes.POINTER(ctypes.c_float)), len(data))
    if result != 0:
        print("Error: {}".format(result))
        exit(1)

    # print results from Python
    print("\nResults from Python:\n")
    n_segments = whisper.whisper_full_n_segments(ctypes.c_void_p(ctx))
    for i in range(n_segments):
        t0  = whisper.whisper_full_get_segment_t0(ctypes.c_void_p(ctx), i)
        t1  = whisper.whisper_full_get_segment_t1(ctypes.c_void_p(ctx), i)
        txt = whisper.whisper_full_get_segment_text(ctypes.c_void_p(ctx), i)

        print(f"{t0/1000.0:.3f} - {t1/1000.0:.3f} : {txt.decode('utf-8')}")

    # free the memory
    whisper.whisper_free(ctypes.c_void_p(ctx))
richardburleigh commented on Oct 9, 2022

Thank you @ggerganov - really appreciate your work!

Still getting a seg fault with your code, but I'll assume the problem is on my end:

Thread 1 "python" received signal SIGSEGV, Segmentation fault.
log_mel_spectrogram (samples=<optimized out>, n_samples=<optimized out>, sample_rate=<optimized out>, fft_size=<optimized out>, fft_step=<optimized out>, n_mel=80, n_threads=<optimized out>, filters=..., mel=...) at whisper.cpp:1977
1977	    mel.data.resize(mel.n_mel*mel.n_len);
(gdb) bt
#0  log_mel_spectrogram (samples=<optimized out>, n_samples=<optimized out>, sample_rate=<optimized out>, fft_size=<optimized out>, fft_step=<optimized out>, n_mel=80, n_threads=<optimized out>, filters=..., mel=...) at whisper.cpp:1977
#1  0x00007fffc28d24c7 in whisper_pcm_to_mel (ctx=0x560d7680, samples=0x7fffb3345010, n_samples=176000, n_threads=4) at whisper.cpp:2101
#2  0x00007fffc28d4113 in whisper_full (ctx=0x560d7680, params=..., samples=<optimized out>, n_samples=<optimized out>) at whisper.cpp:2316
richardburleigh commented on Oct 10, 2022

Got a segfault in the same place on an Intel 12th-gen CPU and an M1 MacBook with no changes to the above Python script. Has anyone else tried it?

Were you using the same codebase as master, @ggerganov?

ggerganov (Member) commented on Oct 10, 2022

Yeah, the ctx pointer wasn't being passed properly. I've updated the Python script above. Give it another try; I think it should work now.
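The underlying pitfall is general to ctypes: unless restype is declared, every return value is assumed to be a C int, so a 64-bit pointer can be truncated before it is handed back in. A minimal illustration using libc's malloc (assumes a Unix-like system):

```python
import ctypes

# load the C library already linked into the process (Unix-like systems)
libc = ctypes.CDLL(None)

# Without this line, ctypes would treat malloc's return value as a C int,
# truncating the 64-bit pointer: the same class of bug that crashed whisper_full.
libc.malloc.restype = ctypes.c_void_p
libc.malloc.argtypes = [ctypes.c_size_t]

p = libc.malloc(64)
assert p  # a valid (non-NULL) full-width pointer

# Wrap the integer explicitly when handing it back, so ctypes passes
# a pointer-sized argument instead of guessing the type.
libc.free(ctypes.c_void_p(p))
```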

(72 comments omitted)

egfthomas commented on Jan 17, 2024

> @ArtyomZemlyak First you reinvent the pytorch functions in c, then you want python bindings around them. Isn't the end result the same as what we have in pytorch?

Is there a streaming function in the original Python/PyTorch implementation?

SeeknnDestroy commented on Apr 26, 2024

> @dnhkng Yes. The problem seems to be the C++ code. It might work fine with a GPU, but on a CPU it runs slower than pure Python on a 10-year-old machine. And if C++ needs an expensive GPU to be faster than Python on a CPU, it's not good code.
>
> I'm finding faster_whisper is much more usable for Python and far more cost effective.

Can I use faster_whisper for real-time transcription tasks?

chrisspen commented on Apr 26, 2024

@SeeknnDestroy

> can I use faster_whisper for real time transcription tasks?

Probably not. faster_whisper is a lot faster than the pure Python implementation, but a lot slower than this C++ version.

I'd only recommend faster_whisper when you want good performance but don't have the GPU needed to run whisper.cpp.

hboehmer868 commented on May 25, 2024

After some struggle with the Python bindings documented in the README, and after trying whisper-cpp-python without success, I landed on pywhispercpp. It might be worth adding to the list in the README @ggerganov

BBC-Esq commented on Jun 6, 2024

> After some struggle with the python bindings documented in the README and also trying whisper-cpp-python to no success, I landed on pywhispercpp. Might be worth adding to the list in README @ggerganov

I agree. I tested it out and it works all right, but it doesn't have GPU acceleration yet. The maintainer said it's just a matter of time commitment, which I can understand. I would love to get Python bindings from somewhere that also support GPU so I can do some more benchmarking.

hboehmer868 commented on Jun 12, 2024

@BBC-Esq I have gotten pywhispercpp to run with GPU support. You can clone it from source and build it with CUDA support enabled, just as you do with whisper.cpp itself. Fair warning: there are some issues with installing directly from source, as you can read in my issue over there.

Here is how I currently do it:

# Clone from source
git clone --branch v1.2.0 --recurse-submodules https://github.com/abdeladim-s/pywhispercpp.git
# Build a python wheel with CUDA support
# FYI: The submodule whisper.cpp in pywhispercpp is currently pinned at version 1.5.4,
# which still uses the old cmake flag for CUDA support, later versions use -DWHISPER_CUDA=1
cd pywhispercpp
CMAKE_ARGS="-DWHISPER_CUBLAS=1" python3 -m build --wheel
# Install the wheel into your python environment
python3 -m pip install dist/pywhispercpp-*.whl
fann1993814 commented on May 12, 2025

Hi there. I have developed a lightweight wrapper for libwhisper, namely whisper.cpy. I have also ported some whisper-streaming features to it, and it supports asynchronous processing of streaming chunks with threading.
