Use async/multi-threaded requests #258
Replies: 3 comments 1 reply
-
Hi @krrishdholakia! Let me convert this into a discussion.
-
Thanks for offering to submit a PR! We've considered threading here. The issue is that some LLM providers rate-limit or block requests sent with the same token if too many occur in parallel. I know for sure that OpenAI does this, and I suspect that Anthropic and Cohere aren't much different in this regard. We are aware that executing prompts one after the other is a very unsatisfactory solution. OpenAI supports batching in their deprecated Completions endpoint.
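For what it's worth, if we do add parallelism, the shape of it would probably be capped concurrency plus backoff, so we stay under provider rate limits rather than firing all requests at once. A rough sketch with `asyncio` and `httpx`; the endpoint, headers, and response field are placeholders, not spacy-llm's actual internals:

```python
import asyncio

import httpx

API_URL = "https://api.provider.example/v1/complete"  # placeholder endpoint
HEADERS = {"Authorization": "Bearer <token>"}         # placeholder auth header


async def fetch(client: httpx.AsyncClient, sem: asyncio.Semaphore, prompt: str) -> str:
    # The semaphore caps in-flight requests, so we parallelize without
    # sending dozens of simultaneous calls under the same token.
    async with sem:
        for attempt in range(5):
            resp = await client.post(API_URL, headers=HEADERS, json={"prompt": prompt})
            if resp.status_code == 429:
                # Rate-limited: exponential backoff, then retry.
                await asyncio.sleep(2 ** attempt)
                continue
            resp.raise_for_status()
            return resp.json()["completion"]  # response field is a placeholder
        raise RuntimeError("Rate limit: retries exhausted")


async def run_all(prompts: list[str], max_concurrency: int = 8) -> list[str]:
    sem = asyncio.Semaphore(max_concurrency)
    async with httpx.AsyncClient(timeout=60.0) as client:
        # gather() preserves input order, so responses line up with prompts.
        return await asyncio.gather(*(fetch(client, sem, p) for p in prompts))


# results = asyncio.run(run_all(["prompt 1", "prompt 2"]))
```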
-
I'm hosting an LLM myself, and I wouldn't want to run requests sequentially. Right now the only way out seems to be running many parallel threads, each with different texts, but I'm hitting an OOM error. Can you tell me the easiest way to make the queries asynchronous, so that I don't end up rewriting the whole of spacy-llm to be async?
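In the meantime I'm considering something like the following: keep the library's blocking calls, run them from a thread pool, and process the corpus in fixed-size chunks so only one chunk of results is in memory at a time (which would also address the OOM). Rough sketch; `annotate` stands in for whatever performs one request, e.g. `nlp(text)`, assuming the pipeline turns out to be thread-safe:

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice
from typing import Iterable, Iterator


def chunks(texts: Iterable[str], size: int) -> Iterator[list[str]]:
    # Yield fixed-size chunks so we never materialize the whole corpus of results.
    it = iter(texts)
    while chunk := list(islice(it, size)):
        yield chunk


def annotate(text: str):
    # Placeholder for one blocking call, e.g. `return nlp(text)`.
    # Thread-safety of the pipeline is an assumption that needs verifying.
    ...


def process_corpus(texts: Iterable[str], chunk_size: int = 32, workers: int = 8):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for chunk in chunks(texts, chunk_size):
            # Only `chunk_size` requests are in flight / in memory at once.
            yield from pool.map(annotate, chunk)
```

Would something like this be reasonable, or am I missing a simpler route?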
-
Hey @rmitsch / @kabirkhan
have y'all considered using async/threading here to make this call faster?
https://github.com/explosion/spacy-llm/blob/f03da9094ee49626ae3aaccd3129e7c3237454ee/spacy_llm/models/rest/anthropic/model.py#L96C1-L99C10
Happy to make a PR to help out here. I'm working on a library to simplify LLM API calling, and I noticed y'all call the REST endpoints directly, which is awesome!
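For concreteness, the kind of change I'd propose is roughly: replace the sequential loop over prompts with a thread-pool map over a per-prompt request function. This is only a sketch against a hypothetical `_request_one` helper with placeholder endpoint/headers, not spacy-llm's actual code:

```python
from concurrent.futures import ThreadPoolExecutor

import requests


def _request_one(prompt: str) -> dict:
    # Hypothetical per-prompt helper: one blocking POST to the provider's
    # REST endpoint. URL, headers, and payload shape are placeholders.
    resp = requests.post(
        "https://api.provider.example/v1/complete",
        headers={"X-API-Key": "<token>"},
        json={"prompt": prompt},
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()


def request_all(prompts: list[str], max_workers: int = 8) -> list[dict]:
    # executor.map preserves input order, so responses still line up
    # with the docs they belong to.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(_request_one, prompts))
```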