
Add buffer in the maximum number of tokens generated (to fix #353) #354


Merged
merged 2 commits into main from vijay_add_buffer_in_number_of_tokens_for_litellm on Sep 20, 2023

Conversation


@viswavi viswavi commented Sep 20, 2023

Description

Issue #353 found that the LiteLLM agent now reports that our model's requests exceed the maximum allowable number of tokens.

This problem stems from a disparity between tiktoken's token counts and:

  1. the number of tokens that OpenAI's API perceives
  2. the number of tokens that OpenAI's tokenizer playground perceives

For the full prompt parser prompt, tiktoken counts 2569 tokens, so we set max_tokens for LiteLLM to 4097 - 2569 = 1528.
However, OpenAI's API counts 2576 prompt tokens, so the prompt plus the 1528 generated tokens exceeds the 4097-token limit, while OpenAI's tokenizer playground counts 2862 tokens.

A naive solution is to add a buffer, e.g. generate 300 fewer tokens than the computed maximum (so we would set max_tokens to 1228 instead of 1528). This PR implements that solution, as sketched below.
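
A rough sketch of the idea (illustrative only; the function name, the constant names, and the 4097-token context window are assumptions, not the exact code in this PR):

```python
import tiktoken

CONTEXT_WINDOW = 4097  # context limit for gpt-3.5-turbo at the time
TOKEN_BUFFER = 300     # safety margin for tiktoken vs. OpenAI API count mismatches


def compute_max_tokens(prompt: str, model: str = "gpt-3.5-turbo") -> int:
    """Return a max_tokens value that leaves a safety buffer below the context window."""
    encoding = tiktoken.encoding_for_model(model)
    prompt_tokens = len(encoding.encode(prompt))
    # e.g. 4097 - 2569 - 300 = 1228 generated tokens for the prompt parser prompt
    return CONTEXT_WINDOW - prompt_tokens - TOKEN_BUFFER
```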

References

N/A

Blocked by

N/A

@viswavi viswavi requested review from neubig and saum7800 September 20, 2023 00:49
@neubig neubig left a comment

This looks good, but maybe we want to reduce the token buffer to a smaller value? 300 is a pretty significant number of tokens and may reduce our flexibility somewhat. Could we get away with 100?

Also, just a GitHub tip: you can write
Fixes #353

in the PR description (e.g. in the references section) and the issue will be auto-closed when the PR is merged.

@neubig neubig left a comment

Oh, actually, apologies. I see 300 is actually necessary for the prompt parser (per your message). I approve this PR.

@neubig neubig merged commit b01a7f8 into main Sep 20, 2023
@neubig neubig deleted the vijay_add_buffer_in_number_of_tokens_for_litellm branch September 20, 2023 12:20
@krrishdholakia

@viswavi @neubig would it have been helpful here to have the token counter expose a buffer param, so you could've just added a manual buffer there?

ideally tiktoken would 1:1 map the api
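
For illustration (hypothetical: `litellm.token_counter` is a real helper, but the `buffer` argument sketched in the comment below does not exist):

```python
import litellm

messages = [{"role": "user", "content": "<full prompt parser prompt>"}]

# Today the caller counts tokens and subtracts a manual safety buffer itself.
prompt_tokens = litellm.token_counter(model="gpt-3.5-turbo", messages=messages)
max_tokens = 4097 - prompt_tokens - 300

# Suggestion: let the counter bake the buffer in (hypothetical keyword, not a real argument):
# prompt_tokens = litellm.token_counter(model="gpt-3.5-turbo", messages=messages, buffer=300)
```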


neubig commented Oct 11, 2023

Hmm, it sounds interesting but I'm not exactly sure what you mean by this?


krrishdholakia commented Oct 11, 2023

Nvm - looks like I had an incorrect understanding of the problem. I thought you were using our token counting helper function,

but it looks like it was being read from the response object.
