Skip to content

Improve tokenizer pretty-pretty logic + __call__ method #1417

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 3 commits into from

Conversation

craymichael
Copy link
Contributor

Summary: Use the call method of tokenizers that returns a BatchEncoding with offsets. This allows us to grab text from the fully decoded string and not make assumptions about how many tokens correspond to a single string.

Differential Revision: D64998804

@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D64998804

craymichael added a commit to craymichael/captum that referenced this pull request Oct 25, 2024
Summary:

Use the __call__ method of tokenizers that returns a BatchEncoding with offsets. This allows us to grab text from the fully decoded string and not make assumptions about how many tokens correspond to a single string.

Differential Revision: D64998804
craymichael added a commit to craymichael/captum that referenced this pull request Oct 25, 2024
Summary:

Use the __call__ method of tokenizers that returns a BatchEncoding with offsets. This allows us to grab text from the fully decoded string and not make assumptions about how many tokens correspond to a single string.

Differential Revision: D64998804
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D64998804

craymichael added a commit to craymichael/captum that referenced this pull request Oct 25, 2024
Summary:

Use the __call__ method of tokenizers that returns a BatchEncoding with offsets. This allows us to grab text from the fully decoded string and not make assumptions about how many tokens correspond to a single string.

Differential Revision: D64998804
craymichael added a commit to craymichael/captum that referenced this pull request Oct 25, 2024
Summary:

Use the __call__ method of tokenizers that returns a BatchEncoding with offsets. This allows us to grab text from the fully decoded string and not make assumptions about how many tokens correspond to a single string.

Differential Revision: D64998804
craymichael added a commit to craymichael/captum that referenced this pull request Oct 25, 2024
Summary:

Use the __call__ method of tokenizers that returns a BatchEncoding with offsets. This allows us to grab text from the fully decoded string and not make assumptions about how many tokens correspond to a single string.

Differential Revision: D64998804
craymichael added a commit to craymichael/captum that referenced this pull request Oct 26, 2024
Summary:

Use the __call__ method of tokenizers that returns a BatchEncoding with offsets. This allows us to grab text from the fully decoded string and not make assumptions about how many tokens correspond to a single string.

Differential Revision: D64998804
craymichael added a commit to craymichael/captum that referenced this pull request Oct 26, 2024
Summary:

Use the __call__ method of tokenizers that returns a BatchEncoding with offsets. This allows us to grab text from the fully decoded string and not make assumptions about how many tokens correspond to a single string.

Differential Revision: D64998804
craymichael added a commit to craymichael/captum that referenced this pull request Oct 26, 2024
Summary:

Use the __call__ method of tokenizers that returns a BatchEncoding with offsets. This allows us to grab text from the fully decoded string and not make assumptions about how many tokens correspond to a single string.

Differential Revision: D64998804
craymichael added a commit to craymichael/captum that referenced this pull request Oct 26, 2024
Summary:

Use the __call__ method of tokenizers that returns a BatchEncoding with offsets. This allows us to grab text from the fully decoded string and not make assumptions about how many tokens correspond to a single string.

Differential Revision: D64998804
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D64998804

craymichael added a commit to craymichael/captum that referenced this pull request Oct 26, 2024
Summary:

Use the __call__ method of tokenizers that returns a BatchEncoding with offsets. This allows us to grab text from the fully decoded string and not make assumptions about how many tokens correspond to a single string.

Differential Revision: D64998804
Summary: Slight bug in the retry GitHub workflow causes failing workflows to rerun indefinitely. This _should_ fix it

Differential Revision: D65008676
Summary:

Add __call__ to TokenizerLike for transformers compatibility

Differential Revision: D64998805
Summary:

Use the __call__ method of tokenizers that returns a BatchEncoding with offsets. This allows us to grab text from the fully decoded string and not make assumptions about how many tokens correspond to a single string.

Differential Revision: D64998804
craymichael added a commit to craymichael/captum that referenced this pull request Oct 26, 2024
Summary:

Use the __call__ method of tokenizers that returns a BatchEncoding with offsets. This allows us to grab text from the fully decoded string and not make assumptions about how many tokens correspond to a single string.

Differential Revision: D64998804
craymichael added a commit to craymichael/captum that referenced this pull request Oct 26, 2024
Summary:

Use the __call__ method of tokenizers that returns a BatchEncoding with offsets. This allows us to grab text from the fully decoded string and not make assumptions about how many tokens correspond to a single string.

Differential Revision: D64998804
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D64998804

craymichael added a commit to craymichael/captum that referenced this pull request Oct 26, 2024
Summary:

Use the __call__ method of tokenizers that returns a BatchEncoding with offsets. This allows us to grab text from the fully decoded string and not make assumptions about how many tokens correspond to a single string.

Differential Revision: D64998804
craymichael added a commit to craymichael/captum that referenced this pull request Oct 26, 2024
Summary:

Use the __call__ method of tokenizers that returns a BatchEncoding with offsets. This allows us to grab text from the fully decoded string and not make assumptions about how many tokens correspond to a single string.

Differential Revision: D64998804
craymichael added a commit to craymichael/captum that referenced this pull request Oct 26, 2024
Summary:

Use the __call__ method of tokenizers that returns a BatchEncoding with offsets. This allows us to grab text from the fully decoded string and not make assumptions about how many tokens correspond to a single string.

Differential Revision: D64998804
craymichael added a commit to craymichael/captum that referenced this pull request Oct 26, 2024
Summary:

Use the __call__ method of tokenizers that returns a BatchEncoding with offsets. This allows us to grab text from the fully decoded string and not make assumptions about how many tokens correspond to a single string.

Differential Revision: D64998804
craymichael added a commit to craymichael/captum that referenced this pull request Oct 26, 2024
Summary:

Use the __call__ method of tokenizers that returns a BatchEncoding with offsets. This allows us to grab text from the fully decoded string and not make assumptions about how many tokens correspond to a single string.

Differential Revision: D64998804
craymichael added a commit to craymichael/captum that referenced this pull request Oct 26, 2024
Summary:

Use the __call__ method of tokenizers that returns a BatchEncoding with offsets. This allows us to grab text from the fully decoded string and not make assumptions about how many tokens correspond to a single string.

Differential Revision: D64998804
craymichael added a commit to craymichael/captum that referenced this pull request Oct 26, 2024
Summary:

Use the __call__ method of tokenizers that returns a BatchEncoding with offsets. This allows us to grab text from the fully decoded string and not make assumptions about how many tokens correspond to a single string.

Differential Revision: D64998804
craymichael added a commit to craymichael/captum that referenced this pull request Oct 26, 2024
Summary:

Use the __call__ method of tokenizers that returns a BatchEncoding with offsets. This allows us to grab text from the fully decoded string and not make assumptions about how many tokens correspond to a single string.

Differential Revision: D64998804
@facebook-github-bot
Copy link
Contributor

This pull request has been merged in ad89e0b.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants