Prevent negative scores #96

Open
wants to merge 2 commits into
base: master

houston-d commented Jun 17, 2025

Closes #17

When n >= 4, a negative keyword score can be returned. This occurs when a candidate contains two or more stopwords in a row: sum_h then drops below -1, so the denominator stays negative when calculating self.h for the composed word.

This PR remedies this by treating a chain of stopwords as a single entity for calculating h. When a stopword is located, we calculate prob_t1 as before. Now, prob_t2 is calculated using the final stopword in the chain. The loop then skips to the word following the chain.
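
The idea can be illustrated with a minimal, self-contained sketch. It is not the code from this PR: the `Term` class, the `cooccur` dictionary (standing in for the co-occurrence graph), the `merge_chains` flag, and the exact update rules for `sum_h` and `prod_h` are assumptions made for illustration, modeled loosely on the description above.

```python
from dataclasses import dataclass


@dataclass
class Term:
    """Hypothetical stand-in for a single term of a composed word."""
    word: str
    h: float            # single-term score
    tf: float           # term frequency (assumed > 0)
    stopword: bool = False


def composed_h(terms, cooccur, merge_chains=True, tf_used=1.0):
    """Score a composed word.

    cooccur maps (left_word, right_word) to a co-occurrence count.
    With merge_chains=True, a run of consecutive stopwords is penalised
    once; with merge_chains=False, every stopword is penalised separately.
    """
    sum_h = 0.0
    prod_h = 1.0
    i = 0
    while i < len(terms):
        term = terms[i]
        if not term.stopword:
            sum_h += term.h
            prod_h *= term.h
            i += 1
            continue

        # Find the last stopword of the chain that starts at position i.
        j = i
        if merge_chains:
            while j + 1 < len(terms) and terms[j + 1].stopword:
                j += 1

        # prob_t1: transition from the preceding word into the first stopword.
        prob_t1 = 0.0
        if i > 0:
            prev = terms[i - 1]
            prob_t1 = cooccur.get((prev.word, term.word), 0) / prev.tf

        # prob_t2: transition from the final stopword into the following word.
        prob_t2 = 0.0
        if j + 1 < len(terms):
            nxt = terms[j + 1]
            prob_t2 = cooccur.get((terms[j].word, nxt.word), 0) / nxt.tf

        prob = prob_t1 * prob_t2
        prod_h *= 1 + (1 - prob)
        sum_h -= 1 - prob
        i = j + 1  # skip to the word following the chain

    return prod_h / ((sum_h + 1) * tf_used)


if __name__ == "__main__":
    candidate = [
        Term("natural", 0.1, 2.0),
        Term("of", 0.0, 5.0, stopword=True),
        Term("the", 0.0, 5.0, stopword=True),
        Term("language", 0.1, 2.0),
    ]
    # Two separate penalties push sum_h below -1: negative score.
    print(composed_h(candidate, {}, merge_chains=False))
    # One penalty for the whole chain keeps sum_h + 1 positive.
    print(composed_h(candidate, {}, merge_chains=True))
```

With these made-up inputs, penalising each stopword separately gives sum_h = 0.2 - 2 = -1.8, so the denominator sum_h + 1 is negative. Penalising the chain once gives sum_h = 0.2 - 1 = -0.8, and the denominator stays positive.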

As all existing tests use n <= 3, their results remain unchanged. An additional English test for n=4 is added, using a portion of text from https://en.wikipedia.org/wiki/Natural_language_processing.

Furthermore, the CONTRIBUTING.rst is updated to reflect the use of poetry as a package manager.

Successfully merging this pull request may close these issues.

Negative or Zero YAKE score