Improve time series filtering based on cutoff, horizon and min_context_length #18

Merged
merged 4 commits into from
May 27, 2025

Conversation

shchur
Contributor

@shchur shchur commented May 27, 2025

Issue #, if available:

This PR makes it easier to work with datasets where some time series are too short for the specific Task configuration.

Before this PR

Previously, we had the argument min_ts_length (defaults to horizon + 1) to deal with such datasets. The filtering logic was as follows:

  1. Remove all time series that have < min_ts_length observations
  2. Try to slice each time series at the cutoff. If the part before cutoff has < 1 observations OR if the part after cutoff has < horizon observations, raise an exception.

For example, if some time series are long but have no observations before the cutoff, they pass the min_ts_length check in step 1 and then trigger the exception in step 2. It's not trivial to filter them out by setting min_ts_length, especially if the time series in the dataset have different lengths.
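To illustrate the failure mode, here is a minimal sketch of the two-step logic described above. It is not the actual implementation: the function name `old_filter_and_slice`, the dict-of-timestamps data layout, and the convention that observations with timestamp <= cutoff count as "before the cutoff" are all assumptions made for this example.

```python
def old_filter_and_slice(series, cutoff, horizon, min_ts_length=None):
    """Hypothetical sketch of the old behavior.

    series: dict mapping a series name to a sorted list of integer timestamps.
    Assumption: a timestamp t is "before the cutoff" if t <= cutoff.
    """
    if min_ts_length is None:
        min_ts_length = horizon + 1

    # Step 1: drop series with fewer than min_ts_length observations in total.
    kept = {name: ts for name, ts in series.items() if len(ts) >= min_ts_length}

    # Step 2: slice at the cutoff and raise if either part is too short.
    result = {}
    for name, ts in kept.items():
        past = [t for t in ts if t <= cutoff]
        future = [t for t in ts if t > cutoff]
        if len(past) < 1 or len(future) < horizon:
            raise ValueError(f"Cannot slice series {name!r} at cutoff {cutoff}")
        result[name] = (past, future[:horizon])
    return result


# "late" has 30 observations, so it passes step 1, but none of them
# fall before the cutoff, so step 2 raises.
example = {"a": list(range(20)), "late": list(range(10, 40))}
try:
    old_filter_and_slice(example, cutoff=9, horizon=5)
except ValueError as err:
    print(f"Old logic fails: {err}")
```

In this sketch no single value of min_ts_length removes "late" without also removing "a", because "late" is the longer of the two series.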

This PR

We replace the min_ts_length argument with min_context_length (defaults to 1).

We change the filtering logic to remove a time series if either of the following holds:

  • It contains < min_context_length observations before cutoff
  • It contains < horizon observations after cutoff
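The two conditions above can be sketched as a single filter. This is an illustration only, not the code in this PR: the function name `filter_by_cutoff`, the dict-of-timestamps data layout, and the convention that observations with timestamp <= cutoff count as "before cutoff" are assumptions made for this example.

```python
def filter_by_cutoff(series, cutoff, horizon, min_context_length=1):
    """Hypothetical sketch of the new filtering rule.

    series: dict mapping a series name to a sorted list of integer timestamps.
    Assumption: a timestamp t is "before cutoff" if t <= cutoff.
    """
    kept = {}
    for name, timestamps in series.items():
        n_past = sum(t <= cutoff for t in timestamps)
        n_future = len(timestamps) - n_past
        # Keep the series only if both the context and the forecast window fit.
        if n_past >= min_context_length and n_future >= horizon:
            kept[name] = timestamps
    return kept


example = {
    "a": list(range(20)),         # 10 observations before, 10 after the cutoff
    "late": list(range(10, 40)),  # long, but nothing before the cutoff
    "short": list(range(12)),     # only 2 observations after the cutoff
}
remaining = filter_by_cutoff(example, cutoff=9, horizon=5)
print(sorted(remaining))
```

Series that cannot be sliced are dropped instead of raising, so in this sketch only "a" remains for the task.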

These changes are 100% backwards compatible with the old behavior, but now make it much easier to work with datasets where time series have wildly different lengths / cover different time periods. Specifically:

  • With this PR, task configurations that previously resulted in errors (exception raised when slicing the data) will now filter out some time series and work normally afterwards.
  • Task configurations that worked before will continue working in an identical way (so filtering logic for them won't be affected).

Other changes

  • Change version to 0.5.0rc1 for the pre-release
  • Update cell output in notebooks and README

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@shchur shchur changed the title Improve length-based time series filtering Improve time series filtering based on cutoff, horizon and min_context_length May 27, 2025
@shchur shchur requested a review from abdulfatir May 27, 2025 13:05
Collaborator

@abdulfatir abdulfatir left a comment

LGTM except the logging comment.

@shchur shchur merged commit cd7ccea into main May 27, 2025
@shchur shchur deleted the filter-series-based-on-cutoff branch May 27, 2025 13:31