@stastnypremysl stastnypremysl commented May 30, 2025

Draft Pull Request

Summary

  • Changed the type of interval_type from str to enum
  • Refactored TimeBarAggregator.__init__ so that variables are initialized before any methods are called
  • Added validation of time_bars_origin
  • Fixed the first-bar issue on left_open intervals (https://nt-dist.stty.cz/issues/20250509-1-first-bar-not-generated-on-millisecond-bug/)
  • Fixed the bug where self.next_close_ns doesn't change when no tick is sent
  • Fixed a potential bug where the first bar is skipped in batch processing
  • Added _invalidate_skip_first_non_full_bar_on_exact_start for batch processing
  • Added step validation to BarSpecification
  • TBD
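
For readers unfamiliar with the first item, here is a minimal sketch of what moving interval_type from str to an enum can look like. The names IntervalType and parse_interval_type, and the string values, are assumptions made for illustration; they are not taken from this PR.

```python
from enum import Enum


class IntervalType(Enum):
    """Hypothetical enum; member names and string values are assumptions."""

    LEFT_OPEN = "left-open"
    RIGHT_OPEN = "right-open"


def parse_interval_type(value):
    """Accept an enum member or its legacy string form (hypothetical helper)."""
    if isinstance(value, IntervalType):
        return value
    try:
        return IntervalType(value)
    except ValueError:
        valid = ", ".join(member.value for member in IntervalType)
        raise ValueError(
            f"invalid interval_type {value!r}, expected one of: {valid}"
        ) from None
```

With an enum, a misspelled value fails once at the boundary where it is parsed instead of silently selecting the wrong branch in a later string comparison.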

Related Issues/PRs

https://nt-dist.stty.cz/issues/20250509-1-first-bar-not-generated-on-millisecond-bug/

Type of change

  • Bug fix (non-breaking)
  • New feature (non-breaking)
  • Breaking change (impacts existing behavior)
  • Documentation update
  • Maintenance / chore

Breaking change details (if applicable)

Release notes

  • I added a concise entry to RELEASES.md that follows the existing conventions (when applicable)

Testing

Ensure new or changed logic is covered by tests.

  • Affected code paths are already covered by the test suite
  • I added/updated tests to cover new or changed logic

@cjdsellers cjdsellers changed the title from "Draft: TimeBasedAggregator refactorization" to "Refactor time-based bar aggregation" Jun 3, 2025
Member

@cjdsellers cjdsellers left a comment

Thanks for all the additional test coverage, this is valuable.


cdef BarAggregation aggregation = self.bar_type.spec.aggregation

if type(self._time_bars_origin) is not pd.Timedelta and type(self._time_bars_origin) is not pd.DateOffset:
Member

This could also be validated in the constructor; the Condition.type function may be useful for standardization.
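
A rough sketch of the constructor-time check suggested here, using only the standard library. The check_type helper is a hypothetical stand-in and does not reproduce the signature of Condition.type; the Aggregator class, the use of datetime.timedelta in place of pd.Timedelta and pd.DateOffset, and the choice to allow None are likewise assumptions.

```python
from datetime import timedelta


def check_type(argument, expected, param):
    """Raise TypeError unless `argument` is an instance of `expected`.

    Hypothetical stand-in for a shared validation helper; `expected` may be
    a single type or a tuple of types.
    """
    if not isinstance(argument, expected):
        raise TypeError(
            f"'{param}' had unexpected type {type(argument).__name__}"
        )


class Aggregator:
    """Toy class that validates its origin offset once, at construction."""

    def __init__(self, time_bars_origin=None):
        # Assumption: None means "no origin offset" and is allowed
        if time_bars_origin is not None:
            # In the PR the accepted types would be pd.Timedelta and pd.DateOffset
            check_type(time_bars_origin, (timedelta,), "time_bars_origin")
        self._time_bars_origin = time_bars_origin
```

One difference from the `type(x) is not ...` comparison in the diff: an isinstance check also accepts subclasses of the expected types, which may or may not be what is wanted here.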

start_time += pd.DateOffset(months=step)

start_time -= pd.DateOffset(months=step)

Member

Let's not add a blank line between the branches of an if-elif-else block; I think this makes the code less cohesive and harder to reason about.


return start_time


Member

Methods should be separated by a single blank line; functions are separated by two blank lines.

This would be autoformatted if it were a .py file, except our Python formatter can't parse this Cython.
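
As an illustration of the two spacing conventions raised in this review, here is a small plain-Python sketch. MonthCursor and months_between are made-up names and have nothing to do with the aggregator code; only the layout matters.

```python
class MonthCursor:
    """Toy class illustrating the spacing conventions from the review."""

    def __init__(self, month_index):
        self.month_index = month_index

    def shift(self, step, forward):
        # Branches of one if/elif/else chain stay together, with no blank
        # lines between the keywords
        if forward:
            self.month_index += step
        elif step <= self.month_index:
            self.month_index -= step
        else:
            self.month_index = 0
        return self.month_index


def months_between(a, b):
    """Module-level function, separated from the class by two blank lines."""
    return abs(a.month_index - b.month_index)
```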


# Delay to reset of _batch_next_close_ns to allow the creation of a last histo bar
# when transitioning to regular bars
# TODO: Refactor this, it is needless now (the comment above doesn't apply)
Member

I can't comment much on this batch mode, as it's a later addition and I don't have full context.

Collaborator

I disagree with the comment calling this needless. It's useful when transitioning from the aggregation of a request to the start of a backtest. There are many edge cases, and this one was actually hard to solve.

Collaborator

@faysou faysou Jun 3, 2025

Maybe this edge case wouldn't be necessary with fire immediately, as you mention. What I do here is basically allow a bar to be built without a timer in the case where the first bar arrives at the same time as the beginning of a backtest.

cdef int step = self.bar_type.spec.step

if self.bar_type.spec.aggregation != BarAggregation.MONTH:
# On receiving this event, timer should now have a new `next_time_ns`
Member

Is this where we're solving next_time_ns not being updated along with the clock's internal timer?

Author

If I remember correctly, then yes.

# _build_on_next_tick is used to avoid a race condition between a data update and a TimeEvent from the timer
self._build_on_next_tick = True
- self._stored_close_ns = self.next_close_ns
+ self._stored_close_ns = event.ts_event
Member

I think this part of the logic is harder to reason about. It could be fixing a bug, especially if self.next_close_ns wasn't being updated -- but if it was, we're changing this from next close to current event time?

This was also the block we were considering removing? The comprehensive test coverage may help us to make a decision here.
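
To make the question concrete, here is a toy model of deferring a bar build to the next tick. It is a sketch under stated assumptions, not the aggregator's actual logic: ToyAggregator, on_timer, on_tick and the bars list are hypothetical, and the model assumes next_close_ns has already advanced by the time the timer handler runs.

```python
class ToyAggregator:
    """Toy model of deferring a bar build to the next tick; not the real class."""

    def __init__(self, interval_ns):
        self.interval_ns = interval_ns
        self.next_close_ns = interval_ns
        self._has_data = False
        self._build_on_next_tick = False
        self._stored_close_ns = 0
        self.bars = []  # close timestamps of the bars built so far

    def on_timer(self, ts_event):
        # Assumption: the schedule has already advanced when the handler runs
        self.next_close_ns = ts_event + self.interval_ns
        if not self._has_data:
            # No data yet for this interval: defer the build to the next tick
            self._build_on_next_tick = True
            self._stored_close_ns = ts_event  # the close that just fired
            return
        self._build(ts_event)

    def on_tick(self, ts_event):
        self._has_data = True
        if self._build_on_next_tick:
            # Only a tick at or before the stored close belongs to that bar
            if ts_event <= self._stored_close_ns:
                self._build(self._stored_close_ns)
            self._build_on_next_tick = False
            self._stored_close_ns = 0

    def _build(self, close_ns):
        self.bars.append(close_ns)
        self._has_data = False
```

Under that assumption, storing next_close_ns instead of ts_event would record the following interval's close, so a tick arriving mid-interval would pass the `<=` check and close a bar early. Whether the real code behaves this way depends on when next_close_ns is updated, which is the open question in this thread.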

CLAassistant commented Jun 16, 2025

CLA assistant check
All committers have signed the CLA.

@stastnypremysl
Author

Closing this, as the changes have been or will be distributed across different PRs.
