Best practices for large data set backfill #24232
mitchpaulus asked this question in Q&A
I have a large dataset that I am trying to backfill: 5-minute interval data from approximately 2,500 sensors, collected over 6 years. That works out to roughly 288 pts/day * 365 days/yr * 6 yrs * 2,500 sensors ≈ 1.5 billion records.
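Quick sanity check on that arithmetic:

```python
# Back-of-the-envelope record count for the backfill
points_per_day = 24 * 60 // 5      # 288 five-minute intervals per day
sensors = 2_500
years = 6
total = points_per_day * 365 * years * sensors
print(f"{total:,}")                # 1,576,800,000 -> roughly 1.5 billion records
```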
I have tried to upload this through the CLI, but I have run into an issue where the ./influxd process crashes with no error on stderr after uploading approximately 100 of the input files (~60 million records; the writes were not rate limited).
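For what it's worth, one alternative I am considering is streaming the data through the official Python client (influxdb-client) in small batches instead of pushing whole files through the CLI. This is only a rough sketch, assuming the export files are line protocol; the URL, token, org, bucket, and file name are placeholders, and I have not verified that it avoids the crash:

```python
# Rough sketch: stream one exported file into InfluxDB in small batches using
# the official Python client (influxdb-client). The connection details and the
# file name below are placeholders.
from influxdb_client import InfluxDBClient, WriteOptions

with InfluxDBClient(url="http://localhost:8086", token="MY_TOKEN", org="my-org") as client:
    # Batched, asynchronous writes: the client accumulates records and flushes
    # them in requests of batch_size lines, retrying failed requests.
    write_options = WriteOptions(
        batch_size=5_000,        # lines per write request
        flush_interval=10_000,   # ms before a partial batch is flushed
        jitter_interval=2_000,   # ms of random delay to spread requests out
        retry_interval=5_000,    # ms to wait before retrying a failed write
    )
    with client.write_api(write_options=write_options) as write_api:
        with open("sensor_export_000.lp") as f:   # one of the ~100 export files
            for line in f:
                line = line.strip()
                if line and not line.startswith("#"):
                    # Each line is line protocol; the client handles batching.
                    write_api.write(bucket="sensors", record=line)
```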
Looking through the documentation, I could not find examples or a list of best practices for one-time backfilling of large amounts of data.
I found some recommendations for optimizing writes, but neither of the ones I tried helped. I don't have a retention policy, since we analyze this dataset in many different ways and would like the entire data set to remain available.
So the general questions I have, which I think would be useful to have answered in a single place, are:
-
@mitchpaulus What version of InfluxDB are you using?