Open
Description
This is a low-traffic issue tracking all the changes happening to the experimental database dumps. We recommend subscribing to this issue to get notified whenever we make some changes to the contents of the dumps.
Activity
pietroalbini commented on May 14, 2021
The next crates.io deploy (happening in the next few days) will include the following changes to the database dumps:
The textsearchable_index_col column will be removed from crates.csv, as that column is an implementation detail of crates.io's search. Users importing the database dumps into a PostgreSQL database will not be affected by this change, as a trigger will populate that column at import time.
The version_downloads.csv file will only include the last 90 days of data instead of full day-to-day historical data. Cumulative download counts are still available in crates.csv and versions.csv.
The version_authors.csv file will be removed, as that data was deleted from the crates.io database too.
We also plan to make the following changes in the future:
version_downloads table #3479: all the data from version_downloads.csv will be moved out of the database dump into separate files, one for each day. This will allow clients interested in this data to download it separately.
Turbo87 commented on Aug 16, 2022
Two relevant changes were just deployed:
checksum field #5077 and
Turbo87 commented on Feb 19, 2024
badges table #8155 will delete the badges table.
Turbo87 commented on Mar 6, 2024
crate_downloads table #8232 added a new crate_downloads table, which is supposed to replace the crates.downloads column soon. This was done for performance reasons, to reduce the amount of bloat in the crates table caused by the regular downloads column updates. At the moment the data should be in sync, but if everything works out we will stop writing to the crates.downloads column in the near future and eventually remove it.
Turbo87 commented on Mar 13, 2024
crates.downloads column writes #8295 is going to disable writes to the crates.downloads column. We will keep the column around for now to avoid unnecessary schema churn, but once the system has shown the expected performance benefits we will most likely remove the column completely.
Turbo87 commented on Apr 12, 2024
Once crates.downloads column #8233 is merged and deployed, it will remove the crates.downloads column. Please use the crate_downloads table instead.
Turbo87 commented on Apr 25, 2024
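For consumers migrating away from the removed crates.downloads column (#8233), a minimal sketch of looking up download counts through the crate_downloads table instead. It assumes that crate_downloads.csv has crate_id and downloads columns and that crates.csv has id and name columns; check the headers of the actual CSV files before relying on this. The helper name downloads_by_name and the sample rows are made up for illustration.

```python
import csv
import io


def downloads_by_name(crates_csv, crate_downloads_csv):
    """Map crate name -> download count via crate_downloads.

    Assumed columns (not confirmed by this thread):
      crates.csv:          id, name
      crate_downloads.csv: crate_id, downloads
    """
    counts = {
        row["crate_id"]: int(row["downloads"])
        for row in csv.DictReader(crate_downloads_csv)
    }
    # Crates without a crate_downloads row are reported as 0 downloads.
    return {
        row["name"]: counts.get(row["id"], 0)
        for row in csv.DictReader(crates_csv)
    }


# Made-up sample rows standing in for the real CSV files.
crates_csv = io.StringIO("id,name\n1,serde\n2,rand\n")
crate_downloads_csv = io.StringIO("crate_id,downloads\n1,500\n2,300\n")

totals = downloads_by_name(crates_csv, crate_downloads_csv)
print(totals)
```

With the real dump, the two StringIO objects would be replaced by open file handles for the extracted CSV files.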
default_versions database table #8484 will introduce a new experimental default_versions table with a mapping from crates to their "default" version, which will be shown by the frontend and used in e.g. reverse dependency queries.
Turbo87 commented on Jun 10, 2024
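To illustrate how the default_versions mapping (#8484) could be queried after an import, here is a rough sketch using an in-memory SQLite database in place of PostgreSQL. The column names (default_versions.crate_id, default_versions.version_id, versions.num, crates.name) are assumptions, the tables are reduced to the columns needed for the join, and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Simplified stand-ins for the imported tables (assumed columns only).
    CREATE TABLE crates (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE versions (id INTEGER PRIMARY KEY, crate_id INTEGER, num TEXT);
    CREATE TABLE default_versions (crate_id INTEGER PRIMARY KEY, version_id INTEGER);

    -- Made-up sample rows.
    INSERT INTO crates VALUES (1, 'serde'), (2, 'rand');
    INSERT INTO versions VALUES (10, 1, '1.0.0'), (11, 1, '2.0.0-beta.1'), (20, 2, '0.8.5');
    INSERT INTO default_versions VALUES (1, 10), (2, 20);
    """
)


def default_version_nums(conn):
    """Return {crate name: version number of its default version}."""
    query = """
        SELECT crates.name, versions.num
        FROM default_versions
        JOIN crates ON crates.id = default_versions.crate_id
        JOIN versions ON versions.id = default_versions.version_id
        ORDER BY crates.name
    """
    return dict(conn.execute(query).fetchall())


result = default_version_nums(conn)
print(result)
```

The same join should carry over to a PostgreSQL import, provided the real column names match the assumptions above.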
db-dump.zip file too #8748 added an experimental ZIP file artifact at https://static.crates.io/db-dump.zip. The advantage of this file is that you do not have to decompress the entire archive if you only need access to a certain database table CSV file. Compared to the tarball, the ZIP file does not have a top-level datetime path prefix; otherwise the files should contain the exact same data.
Turbo87 commented on Oct 29, 2024
num_no_build column to the versions table
In other words: no changes (except for the few days in between the PRs), and sorry for forgetting to mention the change here 🙈
Turbo87 commented on Nov 19, 2024
versions.edition column to the database and expose it on the API #9932 added a new edition column to the versions table.
Turbo87 commented on Nov 26, 2024
versions table. The data will be backfilled from the saved crate files in the next couple of days. This will likely increase the size of the database dump, but with deduplicating compression it should hopefully be manageable.
Turbo87 commented on Nov 29, 2024
num_no_build column to be "public" again, to fix the nullability issue during imports
categories and keywords columns to the versions table #10078 adds categories and keywords columns to the versions table