The data is now stored in Dataseries objects that handle all the manipulations, statistics, categoricals, etc. internally. The data backend is either a tensor or a tds.Vec in order to better accommodate large datasets.
self.columns has been dropped; only self.column_order remains to keep track of column order.
Most functions now use either tds.Hash or tds.Vec for returning values instead of regular tables.
The data types are now more sophisticated: boolean, integer, long, double, and string. Boolean and string columns are stored internally as tds.Vec, while the remaining types are stored as torch tensors.
Since conversions are more restricted with the new column types, there are now boolean2tensor and boolean2categorical functions that help convert boolean columns into numerical ones.
The Dataframe.schema property has been removed as it now resides in the series. The same information can be retrieved using get_schema().
There is now a custom busted assertion that can compare tensors, tds, and Dataseries.
CSV data is now read using csvigo's large mode, thus circumventing the memory limit for large CSV files.
The to_/from_categorical functions now always return a single value when a single value is passed in.
Add column now takes a Dataseries instead of a Df_Array.
Generalized the argcheck by adding string.split for |-separated arguments.
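The last item describes splitting a |-separated argument specification so that one argument can accept several types. A minimal sketch of that idea in plain Lua is shown below. It is not the library's actual code: the helper names `split_types` and `matches_type` are hypothetical and only illustrate how a specification such as "number|string" could be broken up and checked.

```lua
-- Hypothetical helper: split a specification such as "number|string"
-- into its parts. The separator defaults to "|".
local function split_types(spec, sep)
  sep = sep or "|"
  -- Escape punctuation so the separator is safe inside a character class
  local escaped = sep:gsub("%p", "%%%0")
  local parts = {}
  -- Collect every maximal run of characters that is not the separator
  for part in string.gmatch(spec, "[^" .. escaped .. "]+") do
    parts[#parts + 1] = part
  end
  return parts
end

-- Hypothetical helper: true if the Lua type of `value` is one of the
-- types listed in the |-separated specification.
local function matches_type(value, spec)
  for _, type_name in ipairs(split_types(spec)) do
    if type(value) == type_name then
      return true
    end
  end
  return false
end

assert(matches_type(1, "number|string"))
assert(matches_type("a", "number|string"))
assert(not matches_type(true, "number|string"))
```

The sketch only covers built-in Lua types as reported by `type()`; checking for torch tensors, tds objects, or Dataseries would need additional logic that is not shown here.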