DEPR: DataFrame.get_dtype_counts #27145

Merged
merged 27 commits into from
Jul 3, 2019
4 changes: 2 additions & 2 deletions doc/source/getting_started/basics.rst
@@ -1968,11 +1968,11 @@ dtype of the column will be chosen to accommodate all of the data types
pd.Series([1, 2, 3, 6., 'foo'])

The number of columns of each type in a ``DataFrame`` can be found by calling
- :meth:`~DataFrame.get_dtype_counts`.
+ ``DataFrame.dtypes.value_counts()``.

.. ipython:: python

- dft.get_dtype_counts()
+ dft.dtypes.value_counts()

Numeric dtypes will propagate and can coexist in DataFrames.
If a dtype is passed (either directly via the ``dtype`` keyword, a passed ``ndarray``,
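
As a minimal sketch of the replacement documented in this hunk, the example below counts columns per dtype with `dtypes.value_counts()`. The frame `dft` here is a hypothetical stand-in, not the one built in basics.rst, and the string-keyed dict is an added convenience rather than part of the documented change.

```python
import numpy as np
import pandas as pd

# Hypothetical frame for illustration; not the `dft` from basics.rst.
dft = pd.DataFrame({
    "A": np.array([0.1, 0.2, 0.3], dtype="float64"),
    "B": np.array([1.0, 2.0, 3.0], dtype="float64"),
    "C": np.array([1, 2, 3], dtype="int64"),
    "D": np.array([True, False, True]),
})

# One row per distinct dtype; the index holds dtype objects, not strings.
counts = dft.dtypes.value_counts()

# Convert to plain strings and ints when name-based lookup is more convenient.
counts_by_name = {str(dtype): int(n) for dtype, n in counts.items()}
print(counts_by_name)
```
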
2 changes: 1 addition & 1 deletion doc/source/user_guide/io.rst
@@ -3754,7 +3754,7 @@ defaults to `nan`.
store.append('df_mixed', df_mixed, min_itemsize={'values': 50})
df_mixed1 = store.select('df_mixed')
df_mixed1
- df_mixed1.get_dtype_counts()
+ df_mixed1.dtypes.value_counts()

# we have provided a minimum string column size
store.root.df_mixed.table
2 changes: 1 addition & 1 deletion doc/source/user_guide/missing_data.rst
Original file line number Diff line number Diff line change
@@ -105,7 +105,7 @@ pandas objects provide compatibility between ``NaT`` and ``NaN``.
df2
df2.loc[['a', 'c', 'h'], ['one', 'timestamp']] = np.nan
df2
- df2.get_dtype_counts()
+ df2.dtypes.value_counts()

.. _missing.inserting:

2 changes: 1 addition & 1 deletion doc/source/whatsnew/v0.10.1.rst
Original file line number Diff line number Diff line change
@@ -89,7 +89,7 @@ You can now store ``datetime64`` in data columns
store.append('df_mixed', df_mixed)
df_mixed1 = store.select('df_mixed')
df_mixed1
- df_mixed1.get_dtype_counts()
+ df_mixed1.dtypes.value_counts()

You can pass ``columns`` keyword to select to filter a list of the return
columns, this is equivalent to passing a
2 changes: 1 addition & 1 deletion doc/source/whatsnew/v0.11.0.rst
Original file line number Diff line number Diff line change
@@ -296,7 +296,7 @@ Furthermore ``datetime64[ns]`` columns are created by default, when passed datet
df

# datetime64[ns] out of the box
- df.get_dtype_counts()
+ df.dtypes.value_counts()

# use the traditional nan, which is mapped to NaT internally
df.loc[df.index[2:4], ['A', 'timestamp']] = np.nan
1 change: 1 addition & 0 deletions doc/source/whatsnew/v0.25.0.rst
Original file line number Diff line number Diff line change
@@ -762,6 +762,7 @@ Other deprecations
- :meth:`Series.put` is deprecated. (:issue:`18262`)
- :meth:`Index.item` and :meth:`Series.item` is deprecated. (:issue:`18262`)
- :meth:`Index.contains` is deprecated. Use ``key in index`` (``__contains__``) instead (:issue:`17753`).
+ - :meth:`DataFrame.get_dtype_counts` is deprecated. (:issue:`18262`)

.. _whatsnew_0250.prior_deprecations:

6 changes: 3 additions & 3 deletions pandas/core/computation/expressions.py
Original file line number Diff line number Diff line change
@@ -79,11 +79,11 @@ def _can_use_numexpr(op, op_str, a, b, dtype_check):
# check for dtype compatibility
dtypes = set()
for o in [a, b]:
- if hasattr(o, 'get_dtype_counts'):
- s = o.get_dtype_counts()
+ if hasattr(o, 'dtypes'):
+ s = o.dtypes.value_counts()
if len(s) > 1:
return False
- dtypes |= set(s.index)
+ dtypes |= set(s.index.astype(str))
elif isinstance(o, np.ndarray):
dtypes |= {o.dtype.name}

2 changes: 1 addition & 1 deletion pandas/core/frame.py
Original file line number Diff line number Diff line change
@@ -2326,7 +2326,7 @@ def _sizeof_fmt(num, size_qualifier):
else:
_verbose_repr()

- counts = self.get_dtype_counts()
+ counts = self._data.get_dtype_counts()
Contributor

this one?

Member Author

I think this is okay. It's internal usage and, I would think, slightly more performant than dtypes.value_counts() (the counts are left as a dictionary instead of being wrapped in a Series).

Contributor

Can you remove get_dtype_counts() from blocks? It's unnecessary as well.

Member Author

Looks to be needed to get the dtypes later on for info?

dtypes = ['{k}({kk:d})'.format(k=k[0], kk=k[1]) for k
in sorted(counts.items())]
lines.append('dtypes: {types}'.format(types=', '.join(dtypes)))
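
To illustrate why a plain dictionary of counts is enough here, the sketch below applies the same formatting as the context lines above to a hypothetical `counts` dict. The values are made up for illustration.

```python
# Hypothetical dtype counts, keyed by dtype name as a plain dict.
counts = {"object": 1, "float64": 2, "int64": 1}

lines = []
# Sort by dtype name, then render each entry as name(count).
dtypes = ["{k}({kk:d})".format(k=k[0], kk=k[1]) for k in sorted(counts.items())]
lines.append("dtypes: {types}".format(types=", ".join(dtypes)))
print(lines[0])
```
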
8 changes: 8 additions & 0 deletions pandas/core/generic.py
Original file line number Diff line number Diff line change
@@ -5263,6 +5263,10 @@ def get_dtype_counts(self):
"""
Return counts of unique dtypes in this object.

+ .. deprecated:: 0.25.0
+
+ Use `.dtypes.value_counts()` instead.
+
Returns
-------
dtype : Series
@@ -5288,6 +5292,10 @@ def get_dtype_counts(self):
object 1
dtype: int64
"""
+ warnings.warn("`get_dtype_counts` has been deprecated and will be "
Contributor

Can you update the docstring and add a deprecated directive?

Contributor

Can we recommend .dtypes.value_counts() here instead? Or... we're in generic.py so that may be too hard?

Member Author

Yeah, unfortunately that solution does not work for Series, but I could add "for DataFrames use .dtypes.value_counts()".

Contributor

Sure, just need something as a replacement (may also want to add it in the docstring itself).

+ "removed in a future version. For DataFrames use "
+ "`.dtypes.value_counts()", FutureWarning,
+ stacklevel=2)
from pandas import Series
return Series(self._data.get_dtype_counts())
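
The deprecation pattern in this hunk (emit a `FutureWarning` with `stacklevel=2`, then keep returning the old result) can be sketched outside pandas as follows. The function `legacy_dtype_counts` is a hypothetical stand-in and is not part of the pandas API.

```python
import warnings

import pandas as pd


def legacy_dtype_counts(frame):
    """Hypothetical deprecated helper; not part of pandas."""
    # stacklevel=2 attributes the warning to the caller's line, not this one.
    warnings.warn(
        "`legacy_dtype_counts` is deprecated; "
        "use `.dtypes.value_counts()` instead.",
        FutureWarning,
        stacklevel=2,
    )
    return frame.dtypes.value_counts()


frame = pd.DataFrame({"a": [1.0], "b": [2.0]})

# Record warnings so the deprecation can be inspected instead of printed.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = legacy_dtype_counts(frame)

future_warnings = [w for w in caught if issubclass(w.category, FutureWarning)]
print(len(future_warnings), int(result.iloc[0]))
```
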

10 changes: 5 additions & 5 deletions pandas/tests/frame/test_api.py
Original file line number Diff line number Diff line change
@@ -7,8 +7,8 @@

import pandas as pd
from pandas import (
- Categorical, DataFrame, Series, SparseDataFrame, compat, date_range,
- timedelta_range)
+ Categorical, DataFrame, Series, SparseDataFrame, SparseDtype, compat,
+ date_range, timedelta_range)
import pandas.util.testing as tm
from pandas.util.testing import (
assert_almost_equal, assert_frame_equal, assert_series_equal)
@@ -433,11 +433,11 @@ def test_with_datetimelikes(self):
'B': timedelta_range('1 day', periods=10)})
t = df.T

- result = t.get_dtype_counts()
+ result = t.dtypes.value_counts()
if self.klass is DataFrame:
- expected = Series({'object': 10})
+ expected = Series({np.dtype('object'): 10})
else:
- expected = Series({'Sparse[object, nan]': 10})
+ expected = Series({SparseDtype(dtype=object): 10})
tm.assert_series_equal(result, expected)


8 changes: 4 additions & 4 deletions pandas/tests/frame/test_arithmetic.py
Original file line number Diff line number Diff line change
@@ -273,8 +273,8 @@ def test_df_flex_cmp_constant_return_types(self, opname):
df = pd.DataFrame({'x': [1, 2, 3], 'y': [1., 2., 3.]})
const = 2

- result = getattr(df, opname)(const).get_dtype_counts()
- tm.assert_series_equal(result, pd.Series([2], ['bool']))
+ result = getattr(df, opname)(const).dtypes.value_counts()
+ tm.assert_series_equal(result, pd.Series([2], index=[np.dtype(bool)]))

@pytest.mark.parametrize('opname', ['eq', 'ne', 'gt', 'lt', 'ge', 'le'])
def test_df_flex_cmp_constant_return_types_empty(self, opname):
@@ -283,8 +283,8 @@ def test_df_flex_cmp_constant_return_types_empty(self, opname):
const = 2

empty = df.iloc[:0]
- result = getattr(empty, opname)(const).get_dtype_counts()
- tm.assert_series_equal(result, pd.Series([2], ['bool']))
+ result = getattr(empty, opname)(const).dtypes.value_counts()
+ tm.assert_series_equal(result, pd.Series([2], index=[np.dtype(bool)]))


# -------------------------------------------------------------------
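
The expected values in these tests change from string labels such as `'bool'` to `np.dtype(bool)` because `dtypes.value_counts()` is indexed by dtype objects. A minimal sketch using one flex comparison (`gt`, chosen here only as an example) follows; it inspects the labels directly rather than reproducing the test's `assert_series_equal` call.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3], "y": [1.0, 2.0, 3.0]})

# A flex comparison against a constant yields boolean columns.
result = df.gt(2).dtypes.value_counts()

# The single label is a dtype object, and it counts both columns.
labels = list(result.index)
n_bool_columns = int(result.iloc[0])
print(labels, n_bool_columns)
```
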
25 changes: 14 additions & 11 deletions pandas/tests/frame/test_block_internals.py
Original file line number Diff line number Diff line change
@@ -217,19 +217,21 @@ def test_construction_with_mixed(self, float_string_frame):
df = DataFrame(data)

# check dtypes
- result = df.get_dtype_counts().sort_values()
+ result = df.dtypes
expected = Series({'datetime64[ns]': 3})

# mixed-type frames
float_string_frame['datetime'] = datetime.now()
float_string_frame['timedelta'] = timedelta(days=1, seconds=1)
assert float_string_frame['datetime'].dtype == 'M8[ns]'
assert float_string_frame['timedelta'].dtype == 'm8[ns]'
- result = float_string_frame.get_dtype_counts().sort_values()
- expected = Series({'float64': 4,
- 'object': 1,
- 'datetime64[ns]': 1,
- 'timedelta64[ns]': 1}).sort_values()
+ result = float_string_frame.dtypes
+ expected = Series([np.dtype('float64')] * 4 +
+ [np.dtype('object'),
+ np.dtype('datetime64[ns]'),
+ np.dtype('timedelta64[ns]')],
+ index=list('ABCD') + ['foo', 'datetime',
+ 'timedelta'])
assert_series_equal(result, expected)

def test_construction_with_conversions(self):
@@ -409,11 +411,12 @@ def test_get_numeric_data(self):
df = DataFrame({'a': 1., 'b': 2, 'c': 'foo',
'f': Timestamp('20010102')},
index=np.arange(10))
- result = df.get_dtype_counts()
- expected = Series({'int64': 1, 'float64': 1,
- datetime64name: 1, objectname: 1})
- result = result.sort_index()
- expected = expected.sort_index()
+ result = df.dtypes
+ expected = Series([np.dtype('float64'),
+ np.dtype('int64'),
+ np.dtype(objectname),
+ np.dtype(datetime64name)],
+ index=['a', 'b', 'c', 'f'])
assert_series_equal(result, expected)

df = DataFrame({'a': 1., 'b': 2, 'c': 'foo',
6 changes: 4 additions & 2 deletions pandas/tests/frame/test_combine_concat.py
Original file line number Diff line number Diff line change
@@ -17,8 +17,10 @@ def test_concat_multiple_frames_dtypes(self):
A = DataFrame(data=np.ones((10, 2)), columns=[
'foo', 'bar'], dtype=np.float64)
B = DataFrame(data=np.ones((10, 2)), dtype=np.float32)
- results = pd.concat((A, B), axis=1).get_dtype_counts()
- expected = Series(dict(float64=2, float32=2))
+ results = pd.concat((A, B), axis=1).dtypes
+ expected = Series([np.dtype('float64')] * 2 +
+ [np.dtype('float32')] * 2,
+ index=['foo', 'bar', 0, 1])
assert_series_equal(results, expected)

@pytest.mark.parametrize('data', [
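
Several tests in this PR now compare the full `.dtypes` Series (one entry per column) instead of per-dtype counts. The sketch below, built on the same `A` and `B` frames as the test above, shows how the two views relate; the dict conversions are added here only to make the output easy to read.

```python
import numpy as np
import pandas as pd

A = pd.DataFrame(np.ones((10, 2)), columns=["foo", "bar"], dtype=np.float64)
B = pd.DataFrame(np.ones((10, 2)), dtype=np.float32)

dtypes = pd.concat((A, B), axis=1).dtypes

# Per-column view: one dtype per column label, in column order.
per_column = {label: str(dtype) for label, dtype in dtypes.items()}

# Per-dtype view: the summary the deprecated method used to return.
per_dtype = {str(dtype): int(n) for dtype, n in dtypes.value_counts().items()}
print(per_column)
print(per_dtype)
```
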