-
-
Notifications
You must be signed in to change notification settings - Fork 18.7k
Description
Pandas version checks
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.
To pick up a draggable item, press the space bar. While dragging, use the arrow keys to move the item. Press space again to drop the item in its new position, or press escape to cancel.
Reproducible Example
from datetime import datetime
import pandas as pd
writer = pd.ExcelWriter('datetime_bug.xlsx', engine='xlsxwriter')
workbook = writer.book
# datetime from pandas is not converted with the specified num_format
data = pd.DataFrame([[0.05, datetime(year=2020, month=1, day=1)]], columns=["percentage", "datetime_date"])
data.to_excel(writer, sheet_name="test_sheet", index=False)
percentage_format = workbook.add_format({'num_format':'0.00%'})
date_format = workbook.add_format({'num_format':'dd.mm.yyyy'})
writer.sheets["test_sheet"].set_column('A:A', width=None, cell_format=percentage_format)
writer.sheets["test_sheet"].set_column('B:B', width=None, cell_format=date_format)
workbook.close()
Issue Description
I created an Excel sheet using pd.to_excel()
. The data contains one column with a float number and one column with a datetime.datetime
object.
Then, I want to set the column type using the num_format
parameter in set_column
.
This works for the float value, which I want in the percentage format.
However, it does not work for the datetime value. The value is in the wrong format "01.01.2020 00:00:00" (German date). The value in the Excel file is in the same wrong format, whether I run set_column()
for the date column or not.
I have tried to circle in the problem by trying other variations, which turns out that they do work. However, those would be only workarounds for our use case.
-
Using the parameter
datetime_format='dd.mm.yyyy'
in thepd.ExcelWriter
constructor puts the date into the right format. -
Instead of writing the sheet from a DataFrame, using the
xlsxwriter
functions works fine:
import xlsxwriter
from datetime import datetime
workbook = xlsxwriter.Workbook('datetime_bug.xlsx')
worksheet = workbook.add_worksheet()
# using the xlsxwriter functions works as intended
worksheet.write('A1', 0.05)
worksheet.write('B1', datetime(year=2020, month=1, day=1))
percentage_format = workbook.add_format({'num_format':'0.00%'})
date_format = workbook.add_format({'num_format':'dd.mm.yyyy'})
worksheet.set_column('A:A', width=None, cell_format=percentage_format)
worksheet.set_column('B:B', width=None, cell_format=date_format)
workbook.close()
Expected Behavior
The value is in the format "01.01.2020" when using set_column
with pd.ExcelWriter
.
Installed Versions
pandas : 2.1.0
numpy : 1.26.0
pytz : 2023.3
dateutil : 2.8.2
setuptools : 65.7.0
pip : 22.0.4
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : 1.3.3
lxml.etree : None
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : 3.1.2
IPython : 8.5.0
pandas_datareader : None
bs4 : 4.12.2
bottleneck : None
dataframe-api-compat: None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : 3.5.1
numba : None
numexpr : None
odfpy : None
openpyxl : 3.0.9
pandas_gbq : None
pyarrow : 3.0.0
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : 1.7.3
sqlalchemy : None
tables : None
tabulate : 0.8.7
xarray : None
xlrd : None
zstandard : None
tzdata : 2023.3
qtpy : None
pyqt5 : None
Activity
rhshadrach commentedon Sep 19, 2023
set_column does not change the format; it changes the default format:
Since pandas is setting the format for date cells, it has no effect. As far as I can tell there is no easy way to overwrite an existing format with xlsxwriter. The best resolution I see for this within pandas is to use datetime_format as you mentioned.
Pavanmahaveer7 commentedon Oct 26, 2023
So here if we are looking to change any format of date column, by using strftime function we could make the possible changes to the date values.
rhshadrach commentedon Oct 26, 2023
@Pavanmahaveer7 - I think you are saying to modify the format prior calling
pd.write_excel
. Yes, that is an option, but then I believe the Excel file will not have the Date datatype.lrisch commentedon Oct 27, 2023
@rhshadrach Okay thanks for that information, I didn't know it only changed the default format. In that case we will keep using datetime_format 👍
rhshadrach commentedon Oct 30, 2023
Currently the default value of
datetime_formatstr
isNone
, which gives you the format"YYYY-MM-DD HH:MM:SS"
. We could instead make the default value be""YYYY-MM-DD HH:MM:SS"
and then ifNone
is passed, apply no formatting. It would allow you to then apply your own formatting with xlsxwriter. However, without applying any formatting I think the values would then appear as integers in the Excel sheet - and this would perhaps cause confusion.I still think a better resolution is to have a way to override the existing format via xlsxwriter.
lrisch commentedon Nov 17, 2023
@rhshadrach Im sorry, I dont really understand your last message.
As I understand it, I cannot use set_column because pandas is overwriting whatever I set as a format there. Therefore I have to use datetime_format (which is fine, it would have been nicer to be able to use set_column since we use it for the other datatypes to, but it is ok as a workaround).
But I dont understand, why do you want to change the default of datetime_format?
And what do you mean by this: "I still think a better resolution is to have a way to override the existing format via xlsxwriter."
rhshadrach commentedon Nov 18, 2023
Depending on interpretation, I think this is not correct - in particular pandas is not overwriting. pandas sets the format as
"YYYY-MM-DD HH:MM:SS"
first and then you are callingset_column
. The issue is that callingset_column
does nothing because it only modifies the default format. Modifying the default format has no impact because pandas has set a (non-default) format.If users could tell pandas to not set any format for datetime (my proposal above - which I do not like but it is an option), then changing the default format would have an impact and so your call to
set_column
would have the desired effect.It seems better to me if xlsxwriter gave you the ability to modify the format on an entire column, and not just the default format. I believe xlsxwriter has no such function.
rhshadrach commentedon Nov 18, 2023
Perhaps my use of "default format" is confusing and I should be saying "no format" instead.
rhshadrach commentedon Feb 17, 2024
Looking into this again, I'm now understanding that Excel has cell, row, and column formats. The
set_column
sets the column format, but priority is given to any cell in the column that has a cell format set.Also,
ExcelWriter
does havedate_format
anddatetime_format
arguments that can be set!rhshadrach commentedon Feb 17, 2024
@jmcnamara - I was wondering if you could confirm my understanding here is correct. I believe this comment is self-contained, so hopefully you don't have to wade through the above comments 😄.
pandas writes dates and datetimes by applying a cell format to particular cells. This makes xlxswriter's
set_column
(andset_row
) ineffective on such cells - perhaps surprisingly until you understand how Excel has different formatters for rows, columns, and cells.I was wondering if pandas should do something different here - but it appears there is no "default date format" in Excel. XlsxWriter does have a
default_date_format
, but it appears this is implemented by setting the cell format, similar to pandas:https://github.com/jmcnamara/XlsxWriter/blob/cedf448ad2139ae3fb88dd618b0d78d67a39f2f5/xlsxwriter/worksheet.py#L1102-L1107
So users wanting to change the format of date and datetime columns that already have a format applied just need to loop over the cells in a column.
jmcnamara commentedon Feb 17, 2024
Yes that is correct. To reiterate: Pandas applies a cell format when writing a datetime and that cell format can't/won't be overwritten by a row or column format.
@rhshadrach I think the proposal to have change the behaviour of
datetime_format=None
anddate_format=None
inpd.ExcelWriter()
so that they don't apply a default format is a good one. That would allow the user to apply a format viaset_column()
. In most cases applying the number format viadatetime_format
anddate_format
is sufficient for most users but sometimes they need to also apply font or border formatting as well and then the ability to turn off the Pandas formatting and useset_column()
instead would be useful.For reference here are the XlsxWriter docs where I try to point people in the right direction on Formatting of the Dataframe output.
rhshadrach commentedon Feb 19, 2024
Thanks @jmcnamara!
Makes sense - but I would think just not as the default. I'll put up a PR for this and improve some of the docs.