Closed
Description
In some cases, a DataFrame exported to Excel contains incorrect values.
This is not a problem with how Excel reads the file (the data inside sheet1.xml of the .xlsx file is also incorrect).
The same DataFrame exported to ".csv" is correct.
The problem can be "solved" by renaming the column headers to [col-1, col-2, ...]. Maybe an encoding problem?
The real issue is that there is no warning or error during the export, so it is very easy to miss.
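The header-renaming workaround described above can be sketched as follows. This is a minimal illustration, not code from the report: the three-column frame and its values are hypothetical stand-ins for the pickled data, and `original_headers` is kept only so the real names can be restored after the export.

```python
import pandas as pd

# Hypothetical frame with a repeated header, standing in for problematic_df.pkl
df = pd.DataFrame([[1.0, 2.0, 3.0]],
                  columns=["TEMP_°C", "Ref_°C", "Ref_°C"])

# Keep the real headers so they can be mapped back later
original_headers = list(df.columns)

# Replace every header with a generic label: col-1, col-2, ...
df.columns = ["col-{}".format(i) for i in range(1, len(df.columns) + 1)]
```

Because every generated label is distinct, this also removes any duplicated names, which may be why it avoids the problem.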
To reproduce:
import pandas as pd
df = pd.read_pickle('problematic_df.pkl')
df.to_excel('problematic_df.xlsx')
df.to_csv('problematic_df.csv')
using the file available here: https://drive.google.com/file/d/0Bzz_ZaP_wS_HMFdlMkVzaTR0cjA/view?usp=sharing
Note that the content of cell M14 differs between the two files (at least when run on my computer).
Using:
- Python 3.4.3 |Anaconda 2.3.0 (64-bit)
- pandas 0.16.2
- Windows 7 (64-bit)
Activity
jreback commented on Sep 2, 2015
Can you show the DataFrame in question above (.info() and .dtypes), and a sample .head()?
bertrandhaut commented on Sep 2, 2015
df.info()
df.dtypes
df.head()
bertrandhaut commented on Sep 2, 2015
For information, the value incorrectly exported to Excel is in column TEMP 11 (B14)_°C at time 11/08/2015 01:05:00.
In the .csv file and as printed by ".head()" the value is 10.4107588318426.
In the .xlsx file, the value is 20.8091542146332.
dsm054 commented on Sep 3, 2015
This might be the old Excel duplicate-column-name problem. The problem goes away if I rename the columns so that none are duplicated, and it looks like the errors start after the first duplicate (Ref #11/ADIT_°C).
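Since the export gives no warning, one option is to check for repeated headers yourself before calling `to_excel`. The sketch below assumes duplicated column names are the trigger, as suggested above; `duplicated_columns` and `check_unique_columns` are hypothetical helper names, and the frame is illustrative (only the header names come from this thread).

```python
import pandas as pd

def duplicated_columns(df):
    """Return the column labels that occur more than once in df."""
    # Index.duplicated() flags the second and later occurrences of each label
    return list(df.columns[df.columns.duplicated()])

def check_unique_columns(df):
    """Raise instead of letting an export proceed silently with repeated headers."""
    dups = duplicated_columns(df)
    if dups:
        raise ValueError("duplicate column names: {}".format(dups))

# Hypothetical frame reusing the header names from this thread
df = pd.DataFrame([[1.0, 2.0, 3.0]],
                  columns=["TEMP 11 (B14)_°C", "Ref #11/ADIT_°C", "Ref #11/ADIT_°C"])
print(duplicated_columns(df))
```

Calling `check_unique_columns(df)` right before `df.to_excel(...)` turns the silent problem into an explicit error.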
bertrandhaut commented on Sep 3, 2015
I've just renamed the last column (the second Ref #11/ADIT_°C), and indeed that solved the problem.
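Instead of renaming the repeated column by hand, the headers can be made unique programmatically while keeping them readable. This is a sketch under the assumption that unique headers are enough to avoid the bug. `make_unique_columns` is a hypothetical helper, and the ".1", ".2" suffix scheme is an arbitrary choice.

```python
import pandas as pd

def make_unique_columns(columns):
    """Append .1, .2, ... to repeated labels so that every header is unique."""
    used = set()
    counts = {}
    result = []
    for name in columns:
        label = name
        n = counts.get(name, 0)
        # Bump the suffix until the label collides with nothing seen so far
        while label in used:
            n += 1
            label = "{}.{}".format(name, n)
        counts[name] = n
        used.add(label)
        result.append(label)
    return result

# Hypothetical frame reusing the header names from this thread
df = pd.DataFrame([[1.0, 2.0, 3.0]],
                  columns=["TEMP 11 (B14)_°C", "Ref #11/ADIT_°C", "Ref #11/ADIT_°C"])
df.columns = make_unique_columns(df.columns)
# df.to_excel("fixed_df.xlsx")  # needs an Excel writer engine such as openpyxl
```

Only the repeated header gets a suffix ("Ref #11/ADIT_°C.1"). The `while` loop also covers the case where a suffixed name already exists as a real column.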