Description
Code Sample, a copy-pastable example if possible
import io
import boto3
import pandas as pd

s3 = boto3.client("s3")  # bucket, file_name, use_cols and col_Names are defined elsewhere
obj = s3.get_object(Bucket=bucket, Key=file_name)
body = obj["Body"].read()
df = pd.read_excel(io.BytesIO(body), encoding="utf-8", sheet_name="Sheet1", skiprows=4, usecols=use_cols, header=None, names=col_Names)
Problem description
pd.read_excel crashes when reading the file. The specific Excel sheet has roughly 300 columns and 130,000 records; I reduced the columns to 40 by defining a usecols filter.
Error Reported:
Unable to allocate array with shape (133271, 319) and data type object: MemoryError
Traceback (most recent call last):
File "/var/task/lambda_function.py", line 49, in lambda_handler
df = pd.read_excel(io.BytesIO(body),encoding='utf-8',sheet_name='Products',skiprows=4,usecols=use_cols,header=None,names=col_Names,)
File "/opt/python/lib/python3.6/site-packages/pandas/util/_decorators.py", line 208, in wrapper
return func(*args, **kwargs)
File "/opt/python/lib/python3.6/site-packages/pandas/io/excel/_base.py", line 340, in read_excel
**kwds
File "/opt/python/lib/python3.6/site-packages/pandas/io/excel/_base.py", line 883, in parse
**kwds
File "/opt/python/lib/python3.6/site-packages/pandas/io/excel/_base.py", line 516, in parse
output[asheetname] = parser.read(nrows=nrows)
File "/opt/python/lib/python3.6/site-packages/pandas/io/parsers.py", line 1154, in read
ret = self._engine.read(nrows)
File "/opt/python/lib/python3.6/site-packages/pandas/io/parsers.py", line 2493, in read
alldata = self._rows_to_cols(content)
File "/opt/python/lib/python3.6/site-packages/pandas/io/parsers.py", line 3160, in _rows_to_cols
zipped_content = list(lib.to_object_array(content, min_width=col_len).T)
File "pandas/_libs/lib.pyx", line 2279, in pandas._libs.lib.to_object_array
MemoryError: Unable to allocate array with shape (133271, 319) and data type object
Expected Output
Output of pd.show_versions()
pandas 0.25.3
Activity
jbrockmendel commented on Nov 13, 2019
Does decreasing the number of rows/columns read solve the problem? If so, what is the minimal size at which the problem occurs?
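For example, a bisecting sketch along these lines could pin down where the failure starts (assuming the workbook bytes are already in body as in the original snippet, and that nrows is honoured by read_excel in pandas 0.25):

import io
import pandas as pd

# Read progressively larger row slices until the MemoryError appears.
# Note: xlrd still parses the whole workbook; nrows only limits the
# pandas-side object array, which is where this traceback fails.
for n in (1000, 10000, 50000, 100000, 133271):
    df = pd.read_excel(io.BytesIO(body), sheet_name="Products",
                       skiprows=4, header=None, nrows=n)
    print(n, df.shape)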
dcostelloe2019 commented on Nov 14, 2019
No change with the decreased columns. I'm not sure how I would get the number of rows processed before the failure?
Liam3851 commented on Nov 14, 2019
@dcostelloe2019 If you're running on AWS Lambda, are you allocating enough memory to the process? The default memory allocation on Lambda is just 128 MB, and an array of that shape would be about 340 MB (133271 * 319 * 8 / 1e6). My guess is you need to configure your Lambda function with more RAM (try running it locally to make sure).
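For reference, the 340 MB figure is just the pointer storage for the object array named in the error; a quick sketch of that arithmetic:

# Size of the intermediate object array pandas tried to allocate.
rows, cols = 133271, 319
pointer_bytes = 8                          # object dtype stores 64-bit pointers
print(rows * cols * pointer_bytes / 1e6)   # ~340 MB, not counting the cell values themselves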
dcostelloe2019 commented on Nov 16, 2019
Thanks @Liam3851: I gave it the maximum memory available, 3008 MB.
I split the workbook into two separate Excel workbooks: both halves processed successfully.
Still the same error with the full-sized workbook :-(
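One workaround worth trying is to parse the sheet in row chunks so pandas never builds the full (133271, 319) object array at once. A rough sketch, reusing body, use_cols and col_Names from the snippet above (untested against this workbook, and xlrd still re-reads the whole file on every call):

import io
import pandas as pd

chunks = []
chunk_size = 20000
for start in range(4, 150000, chunk_size):  # 4 header rows skipped, ~133k data rows
    part = pd.read_excel(io.BytesIO(body), sheet_name="Products",
                         skiprows=start, nrows=chunk_size,
                         usecols=use_cols, header=None, names=col_Names)
    chunks.append(part)
    if len(part) < chunk_size:  # last (partial) chunk reached
        break
df = pd.concat(chunks, ignore_index=True)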
simonjayhawkins commented on Apr 1, 2020
@dcostelloe2019 Thanks for the report. Closing as this doesn't look actionable. We would need a minimal reproducible example to help debug the issue; see https://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports. Ping to reopen.
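A minimal reproducer along these lines would make this actionable, e.g. building a workbook of a similar shape from synthetic data instead of the private S3 file (hypothetical sketch; shrink rows/cols to find the smallest size that still fails):

import numpy as np
import pandas as pd

# Write a throwaway workbook of roughly the failing shape, then read it back
# the same way as the original code. Writing this many cells is slow, so it
# may be easier to start smaller and scale up until the error reappears.
rows, cols = 133271, 319
pd.DataFrame(np.full((rows, cols), "x")).to_excel("repro.xlsx", index=False, header=False)
df = pd.read_excel("repro.xlsx", header=None)
print(df.shape)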