Polars version checks
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of Polars.
Issue description
I use Polars to read several Parquet files. The reads are dispatched from an asyncio event loop with run_in_executor, so they run in the default thread pool. The code is roughly this:
    import asyncio

    import polars

    def _parquet_file_to_df(file_path: str, columns: tuple[str, ...]) -> polars.DataFrame:
        return polars.read_parquet(
            file_path, columns=list(columns) if columns else None, use_pyarrow=True
        )

    async def main() -> None:
        loop = asyncio.get_running_loop()
        for file in [...]:
            # func is not defined in this excerpt
            await loop.run_in_executor(None, func)

    if __name__ == "__main__":
        asyncio.run(main())
        print("done")
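The snippet above passes func to run_in_executor without showing how it is built. As a minimal sketch only, here is one way such a callable could be constructed with functools.partial. The names load_file and load_all and the file names are hypothetical, and load_file is a stand-in that does not touch Polars, so this illustrates the calling pattern rather than reproducing the hang.

```python
import asyncio
import functools


def load_file(file_path: str, columns: tuple[str, ...]) -> dict:
    # Hypothetical stand-in for _parquet_file_to_df so the sketch runs without Polars.
    return {"path": file_path, "columns": list(columns) if columns else None}


async def load_all(files: list[str], columns: tuple[str, ...]) -> list[dict]:
    loop = asyncio.get_running_loop()
    results = []
    for file in files:
        # partial binds the arguments so the executor receives a zero-argument callable.
        func = functools.partial(load_file, file, columns)
        # None selects the loop's default ThreadPoolExecutor.
        results.append(await loop.run_in_executor(None, func))
    return results


if __name__ == "__main__":
    print(asyncio.run(load_all(["a.parquet", "b.parquet"], ("x",))))
```

run_in_executor also accepts positional arguments directly, so loop.run_in_executor(None, load_file, file, columns) is equivalent here; partial is only needed for keyword arguments.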
Sometimes my program does not terminate even after printing "done". Here are some pieces of the stack trace:
I am not sure the problem is 100% related to Polars. Can you help me understand the bug?
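One way to narrow this down, sketched under assumptions: CPython waits for all non-daemon threads before the interpreter exits, so listing the threads that are still alive after asyncio.run returns may show what is blocking shutdown. The helper names live_non_daemon_threads and report_blockers are hypothetical, and the 10-second timeout is arbitrary. A hang inside a native thread or an atexit handler would not appear in the thread list, although the faulthandler dump may still show where the main thread is stuck.

```python
import faulthandler
import sys
import threading


def live_non_daemon_threads() -> list[str]:
    # The interpreter joins every non-daemon thread except the main one at exit.
    main = threading.main_thread()
    return [
        t.name
        for t in threading.enumerate()
        if t is not main and not t.daemon and t.is_alive()
    ]


def report_blockers(timeout_s: float = 10.0) -> None:
    print("non-daemon threads still alive:", live_non_daemon_threads(), file=sys.stderr)
    # If the process is still running after timeout_s seconds,
    # dump the Python stack of every thread to stderr.
    faulthandler.dump_traceback_later(timeout_s, exit=False)
```

Calling report_blockers() right after print("done") would print the remaining threads immediately and, if the process is still hanging once the timeout elapses, a traceback for each thread.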
Reproducible example
    import asyncio

    import polars

    def _parquet_file_to_df(file_path: str, columns: tuple[str, ...]) -> polars.DataFrame:
        return polars.read_parquet(
            file_path, columns=list(columns) if columns else None, use_pyarrow=True
        )

    async def main() -> None:
        loop = asyncio.get_running_loop()
        for file in [...]:
            # func is not defined in this excerpt
            await loop.run_in_executor(None, func)

    if __name__ == "__main__":
        asyncio.run(main())
        print("done")
Expected behavior
The program terminates once "done" has been printed.
Installed versions
---Version info---
Polars: 0.15.16
Index type: UInt32
Platform: Linux-5.13.0-52-generic-x86_64-with-glibc2.35
Python: 3.10.6 (main, Nov 14 2022, 16:10:14) [GCC 11.3.0]
---Optional dependencies---
OpenBLAS WARNING - could not determine the L2 cache size on this system, assuming 256k
pyarrow: 10.0.1
pandas: 1.5.3
numpy: 1.24.1
fsspec: <not installed>
connectorx: <not installed>
xlsx2csv: <not installed>
deltalake: <not installed>
matplotlib: 3.6.3