### What changes were proposed in this pull request?
This PR aims to support Python 3.12 in Apache Spark 4.0.0.
Note that this is tested with [NumPy 1.26](https://pypi.org/project/numpy/1.26.0/) and [Pandas 2.1.1](https://pypi.org/project/pandas/2.1.1/), but without `PyArrow` because it doesn't support Python 3.12 yet.
- https://arrow.apache.org/docs/python/install.html#python-compatibility
> PyArrow is currently compatible with Python 3.8, 3.9, 3.10 and 3.11.
PyArrow is still an optional component for Spark SQL and I believe it will support Python 3.12 soon.
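As a rough illustration only (not part of this patch), the sketch below shows one way to check which interpreter is running and whether an optional dependency such as PyArrow is importable before relying on Arrow-based features. The helper name `describe_environment` is hypothetical and is not a PySpark API; it uses only the standard library.

```python
import importlib.util
import sys


def describe_environment(optional_modules=("pyarrow",)):
    """Report the interpreter version and which optional modules are importable.

    Hypothetical helper for illustration; pass top-level module names only.
    """
    report = {"python": "{}.{}".format(*sys.version_info[:2])}
    for name in optional_modules:
        # find_spec returns None when a top-level module is not installed,
        # and it does not import the module itself.
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    print(describe_environment())
```

On a Python 3.12 environment without PyArrow installed, the `pyarrow` entry would be expected to be `False`, which matches the setup this PR was tested with.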
### Why are the changes needed?
Python 3.12 is scheduled for release in a few days, on 2023-10-02.
- https://www.python.org/downloads/
### Does this PR introduce _any_ user-facing change?
No. This adds Python 3.12 to the environments that PySpark supports.
### How was this patch tested?
Pass the CIs for the existing Python versions and run the tests manually with Python 3.12.
```
$ python/run-tests.py --python-executables python3.12
Running PySpark tests. Output is in /Users/dongjoon/PRS/SPARK-44120/python/unit-tests.log
Will test against the following Python executables: ['python3.12']
Will test the following Python modules: ['pyspark-connect', 'pyspark-core', 'pyspark-errors', 'pyspark-ml', 'pyspark-ml-connect', 'pyspark-mllib', 'pyspark-pandas', 'pyspark-pandas-connect-part0', 'pyspark-pandas-connect-part1', 'pyspark-pandas-connect-part2', 'pyspark-pandas-connect-part3', 'pyspark-pandas-slow', 'pyspark-resource', 'pyspark-sql', 'pyspark-streaming', 'pyspark-testing']
python3.12 python_implementation is CPython
python3.12 version is: Python 3.12.0rc2
...
Finished test(python3.12): pyspark.sql.functions (70s)
Finished test(python3.12): pyspark.sql.streaming.readwriter (77s)
Tests passed in 398 seconds
Skipped tests in pyspark.ml.tests.connect.test_parity_torch_data_loader with python3.12:
test_data_loader (pyspark.ml.tests.connect.test_parity_torch_data_loader.TorchDistributorBaselineUnitTestsOnConnect.test_data_loader) ... skipped 'torch is required'
test_data_loader (pyspark.ml.torch.tests.test_data_loader.TorchDistributorDataLoaderUnitTests.test_data_loader) ... skipped 'torch is required'
Skipped tests in pyspark.ml.torch.tests.test_data_loader with python3.12:
test_data_loader (pyspark.ml.torch.tests.test_data_loader.TorchDistributorDataLoaderUnitTests.test_data_loader) ... skipped 'torch is required'
```
### Was this patch authored or co-authored using generative AI tooling?
No.
Closes #43184 from dongjoon-hyun/SPARK-44120.
Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>