[SPARK-44120][PYTHON] Support Python 3.12

dongjoon-hyun · dongjoon-hyun · commit 3d231710268d · 2023-10-01T12:39:43.000-07:00
### What changes were proposed in this pull request?

This PR aims to support Python 3.12 in Apache Spark 4.0.0. Note that this is tested with [NumPy 1.26](https://pypi.org/project/numpy/1.26.0/) and [Pandas 2.1.1](https://pypi.org/project/pandas/2.1.1/), but without `PyArrow` because it doesn't support Python 3.12 yet.

- https://arrow.apache.org/docs/python/install.html#python-compatibility

> PyArrow is currently compatible with Python 3.8, 3.9, 3.10 and 3.11.

PyArrow is still an optional component for Spark SQL and I believe it will support Python 3.12 soon.

### Why are the changes needed?

Python 3.12 is scheduled for release in a few days, on 2023-10-02.

- https://www.python.org/downloads/

### Does this PR introduce _any_ user-facing change?

No. This is a new addition to PySpark's supported environments.

### How was this patch tested?

Pass the CIs for the existing Python versions and run a manual test with Python 3.12.

```
$ python/run-tests.py --python-executables python3.12
Running PySpark tests. Output is in /Users/dongjoon/PRS/SPARK-44120/python/unit-tests.log
Will test against the following Python executables: ['python3.12']
Will test the following Python modules: ['pyspark-connect', 'pyspark-core', 'pyspark-errors', 'pyspark-ml', 'pyspark-ml-connect', 'pyspark-mllib', 'pyspark-pandas', 'pyspark-pandas-connect-part0', 'pyspark-pandas-connect-part1', 'pyspark-pandas-connect-part2', 'pyspark-pandas-connect-part3', 'pyspark-pandas-slow', 'pyspark-resource', 'pyspark-sql', 'pyspark-streaming', 'pyspark-testing']
python3.12 python_implementation is CPython
python3.12 version is: Python 3.12.0rc2
...
Finished test(python3.12): pyspark.sql.functions (70s)
Finished test(python3.12): pyspark.sql.streaming.readwriter (77s)
Tests passed in 398 seconds

Skipped tests in pyspark.ml.tests.connect.test_parity_torch_data_loader with python3.12:
    test_data_loader (pyspark.ml.tests.connect.test_parity_torch_data_loader.TorchDistributorBaselineUnitTestsOnConnect.test_data_loader) ... skipped 'torch is required'
    test_data_loader (pyspark.ml.torch.tests.test_data_loader.TorchDistributorDataLoaderUnitTests.test_data_loader) ... skipped 'torch is required'

Skipped tests in pyspark.ml.torch.tests.test_data_loader with python3.12:
    test_data_loader (pyspark.ml.torch.tests.test_data_loader.TorchDistributorDataLoaderUnitTests.test_data_loader) ... skipped 'torch is required'
```

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #43184 from dongjoon-hyun/SPARK-44120.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
diff --git a/python/setup.py b/python/setup.py
@@ -337,6 +337,7 @@ def run(self):
             "Programming Language :: Python :: 3.9",
             "Programming Language :: Python :: 3.10",
             "Programming Language :: Python :: 3.11",
+            "Programming Language :: Python :: 3.12",
             "Programming Language :: Python :: Implementation :: CPython",
             "Programming Language :: Python :: Implementation :: PyPy",
             "Typing :: Typed",