@@ -438,8 +438,10 @@ Miscellaneous options
    * Set the :attr:`~sys.flags.dev_mode` attribute of :attr:`sys.flags` to
      ``True``
 
-   * ``-X utf8`` enables the UTF-8 mode, whereas ``-X utf8=0`` disables the
-     UTF-8 mode.
+   * ``-X utf8`` enables UTF-8 mode for operating system interfaces, overriding
+     the default locale-aware mode. ``-X utf8=0`` explicitly disables UTF-8
+     mode (even when it would otherwise activate automatically).
+     See :envvar:`PYTHONUTF8` for more details.
 
    It also allows passing arbitrary values and retrieving them through the
    :data:`sys._xoptions` dictionary.
@@ -789,36 +791,49 @@ conflict.
 .. envvar:: PYTHONCOERCECLOCALE
 
    If set to the value ``0``, causes the main Python command line application
-   to skip coercing the legacy ASCII-based C locale to a more capable UTF-8
-   based alternative.
+   to skip coercing the legacy ASCII-based C and POSIX locales to a more
+   capable UTF-8 based alternative.
 
-   If this variable is *not* set, or is set to a value other than ``0``, and
-   the current locale reported for the ``LC_CTYPE`` category is the default
-   ``C`` locale, then the Python CLI will attempt to configure the following
-   locales for the ``LC_CTYPE`` category in the order listed before loading the
-   interpreter runtime:
+   If this variable is *not* set (or is set to a value other than ``0``), the
+   ``LC_ALL`` locale override environment variable is also not set, and the
+   current locale reported for the ``LC_CTYPE`` category is either the default
+   ``C`` locale, or else the explicitly ASCII-based ``POSIX`` locale, then the
+   Python CLI will attempt to configure the following locales for the
+   ``LC_CTYPE`` category in the order listed before loading the interpreter
+   runtime:
 
    * ``C.UTF-8``
    * ``C.utf8``
    * ``UTF-8``
 
    If setting one of these locale categories succeeds, then the ``LC_CTYPE``
    environment variable will also be set accordingly in the current process
-   environment before the Python runtime is initialized. This ensures the
-   updated setting is seen in subprocesses, as well as in operations that
-   query the environment rather than the current C locale (such as Python's
-   own :func:`locale.getdefaultlocale`).
+   environment before the Python runtime is initialized. This ensures that in
+   addition to being seen by both the interpreter itself and other locale-aware
+   components running in the same process (such as the GNU ``readline``
+   library), the updated setting is also seen in subprocesses (regardless of
+   whether or not those processes are running a Python interpreter), as well as
+   in operations that query the environment rather than the current C locale
+   (such as Python's own :func:`locale.getdefaultlocale`).
 
    Configuring one of these locales (either explicitly or via the above
-   implicit locale coercion) will automatically set the error handler for
-   :data:`sys.stdin` and :data:`sys.stdout` to ``surrogateescape``. This
-   behavior can be overridden using :envvar:`PYTHONIOENCODING` as usual.
+   implicit locale coercion) automatically enables the ``surrogateescape``
+   :ref:`error handler <error-handlers>` for :data:`sys.stdin` and
+   :data:`sys.stdout` (:data:`sys.stderr` continues to use ``backslashreplace``
+   as it does in any other locale). This stream handling behavior can be
+   overridden using :envvar:`PYTHONIOENCODING` as usual.
 
    For debugging purposes, setting ``PYTHONCOERCECLOCALE=warn`` will cause
    Python to emit warning messages on ``stderr`` if either the locale coercion
    activates, or else if a locale that *would* have triggered coercion is
    still active when the Python runtime is initialized.
 
+   Also note that even when locale coercion is disabled, or when it fails to
+   find a suitable target locale, :envvar:`PYTHONUTF8` will still activate by
+   default in legacy ASCII-based locales. Both features must be disabled in
+   order to force the interpreter to use ``ASCII`` instead of ``UTF-8`` for
+   system interfaces.
+
    Availability: \*nix
 
    .. versionadded:: 3.7
@@ -834,10 +849,56 @@ conflict.
 
 .. envvar:: PYTHONUTF8
 
-   If set to ``1``, enable the UTF-8 mode. If set to ``0``, disable the UTF-8
-   mode. Any other non-empty string cause an error.
+   If set to ``1``, enables the interpreter's UTF-8 mode, where ``UTF-8`` is
+   used as the text encoding for system interfaces, regardless of the
+   current locale setting.
+
+   This means that:
+
+   * :func:`sys.getfilesystemencoding()` returns ``'UTF-8'`` (the locale
+     encoding is ignored).
+   * :func:`locale.getpreferredencoding()` returns ``'UTF-8'`` (the locale
+     encoding is ignored, and the function's ``do_setlocale`` parameter has no
+     effect).
+   * :data:`sys.stdin`, :data:`sys.stdout`, and :data:`sys.stderr` all use
+     UTF-8 as their text encoding, with the ``surrogateescape``
+     :ref:`error handler <error-handlers>` being enabled for :data:`sys.stdin`
+     and :data:`sys.stdout` (:data:`sys.stderr` continues to use
+     ``backslashreplace`` as it does in the default locale-aware mode)
+
+   As a consequence of the changes in those lower level APIs, other higher
+   level APIs also exhibit different default behaviours:
+
+   * Command line arguments, environment variables and filenames are decoded
+     to text using the UTF-8 encoding.
+   * :func:`os.fsdecode()` and :func:`os.fsencode()` use the UTF-8 encoding.
+   * :func:`open()`, :func:`io.open()`, and :func:`codecs.open()` use the UTF-8
+     encoding by default. However, they still use the strict error handler by
+     default so that attempting to open a binary file in text mode is likely
+     to raise an exception rather than producing nonsense data.
+
+   Note that the standard stream settings in UTF-8 mode can be overridden by
+   :envvar:`PYTHONIOENCODING` (just as they can be in the default locale-aware
+   mode).
+
+   If set to ``0``, the interpreter runs in its default locale-aware mode.
+
+   Setting any other non-empty string causes an error during interpreter
+   initialisation.
+
+   If this environment variable is not set at all, then the interpreter defaults
+   to using the current locale settings, *unless* the current locale is
+   identified as a legacy ASCII-based locale
+   (as described for :envvar:`PYTHONCOERCECLOCALE`), and locale coercion is
+   either disabled or fails. In such legacy locales, the interpreter will
+   default to enabling UTF-8 mode unless explicitly instructed not to do so.
+
+   Also available as the :option:`-X` ``utf8`` option.
+
+   Availability: \*nix
 
    .. versionadded:: 3.7
+      See :pep:`540` for more details.
 
 
 Debug-mode variables
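The stream and encoding behaviour described in the ``PYTHONUTF8`` entry can be checked empirically. The following is a minimal sketch, not part of the commit: it starts child interpreters with ``-X utf8`` and ``-X utf8=0`` and reports what each one sees. The names ``PROBE``, ``probe`` and ``normalized`` are hypothetical helpers introduced only for this illustration, and the sketch assumes Python 3.7 or later. ``PYTHONUTF8`` and ``PYTHONIOENCODING`` are removed from the child environment so that they cannot mask the effect of the command line option.

```python
import codecs
import json
import os
import subprocess
import sys

# Script run inside each child interpreter; it reports the settings that the
# documentation says are affected by UTF-8 mode.
PROBE = """
import json, locale, sys
print(json.dumps({
    "utf8_mode": sys.flags.utf8_mode,
    "filesystem": sys.getfilesystemencoding(),
    "preferred": locale.getpreferredencoding(False),
    "stdout_encoding": sys.stdout.encoding,
    "stdin_errors": sys.stdin.errors,
    "stdout_errors": sys.stdout.errors,
    "stderr_errors": sys.stderr.errors,
}))
"""


def probe(*interpreter_args):
    """Run PROBE in a child interpreter started with the given options."""
    # Drop variables that would otherwise override the option under test.
    env = {
        key: value
        for key, value in os.environ.items()
        if key not in ("PYTHONUTF8", "PYTHONIOENCODING")
    }
    result = subprocess.run(
        [sys.executable, *interpreter_args, "-c", PROBE],
        env=env,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        check=True,
    )
    return json.loads(result.stdout.decode("utf-8"))


def normalized(name):
    """Return the canonical codec name, since the spelling varies by version."""
    return codecs.lookup(name).name


enabled = probe("-X", "utf8")
disabled = probe("-X", "utf8=0")

for label, report in (("-X utf8", enabled), ("-X utf8=0", disabled)):
    print(label)
    for key in sorted(report):
        print("   ", key, "=", report[key])
```

Encoding names are compared through ``codecs.lookup()`` because the exact spelling returned by the interpreter (``'UTF-8'`` versus ``'utf-8'``) differs between Python versions. The values reported for ``-X utf8=0`` depend on the locale of the machine running the sketch.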