Suppress C++ stacktrace on XLA_CHECK*() calls. #9448

Open · wants to merge 2 commits into master

Conversation

ysiraichi (Collaborator)

This PR improves error messages in PyTorch/XLA by suppressing the C++ stack trace printed when an XLA_CHECK*() call fails, making the output more user-friendly. Currently, a failing XLA_CHECK*() produces a lengthy and verbose C++ stack trace. While that trace can be useful for deep-dive debugging by developers, it mostly adds noise for end users.

Key Changes:

Before:

Traceback (most recent call last):
  File "dot.py", line 6, in <module>
    torch.dot(a, b)
RuntimeError: torch_xla/csrc/aten_xla_bridge.cpp:110 : Check failed: xtensor
*** Begin stack trace ***
        tsl::CurrentStackTrace[abi:cxx11]()
        torch_xla::bridge::GetXlaTensor(at::Tensor const&)
        torch_xla::XLANativeFunctions::dot(at::Tensor const&, at::Tensor const&)

        c10::BoxedKernel::callBoxed(c10::OperatorHandle const&, c10::DispatchKeySet, std::vector<c10::IValue, std::allocator<c10::IValue> >*) const
        c10::KernelFunction::callBoxed(c10::OperatorHandle const&, c10::DispatchKeySet, std::vector<c10::IValue, std::allocator<c10::IValue> >*) const
        c10::Dispatcher::callBoxed(c10::OperatorHandle const&, std::vector<c10::IValue, std::allocator<c10::IValue> >*) const



        c10::BoxedKernel::callBoxed(c10::OperatorHandle const&, c10::DispatchKeySet, std::vector<c10::IValue, std::allocator<c10::IValue> >*) const


        at::_ops::dot::redispatch(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&)





        at::_ops::dot::call(at::Tensor const&, at::Tensor const&)
        at::Tensor::dot(at::Tensor const&) const



        _PyObject_MakeTpCall
        _PyEval_EvalFrameDefault

        PyEval_EvalCode



        _PyRun_SimpleFileObject
        _PyRun_AnyFileObject
        Py_RunMain
        Py_BytesMain
        __libc_start_main
        _start
*** End stack trace ***
Input tensor is not an XLA tensor: torch.FloatTensor

After:

Traceback (most recent call last):
  File "dot.py", line 6, in <module>
    torch.dot(a, b)
RuntimeError: Check failed: xtensor: Input tensor is not an XLA tensor: torch.FloatTensor

@zhanyong-wan (Collaborator) left a comment:

Thanks!

ess << file_ << ":" << line_ << " : " << sink_str;
ess << sink.str();

if (ShouldShowCppErrorContext()) {
Review comment:
Can we include the C++ stack trace when this is true?

//
// More specifically, whether the `XLA_SHOW_CPP_ERROR_CONTEXT` environment
// variable is set or not.
bool ShouldShowCppErrorContext();
Review comment:
Style: [[nodiscard]]

@@ -7,14 +7,14 @@
#include "tsl/platform/statusor.h"

#define XLA_ERROR() TF_ERROR_STREAM()
Review comment:
Document how these macros react to XLA_SHOW_CPP_ERROR_CONTEXT?

ess << file_ << ":" << line_ << " : " << sink_str;
ess << sink.str();

if (ShouldShowCppErrorContext()) {
Review comment:
Add tests to verify the new behavior?
