Commit 2367f58

davidm-db authored and cloud-fan committed
[SPARK-52345][SQL] Fix NULL behavior in scripting conditions
### What changes were proposed in this pull request?

`NULL` is a valid Boolean "value" in SQL. The scripting engine does not properly handle conditions (in if-else, case, while, and repeat statements) that return NULL. In such cases it throws an exception stating that NULL is an invalid value. The scripting engine should instead treat such NULLs as a False Boolean value.

### Why are the changes needed?

Fixes the wrong behavior in condition evaluation for scripting statements.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Old and new unit tests.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #51047 from davidm-db/scripting_conditions_null_fix.

Authored-by: David Milicevic <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
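
The intended semantics can be illustrated with a minimal, Spark-free sketch. It assumes a condition result is modelled as `Option[Boolean]`, with `None` standing in for SQL NULL. The names `NullConditionDemo`, `holds`, `ifElse`, and `whileLoop` are hypothetical and are not part of Spark.

```scala
// Hypothetical model, not Spark code: None stands in for a SQL NULL condition.
object NullConditionDemo {
  type SqlBoolean = Option[Boolean]

  // A condition "holds" only when it is exactly TRUE; FALSE and NULL both fail.
  def holds(cond: SqlBoolean): Boolean = cond.getOrElse(false)

  // IF ... ELSE: a NULL condition selects the else branch instead of raising an error.
  def ifElse[A](cond: SqlBoolean)(thenBranch: => A)(elseBranch: => A): A =
    if (holds(cond)) thenBranch else elseBranch

  // WHILE: the loop stops as soon as the condition is FALSE or NULL.
  // Returns the number of completed iterations.
  def whileLoop(cond: Int => SqlBoolean): Int = {
    var i = 0
    while (holds(cond(i))) i += 1
    i
  }
}
```

Under this model, `ifElse(None)("then")("else")` selects the else branch, and a while loop whose condition becomes NULL simply terminates.
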
1 parent 80cd867 commit 2367f58

4 files changed, +619 -324 lines changed


common/utils/src/main/resources/error/error-conditions.json

Lines changed: 0 additions & 6 deletions
@@ -162,12 +162,6 @@
     ],
     "sqlState" : "22003"
   },
-  "BOOLEAN_STATEMENT_WITH_EMPTY_ROW" : {
-    "message" : [
-      "Boolean statement <invalidStatement> is invalid. Expected single row with a value of the BOOLEAN type, but got an empty row."
-    ],
-    "sqlState" : "21000"
-  },
   "CALL_ON_STREAMING_DATASET_UNSUPPORTED" : {
     "message" : [
       "The method <methodName> can not be called on streaming Dataset/DataFrame."

sql/catalyst/src/main/scala/org/apache/spark/sql/errors/SqlScriptingErrors.scala

Lines changed: 0 additions & 10 deletions
@@ -102,16 +102,6 @@ private[sql] object SqlScriptingErrors {
         "sqlScriptingEnabled" -> toSQLConf(SQLConf.SQL_SCRIPTING_ENABLED.key)))
   }
 
-  def booleanStatementWithEmptyRow(
-      origin: Origin,
-      stmt: String): Throwable = {
-    new SqlScriptingException(
-      origin = origin,
-      errorClass = "BOOLEAN_STATEMENT_WITH_EMPTY_ROW",
-      cause = null,
-      messageParameters = Map("invalidStatement" -> toSQLStmt(stmt)))
-  }
-
   def positionalParametersAreNotSupportedWithSqlScripting(): Throwable = {
     new SqlScriptingException(
       origin = null,

sql/core/src/main/scala/org/apache/spark/sql/scripting/SqlScriptingExecutionNode.scala

Lines changed: 5 additions & 7 deletions
@@ -84,18 +84,16 @@ trait NonLeafStatementExec extends CompoundStatementExec {
       assert(!statement.isExecuted)
       statement.isExecuted = true
 
-      // DataFrame evaluates to True if it is single row, single column
-      // of boolean type with value True.
+      // First, it is checked if DataFrame represents a valid Boolean condition - single row,
+      // single column of Boolean type.
+      // If that is true, the condition evaluates to True only if the Boolean value is True.
+      // Otherwise, if the Boolean value is False or NULL, the condition evaluates to False.
       val df = statement.buildDataFrame(session)
       df.schema.fields match {
         case Array(field) if field.dataType == BooleanType =>
           df.limit(2).collect() match {
             case Array(row) =>
-              if (row.isNullAt(0)) {
-                throw SqlScriptingErrors.booleanStatementWithEmptyRow(
-                  statement.origin, statement.getText)
-              }
-              row.getBoolean(0)
+              if (row.isNullAt(0)) false else row.getBoolean(0)
             case _ =>
               throw SparkException.internalError(
                 s"Boolean statement ${statement.getText} is invalid. It returns more than one row.")
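
The row-level check in the hunk above can be approximated without a Spark session. The following is a rough sketch under stated assumptions: the collected single-column result is modelled as `Seq[Option[Boolean]]` (one element per row, `None` for NULL), the schema check is not modelled, and failures are returned as `Left` rather than thrown. `ConditionEvalSketch` and `evaluate` are hypothetical names, not Spark APIs.

```scala
// Hypothetical stand-in for the condition evaluation in the diff; not Spark code.
object ConditionEvalSketch {
  // Each element is one collected row of a single Boolean column; None models NULL.
  def evaluate(rows: Seq[Option[Boolean]]): Either[String, Boolean] =
    // Like df.limit(2), inspect at most two rows: enough to detect "more than one".
    rows.take(2) match {
      // Exactly one row: TRUE only if the value is TRUE; FALSE and NULL give false.
      case Seq(value) => Right(value.getOrElse(false))
      // Any other row count falls through to the error branch, as `case _` does in the diff.
      case _ => Left("Boolean statement is invalid: expected exactly one row.")
    }
}
```

For example, `evaluate(Seq(None))` yields `Right(false)`, which corresponds to the new `if (row.isNullAt(0)) false else row.getBoolean(0)` line, whereas the removed code raised `BOOLEAN_STATEMENT_WITH_EMPTY_ROW` for the same input.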
