refactor(native): Add VeloxPlanValidator to consolidate plan validation logic#27275
refactor(native): Add VeloxPlanValidator to consolidate plan validation logic#27275pramodsatya wants to merge 1 commit intoprestodb:masterfrom
Conversation
Reviewer's GuideRefactors Presto-to-Velox plan handling into a dedicated presto_cpp/main/plan module, introduces a VeloxPlanChecker that validates plan nodes during conversion (with configurable rejection of nested loop joins), wires this into PrestoServer and TaskResource, and removes the old VeloxPlanValidator/VeloxPlanConversion plumbing and associated build wiring. Sequence diagram for Velox plan validation via sidecar endpointsequenceDiagram
actor Client
participant HttpServer
participant PrestoServer
participant VeloxPlanChecker
participant VeloxInteractiveQueryPlanConverter as Converter
participant VeloxQueryPlanConverterBase as BaseConverter
Client->>HttpServer: POST v1/velox/plan(planFragmentJson)
HttpServer->>PrestoServer: handlePlanValidationRequest(planFragmentJson)
PrestoServer->>VeloxPlanChecker: checkPlanFragment(planFragmentJson, nativeWorkerPool)
activate VeloxPlanChecker
VeloxPlanChecker->>VeloxPlanChecker: parse planFragmentJson to PlanFragment
VeloxPlanChecker->>Converter: construct(queryCtx, pool)
VeloxPlanChecker->>Converter: toVeloxQueryPlan(fragment, tableWriteInfo, taskId, validatePlan=true)
activate Converter
Converter->>BaseConverter: toVeloxQueryPlan(fragment, tableWriteInfo, taskId, validatePlan=true)
activate BaseConverter
BaseConverter->>BaseConverter: set validatePlan_ = true
loop for each Presto plan node
BaseConverter->>BaseConverter: build corresponding Velox PlanNode
BaseConverter->>BaseConverter: validateNode(node)
alt validatePlan_ is true
BaseConverter->>VeloxPlanChecker: isValidPlanNode(node)
alt node is NestedLoopJoinNode and failOnNestedLoopJoin
VeloxPlanChecker-->>BaseConverter: false
BaseConverter-->>Converter: throw VeloxException
Converter-->>VeloxPlanChecker: propagate VeloxException
VeloxPlanChecker-->>PrestoServer: PlanConversionResponse with failures
else plan node supported
VeloxPlanChecker-->>BaseConverter: true
BaseConverter-->>BaseConverter: return node
end
else validatePlan_ is false
BaseConverter-->>BaseConverter: return node without checks
end
end
deactivate BaseConverter
deactivate Converter
VeloxPlanChecker-->>PrestoServer: PlanConversionResponse success
deactivate VeloxPlanChecker
PrestoServer-->>HttpServer: HTTP 200 with response JSON
HttpServer-->>Client: HTTP 200 or 400 based on failures
Class diagram for VeloxPlanChecker and VeloxQueryPlanConverter integrationclassDiagram
class VeloxPlanChecker {
+static PlanConversionResponse checkPlanFragment(planFragmentJson, pool)
+bool isValidPlanNode(node~NestedLoopJoinNode~) const
+bool isValidPlanNode(node~T~) const
}
class VeloxQueryPlanConverterBase {
<<abstract>>
+VeloxQueryPlanConverterBase(queryCtx, pool)
+PlanFragment toVeloxQueryPlan(fragment, tableWriteInfo, taskId, validatePlan=false)
+PlanNodePtr toVeloxQueryPlan(node, tableWriteInfo, taskId)
+PlanNodePtr toVeloxQueryPlanOutputNode(node, tableWriteInfo, taskId)
+PlanNodePtr toVeloxQueryPlanJoinNode(node, tableWriteInfo, taskId)
+PlanNodePtr toVeloxQueryPlanAggregationNode(node, tableWriteInfo, taskId)
+PlanNodePtr toVeloxQueryPlanLimitNode(node, tableWriteInfo, taskId)
+PlanNodePtr toVeloxQueryPlanWindowNode(node, tableWriteInfo, taskId)
+PlanNodePtr toVeloxQueryPlanTableWriteNode(node, tableWriteInfo, taskId)
+PlanNodePtr toVeloxQueryPlanExchangeNode(node, tableWriteInfo, taskId)
+PlanNodePtr toVeloxQueryPlanValuesNode(node, tableWriteInfo, taskId)
+PlanNodePtr toVeloxQueryPlanTableScanNode(node, tableWriteInfo, taskId)
+VectorPtr evaluateConstantExpression(expression, type)
+ColumnStatsSpecPtr toColumnStatsSpec(node, tableWriteInfo, taskId)
+PlanNodePtr toVeloxQueryPlanOutputFragment(fragment, tableWriteInfo, taskId)
+RowTypePtr toRowType(variables, typeParser)
+vector~FieldAccessTypedExprPtr~ toVeloxExprs(variables)
+void toAggregations(outputVariables, aggregationMap, aggregates, aggregateNames)
+shared_ptr~T~ validateNode(node)
-MemoryPool* pool_
-QueryCtx* queryCtx_
-VeloxExprConverter exprConverter_
-TypeParser typeParser_
-bool validatePlan_
-VeloxPlanChecker veloxPlanChecker_
}
class VeloxInteractiveQueryPlanConverter {
+VeloxInteractiveQueryPlanConverter(queryCtx, pool)
+PlanFragment toVeloxQueryPlan(fragment, tableWriteInfo, taskId, validatePlan=false)
+PlanNodePtr toVeloxQueryPlanExchangeNode(node, tableWriteInfo, taskId)
+CommitStrategy getCommitStrategy() const
}
class VeloxBatchQueryPlanConverter {
+VeloxBatchQueryPlanConverter(queryCtx, pool, shuffleName, broadcastBasePath, shuffleWriteInfo)
+PlanFragment toVeloxQueryPlan(fragment, tableWriteInfo, taskId, validatePlan=false)
+PlanNodePtr toVeloxQueryPlanExchangeNode(node, tableWriteInfo, taskId)
+CommitStrategy getCommitStrategy() const
-optional~string~ broadcastBasePath_
-optional~SerializedShuffleWriteInfo~ serializedShuffleWriteInfo_
-string shuffleName_
}
VeloxQueryPlanConverterBase <|-- VeloxInteractiveQueryPlanConverter
VeloxQueryPlanConverterBase <|-- VeloxBatchQueryPlanConverter
VeloxQueryPlanConverterBase --> VeloxPlanChecker : composition
VeloxPlanChecker ..> NestedLoopJoinNode : validates
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
2abde52 to
a6797d1
Compare
|
@majetideepak, could you please review this change? |
There was a problem hiding this comment.
Hey - I've left some high level feedback:
- Consider moving
PlanConversionPurposeinto its own small header (e.g.,PlanConversionPurpose.h) to avoid coupling converters, validators, and sidecar conversion helpers viaVeloxPlanConversion.hand reduce the risk of circular dependencies as these components evolve. - You can simplify the constructors of
VeloxInteractiveQueryPlanConverterandVeloxBatchQueryPlanConverterby dropping the default value forpurposethere and relying on the default inVeloxQueryPlanConverterBase, which avoids duplicating the default and keeps the API surface slightly cleaner.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- Consider moving `PlanConversionPurpose` into its own small header (e.g., `PlanConversionPurpose.h`) to avoid coupling converters, validators, and sidecar conversion helpers via `VeloxPlanConversion.h` and reduce the risk of circular dependencies as these components evolve.
- You can simplify the constructors of `VeloxInteractiveQueryPlanConverter` and `VeloxBatchQueryPlanConverter` by dropping the default value for `purpose` there and relying on the default in `VeloxQueryPlanConverterBase`, which avoids duplicating the default and keeps the API surface slightly cleaner.Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
a6797d1 to
8338ad1
Compare
There was a problem hiding this comment.
Hey - I've left some high level feedback:
- The renaming from
planValidator_toplanChecker_inTaskResourceappears incomplete (e.g., constructor initializer list still refers toplanValidator_); update the member name and all usages consistently to avoid compilation errors. - In
TaskResource(and related call sites), the constructor parameter is still namedplanValidatorwhile the type/member are nowVeloxPlanChecker/planChecker_; consider renaming the parameter toplanCheckerto keep the terminology consistent with the refactor and avoid confusion.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The renaming from `planValidator_` to `planChecker_` in `TaskResource` appears incomplete (e.g., constructor initializer list still refers to `planValidator_`); update the member name and all usages consistently to avoid compilation errors.
- In `TaskResource` (and related call sites), the constructor parameter is still named `planValidator` while the type/member are now `VeloxPlanChecker`/`planChecker_`; consider renaming the parameter to `planChecker` to keep the terminology consistent with the refactor and avoid confusion.Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
27be965 to
908b418
Compare
|
@sourcery-ai review. |
There was a problem hiding this comment.
Hey - I've left some high level feedback:
- The
VeloxQueryPlanConverterBase::toVeloxQueryPlan(fragment, ..., bool validatePlan)setter-style use of thevalidatePlan_member means the flag persists across calls; if a converter instance is ever reused, consider a scoped guard or passing the checker/flag through the call stack instead to avoid accidental leakage of validation state between conversions.- converter instance is not reused.
- The new nested-loop-join rejection is wired only through the validation path (
validatePlan_ == true); inTaskResource::createOrUpdateBatchTaskyou explicitly passfalse, so batch tasks can still executeNestedLoopJoinNodes even whenplanValidatorFailOnNestedLoopJoinis set—double-check whether that matches the intent to apply this check “in the task path” as described in the PR.- createOrUpdateBatchTask does not validate plan nodes.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The `VeloxQueryPlanConverterBase::toVeloxQueryPlan(fragment, ..., bool validatePlan)` setter-style use of the `validatePlan_` member means the flag persists across calls; if a converter instance is ever reused, consider a scoped guard or passing the checker/flag through the call stack instead to avoid accidental leakage of validation state between conversions.
- The new nested-loop-join rejection is wired only through the validation path (`validatePlan_ == true`); in `TaskResource::createOrUpdateBatchTask` you explicitly pass `false`, so batch tasks can still execute `NestedLoopJoinNode`s even when `planValidatorFailOnNestedLoopJoin` is set—double-check whether that matches the intent to apply this check “in the task path” as described in the PR.Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
849364e to
d7025e7
Compare
|
@amitkdutta, @BryanCutler, could you please help review this refactor? |
BryanCutler
left a comment
There was a problem hiding this comment.
Thanks @pramodsatya , this looks like a good improvement to me. I just had a couple minor questions.
presto-native-execution/presto_cpp/main/plan/PrestoToVeloxQueryPlan.cpp
Outdated
Show resolved
Hide resolved
presto-native-execution/presto_cpp/main/plan/VeloxPlanChecker.cpp
Outdated
Show resolved
Hide resolved
presto-native-execution/presto_cpp/main/plan/PrestoToVeloxQueryPlan.cpp
Outdated
Show resolved
Hide resolved
d7025e7 to
576e78f
Compare
presto-native-execution/presto_cpp/main/plan/PrestoToVeloxQueryPlan.cpp
Outdated
Show resolved
Hide resolved
|
Thanks @BryanCutler. @majetideepak, could you please help review this refactor? |
3edf8b6 to
f9f922a
Compare
|
@amitkdutta do you have any thoughts on this change? |
8088d94 to
9961a3f
Compare
presto-native-execution/presto_cpp/main/operators/tests/CMakeLists.txt
Outdated
Show resolved
Hide resolved
presto-native-execution/presto_cpp/main/plan/PrestoToVeloxQueryPlan.cpp
Outdated
Show resolved
Hide resolved
presto-native-execution/presto_cpp/main/plan/PrestoToVeloxQueryPlan.h
Outdated
Show resolved
Hide resolved
8449425 to
d428323
Compare
| outputType, | ||
| toVeloxSerdeKind(partitioningScheme.encoding), | ||
| sourceNode); | ||
| validateNode(partitionedOutputNode.get()); |
There was a problem hiding this comment.
Why not keep the old assignment and minimize the diff?
planFragment.planNode = core::PartitionedOutputNode::single(
...
)
validateNode(planFragment.planNode.get());
majetideepak
left a comment
There was a problem hiding this comment.
Let's try to reduce the diff. Thanks!
d428323 to
4b099fd
Compare
| const protocol::TaskId& taskId) { | ||
| const protocol::TaskId& taskId, | ||
| bool validatePlan) { | ||
| veloxPlanChecker_ = |
There was a problem hiding this comment.
This will create a new VeloxPlanChecker for every call with validatePlan = true.
We likely want to keep initVeloxPlanValidator API in PrestoServer as is and push down a single instance to TaskResource. This can later be extended to allow other plan checkers.
Let's keep VeloxPlanValidator instead of VeloxPlanChecker.
There was a problem hiding this comment.
Thanks for the suggestion, updated accordingly. Could you PTAL?
4b099fd to
9d3130f
Compare
9d3130f to
4b454cf
Compare
Description
Refactors Velox plan validation around a dedicated
VeloxPlanCheckerand moves plan conversion/checking code intopresto_cpp/main/plan. Consolidates plan validation into the checker, removing the standalone plan validator, and validates Velox plan nodes during conversion so unsupported plans fail early on the validation path.Motivation and Context
This refactoring simplifies the existing checks on Velox plan nodes, including via the sidecar, in order to enable Velox-cuDF plan node validation (see #27273). Instead of plan validation responsibilities split across plan conversion and a separate plan validator, Velox plan node validity is now checked during plan conversion with a unified
VeloxPlanChecker. TheVeloxPlanCheckerwill be leveraged by the sidecar's plan validation endpoint and by the existing Presto to Velox query plan conversion path.To ensure plan node validation is fail-fast and comprehensive, checks now happen as individual Velox plan nodes are constructed from Presto plan fragments. This catches unsupported plans earlier while preserving the normal plan conversion flow (used in execution path).
Impact
presto_cpp/main/typestopresto_cpp/main/planVeloxPlanConversiontoVeloxPlanCheckerVeloxPlanValidatorand folds its logic intoVeloxPlanCheckerVeloxPlanCheckeraPrestoServermember so sidecar can use leverage plan checker functionalityisValidPlanNode(...)API inVeloxPlanCheckerto support custom plan node validationsTest Plan
A lightweight pass-through plan validator is added to plan conversion logic so existing plan node conversion is not impacted. Validation-only conversion path will be exercised by sidecar via
v1/velox/planendpoint.Release Notes
Summary by Sourcery
Introduce a dedicated VeloxPlanChecker and integrate validation into Presto-to-Velox plan conversion while restructuring plan-related code into a new plan module.
Enhancements:
Summary by Sourcery
Introduce a VeloxPlanChecker and integrate Velox plan validation into Presto-to-Velox plan conversion while restructuring plan-related code into a dedicated plan library.
New Features:
Enhancements:
Build:
Tests: