
🛠️ Fix failing workflow test and implement comprehensive retry handling #2167


Closed
wants to merge 4 commits into from


FunamaYukina
Member

@FunamaYukina commented Jun 24, 2025

Why is this change needed?

Fixed a failing test case in workflow.test.ts:283 and implemented comprehensive retry handling across all workflow nodes to ensure robust error management.

  • The test case "should handle schema update failure" was failing due to LangGraph channel update conflicts
  • Workflow nodes had inconsistent retry handling patterns
  • Infinite loops occurred due to node name mismatches between workflow definitions and node constants

1. Fixed LangGraph Channel Update Issues

  • Added custom reducers to all LangGraph annotations to prevent "LastValue can only receive one value per step" errors
  • Properly handles state updates during node retries without channel conflicts

2. Implemented Comprehensive Retry Handling

  • Added consistent try-catch blocks and retry count management to all 9 workflow nodes
  • Ensures proper error propagation and state management across the entire workflow
  • Pattern includes: retry count tracking, error clearing on success, and proper error message handling
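A minimal sketch of that per-node pattern, assuming a trimmed-down WorkflowState with only the fields involved. runWithRetryTracking is a hypothetical wrapper used for illustration; it is not a function from this PR, whose nodes inline the try-catch instead.

```typescript
// Hypothetical, trimmed-down state: only the fields this pattern touches.
type WorkflowState = {
  error?: string
  retryCount: Record<string, number>
  ddlStatements?: string
}

// Hypothetical wrapper: run a node's work, clear any previous error on success,
// and on failure record the message and bump this node's retry counter.
async function runWithRetryTracking(
  state: WorkflowState,
  nodeName: string,
  work: (state: WorkflowState) => Promise<Partial<WorkflowState>>,
): Promise<WorkflowState> {
  try {
    const update = await work(state)
    // Success: merge the node's output and clear the error left by an earlier attempt.
    return { ...state, ...update, error: undefined }
  } catch (err) {
    const errorMessage = err instanceof Error ? err.message : String(err)
    const retryCount = state.retryCount[nodeName] ?? 0
    // Failure: keep the rest of the state, store the message, increment the counter.
    return {
      ...state,
      error: errorMessage,
      retryCount: { ...state.retryCount, [nodeName]: retryCount + 1 },
    }
  }
}
```

Note that in this sketch a success clears the error but leaves the retry counter untouched, matching the "error clearing on success" item above.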

3. Resolved Node Name Mismatches

  • Fixed inconsistency between NODE_NAME constants in node files and workflow edge definitions
  • Removed "Node" suffix from all node constants to match workflow naming convention
  • Eliminated infinite retry loops caused by the retry logic being unable to find the correct retry counts
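The failure mode can be reproduced with a small simulation. It assumes, for illustration, a node that records its retries under designSchemaNode while the router reads designSchema. route, alwaysFailingNode, and simulate are hypothetical names, and END is a local stand-in for LangGraph's constant.

```typescript
const END = '__end__' // local stand-in for LangGraph's END constant in this sketch

type State = { error?: string; retryCount: Record<string, number> }

// Routing rule as described for the workflow: retry while the count is below maxRetries.
function route(state: State, nodeName: string, nextNode: string, maxRetries = 3): string {
  const retryCount = state.retryCount[nodeName] ?? 0
  if (state.error && retryCount < maxRetries) return nodeName
  if (state.error) return END
  return nextNode
}

// A node that always fails and records its retries under `recordedName`.
function alwaysFailingNode(state: State, recordedName: string): State {
  const count = state.retryCount[recordedName] ?? 0
  return { error: 'boom', retryCount: { ...state.retryCount, [recordedName]: count + 1 } }
}

// Drive node + router until the router leaves the node, or a step cap
// (playing the role of LangGraph's recursion limit) is hit.
function simulate(recordedName: string, routedName: string, stepCap = 25): string {
  let state: State = { retryCount: {} }
  for (let step = 1; step <= stepCap; step++) {
    state = alwaysFailingNode(state, recordedName)
    const next = route(state, routedName, 'nextNode')
    if (next !== routedName) return `${next} after ${step} attempts`
  }
  return 'step cap reached'
}
```

With matching names, simulate('designSchema', 'designSchema') stops after three failed attempts; with mismatched names the counter the router reads never moves, so only the step cap ends the run.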

What would you like reviewers to focus on?

Testing Verification

✅ All 13 test cases now pass
✅ Confirmed that the workflow can be executed from chat:
https://cloud.trigger.dev/orgs/liam-hq-5035/projects/liam-HdAt/env/preview-fix-test-error/runs/run_cmca3txthbq3727mxmhsn26j4?span=e5ccd84e09799319

What was done

🤖 Generated by PR Agent at 7e7fd8b

  • Fix failing workflow test and LangGraph channel conflicts
  • Implement comprehensive retry handling across all workflow nodes
  • Standardize node naming and resolve workflow edge mismatches
  • Enhance error handling with proper state management

Detailed Changes

Relevant files

Configuration changes (2 files)
  • analyzeRequirementsNode.ts: Rename NODE_NAME constant for consistency (+1/-1)
  • generateUsecaseNode.ts: Rename NODE_NAME constant for consistency (+1/-1)

Error handling (7 files)
  • designSchemaNode.ts: Add retry handling and rename NODE_NAME (+37/-13)
  • executeDDLNode.ts: Implement comprehensive error handling and retry logic (+29/-2)
  • finalizeArtifactsNode.ts: Add retry handling and rename NODE_NAME (+67/-51)
  • generateDDLNode.ts: Enhance error handling and rename NODE_NAME (+14/-5)
  • prepareDMLNode.ts: Implement comprehensive error handling and retry logic (+29/-2)
  • reviewDeliverablesNode.ts: Add retry handling and rename NODE_NAME (+25/-8)
  • validateSchemaNode.ts: Add retry handling and rename NODE_NAME (+25/-8)

Enhancement (1 file)
  • workflow.ts: Implement unified retry logic with getNextNodeOrEnd (+36/-17)

Bug fix (1 file)
  • langGraphUtils.ts: Fix LangGraph annotations and increase recursion limit (+65/-18)

Tests (1 file)
  • workflow.test.ts: Re-enable failing test and update error assertions (+13/-11)

Additional Notes


    - Re-enabled the previously skipped test for handling schema update failure and adjusted the recursion limit from 20 to 40 to accommodate workflow execution.
    - Enhanced error handling in tests to match against multiple error scenarios, ensuring robustness in the chat workflow.
    - Improved annotations in `langGraphUtils.ts` to include default values and reducers for better state management.
    …handling in workflow nodes
    
    - Renamed NODE_NAME constants across multiple workflow nodes for consistency, changing from `analyzeRequirementsNode` to `analyzeRequirements`, `designSchemaNode` to `designSchema`, and others similarly.
    - Improved error handling in `designSchemaNode`, `finalizeArtifactsNode`, `executeDDLNode`, `generateDDLNode`, `generateUsecaseNode`, `prepareDMLNode`, `reviewDeliverablesNode`, and `validateSchemaNode` to include retry logic and clearer logging.
    - Updated the workflow service to utilize a new function `getNextNodeOrEnd` for determining the next node based on error state and retry count, enhancing the flow control of the chat workflow.
    @FunamaYukina self-assigned this Jun 24, 2025

    changeset-bot bot commented Jun 24, 2025

    ⚠️ No Changeset found

    Latest commit: 622473d

    Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

    This PR includes no changesets

    When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types



    vercel bot commented Jun 24, 2025

    The latest updates on your projects. Learn more about Vercel for Git ↗︎

    Name             Status       Updated (UTC)
    liam-app         ✅ Ready      Jun 24, 2025 6:48am
    liam-storybook   ✅ Ready      Jun 24, 2025 6:48am

    2 Skipped Deployments
    Name             Status       Updated (UTC)
    liam-docs        ⬜️ Ignored    Jun 24, 2025 6:48am
    liam-erd-sample  ⬜️ Skipped    Jun 24, 2025 6:48am

    Base automatically changed from implement-generate-usecase-node to main June 24, 2025 03:46

    supabase bot commented Jun 24, 2025

    Updates to Preview Branch (fix-test-error) ↗︎

    Deployments      Updated
    Database         Tue, 24 Jun 2025 06:45:06 UTC
    Services         Tue, 24 Jun 2025 06:45:06 UTC
    APIs             Tue, 24 Jun 2025 06:45:06 UTC

    Tasks are run on every commit but only new migration files are pushed.
    Close and reopen this PR if you want to apply changes from existing seed or migration files.

    Tasks            Updated
    Configurations   Tue, 24 Jun 2025 06:45:07 UTC
    Migrations       Tue, 24 Jun 2025 06:45:07 UTC
    Seeding          Tue, 24 Jun 2025 06:45:07 UTC
    Edge Functions   Tue, 24 Jun 2025 06:45:07 UTC

    View logs for this Workflow Run ↗︎.
    Learn more about Supabase for Git ↗︎.

    Comment on lines +27 to +30
    userInput: Annotation<string>({
      reducer: (_, newValue: string) => newValue,
      default: () => '',
    }),
    Member Author


    Here's what happened:

    LangGraph's default LastValue channel only allows one write per channel per workflow step. However, when our nodes retry after an error, they return the entire state spread (...state), which includes every existing channel value.

    This caused LangGraph to treat channels such as userInput as being written multiple times in the same step, triggering "Invalid update for channel" errors.

    Adding custom reducers that explicitly handle duplicate writes (by simply returning the new value) tells LangGraph how to resolve these "conflicts" and allows retries to work properly.

    About reducers:
    https://langchain-ai.github.io/langgraph/concepts/low_level/#default-reducer
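A toy model of the two channel behaviors, assuming only what is described above: a LastValue channel rejects a second write within one step, while a channel with a reducer folds all writes together. lastValueChannel and reducerChannel are hypothetical helpers, not LangGraph internals.

```typescript
type Reducer<T> = (current: T, update: T) => T

// Toy LastValue channel: accepts at most one write per step, otherwise throws,
// mirroring the "LastValue can only receive one value per step" error.
function lastValueChannel<T>(current: T, writes: T[]): T {
  if (writes.length === 0) return current
  if (writes.length > 1) {
    throw new Error('Invalid update for channel: LastValue can only receive one value per step')
  }
  return writes[0]
}

// Toy reducer channel: folds every write received in the step through the reducer.
function reducerChannel<T>(current: T, writes: T[], reducer: Reducer<T>): T {
  return writes.reduce(reducer, current)
}

// The reducer shape used in this PR for most fields: keep the newest value.
const takeLatest: Reducer<string> = (_, newValue) => newValue
```

In this model, two writes of userInput in one step throw on the LastValue channel, while reducerChannel('', ['a', 'b'], takeLatest) simply resolves to 'b'.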

    @FunamaYukina marked this pull request as ready for review June 24, 2025 06:31
    @FunamaYukina requested a review from a team as a code owner June 24, 2025 06:31
    @FunamaYukina requested review from hoshinotsuyoshi, junkisai, MH4GF and NoritakaIkeda and removed request for a team June 24, 2025 06:31
    Contributor

    qodo-merge-for-open-source bot commented Jun 24, 2025

    PR Reviewer Guide 🔍

    (Review updated until commit 622473d)

    Here are some key observations to aid the review process:

    ⏱️ Estimated effort to review: 4 🔵🔵🔵🔵⚪
    🧪 PR contains tests
    🔒 No security concerns identified
    ⚡ Recommended focus areas for review

    Breaking Change

    The LangGraph annotations have been completely restructured with custom reducers. This is a significant breaking change that affects all workflow state management. The reducer functions may not handle edge cases properly and could cause state corruption during concurrent updates.

    return Annotation.Root({
      userInput: Annotation<string>({
        reducer: (_, newValue: string) => newValue,
        default: () => '',
      }),
      analyzedRequirements: Annotation<
        | {
            businessRequirement: string
            functionalRequirements: Record<string, string[]>
            nonFunctionalRequirements: Record<string, string[]>
          }
        | undefined
      >({ reducer: (_, newValue) => newValue }),
      generatedUsecases: Annotation<Usecase[] | undefined>({
        reducer: (_, newValue) => newValue,
      }),
      generatedAnswer: Annotation<string | undefined>({
        reducer: (_, newValue) => newValue,
      }),
      finalResponse: Annotation<string | undefined>({
        reducer: (_, newValue) => newValue,
      }),
      formattedHistory: Annotation<string>({
        reducer: (_, newValue) => newValue,
        default: () => '',
      }),
      schemaData: Annotation<Schema>({ reducer: (_, newValue) => newValue }),
      projectId: Annotation<string | undefined>({
        reducer: (_, newValue) => newValue,
      }),
      buildingSchemaId: Annotation<string>({
        reducer: (_, newValue) => newValue,
        default: () => '',
      }),
      latestVersionNumber: Annotation<number>({
        reducer: (_, newValue) => newValue,
      }),
      organizationId: Annotation<string | undefined>({
        reducer: (_, newValue) => newValue,
      }),
      userId: Annotation<string>({
        reducer: (_, newValue) => newValue,
        default: () => '',
      }),
      designSessionId: Annotation<string>({
        reducer: (_, newValue) => newValue,
        default: () => '',
      }),
      error: Annotation<string | undefined>({
        reducer: (_, newValue) => newValue,
      }),
      retryCount: Annotation<Record<string, number>>({
        reducer: (existing, newValue) => ({ ...existing, ...newValue }),
        default: () => ({}),
      }),
    
      // Repository dependencies for data access
      repositories: Annotation<Repositories>({
        reducer: (_, newValue) => newValue,
      }),
    
      // Logging functionality
      logger: Annotation<NodeLogger>({ reducer: (_, newValue) => newValue }),
    })
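The retryCount annotation is the only one above whose reducer merges instead of replacing. A standalone check of the two reducer shapes, copied out as plain functions (outside LangGraph), shows what that changes when an update carries only one node's counter. The example values are illustrative.

```typescript
type RetryCount = Record<string, number>

// Merge reducer, as written for retryCount in the annotation above.
const mergeRetryCount = (existing: RetryCount, newValue: RetryCount): RetryCount => ({
  ...existing,
  ...newValue,
})

// Replace reducer, as used for the other fields.
function replace<T>(_: T, newValue: T): T {
  return newValue
}

const existing: RetryCount = { designSchema: 2, generateDDL: 1 }
// An update that reports only one node's counter.
const update: RetryCount = { generateDDL: 2 }

const merged = mergeRetryCount(existing, update) // keeps designSchema: 2
const replaced = replace(existing, update) // drops designSchema entirely
```

Because the nodes in this PR spread ...state.retryCount before bumping their own key, both reducers give the same result for those particular updates. The merge reducer also cannot delete a key, so resetting a counter would require writing 0 explicitly.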
    Error Handling

    Database save failures now throw errors instead of continuing gracefully. This could cause workflow interruption for non-critical operations like timeline item saving, where the previous behavior of logging and continuing might be more appropriate.

    if (!saveResult.success) {
      throw new Error(`Failed to save error message: ${saveResult.error}`)
    }
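If the log-and-continue behavior is preferred for non-critical writes such as timeline items, one option is to make that decision in a single place. This sketch assumes a save call returning { success, error } like the saveResult above; SaveResult, handleSaveResult, and the critical flag are hypothetical.

```typescript
// Assumed result shape, modeled on the saveResult.success / saveResult.error fields above.
type SaveResult = { success: true } | { success: false; error: string }

// Throw only when the write is critical; otherwise log and let the node continue.
// Returns true when the save succeeded.
function handleSaveResult(
  result: SaveResult,
  what: string,
  critical: boolean,
  log: (msg: string) => void = console.error,
): boolean {
  if (result.success) return true
  const message = `Failed to save ${what}: ${result.error}`
  if (critical) throw new Error(message)
  log(message)
  return false
}
```

A node could then call handleSaveResult(saveResult, 'timeline item', false) so that a failed timeline write does not set error and trigger a retry of the whole node.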
    Logic Error

    The retry logic has been simplified but may not handle all edge cases correctly. The function returns the same node name for retry but doesn't account for scenarios where the error state might be cleared between retries, potentially causing infinite loops.

    const getNextNodeOrEnd = (
      state: WorkflowState,
      nodeName: string,
      nextNode: string,
      maxRetries = 3,
    ): string => {
      const retryCount = state.retryCount[nodeName] ?? 0
    
      // If there's an error and retry count hasn't exceeded max, retry the same node
      if (state.error && retryCount < maxRetries) {
        return nodeName
      }
    
      // If retry is exhausted but there's still an error, go to END
      if (state.error) {
        return END
      }
    
      // Normal flow: proceed to next node
      return nextNode
    }
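To see what the helper returns in the cases this observation is about, here is the same logic made runnable on its own. END is a local stand-in for the constant imported from LangGraph, and the node names are illustrative.

```typescript
const END = '__end__' // local stand-in for the END constant imported from LangGraph

type WorkflowState = { error?: string; retryCount: Record<string, number> }

// Same logic as getNextNodeOrEnd above, made runnable on its own.
const getNextNodeOrEnd = (
  state: WorkflowState,
  nodeName: string,
  nextNode: string,
  maxRetries = 3,
): string => {
  const retryCount = state.retryCount[nodeName] ?? 0
  if (state.error && retryCount < maxRetries) return nodeName
  if (state.error) return END
  return nextNode
}

// Routing decisions for a few states of one node (names are illustrative).
const route = (state: WorkflowState) => getNextNodeOrEnd(state, 'designSchema', 'generateDDL')

const decisions = {
  noError: route({ retryCount: {} }),
  firstFailure: route({ error: 'timeout', retryCount: { designSchema: 1 } }),
  exhausted: route({ error: 'timeout', retryCount: { designSchema: 3 } }),
  // A success after earlier failures: the error is gone but the count is not reset.
  successAfterFailures: route({ retryCount: { designSchema: 2 } }),
}
```

If counters are not reset on success, as in the pattern described in this PR, a node re-entered later in the same run starts from its accumulated count: with a count of 2 carried over, one more failure routes straight to END.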

    Contributor

    qodo-merge-for-open-source bot commented Jun 24, 2025

    PR Code Suggestions ✨

    Latest suggestions up to 622473d
    Explore these optional code suggestions:

    Category / Suggestion / Impact
    General
    Preserve validation-specific retry logic

    The original logic for validateSchema had specific business logic to retry from
    designSchema on validation failures. The new generic retry logic may not handle
    validation-specific error recovery correctly.

    frontend/internal-packages/agent/src/chat/workflow/services/workflow.ts [91-93]

     .addConditionalEdges('validateSchema', (state) => {
    -  return getNextNodeOrEnd(state, 'validateSchema', 'reviewDeliverables')
    +  // If validation fails, retry from design phase
    +  if (state.error) {
    +    const retryCount = state.retryCount['validateSchema'] ?? 0
    +    return retryCount < 3 ? 'validateSchema' : 'designSchema'
    +  }
    +  return 'reviewDeliverables'
     })
    Suggestion importance[1-10]: 9


    Why: This suggestion correctly points out that the refactoring to a generic retry helper removed crucial, domain-specific error handling. The old logic correctly sent the workflow back to the designSchema node on a validation failure. The new generic logic would just retry the validation and then fail, which is a significant regression in functionality.

    High
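One way to keep the shared helper while restoring domain-specific recovery is to give it an optional retry target. A sketch under the same state shape; getNextNodeOrRetryTarget and its options object are hypothetical, and END is a local stand-in for LangGraph's constant.

```typescript
const END = '__end__' // local stand-in for LangGraph's END constant in this sketch

type WorkflowState = { error?: string; retryCount: Record<string, number> }

// Hypothetical variant of getNextNodeOrEnd: on error, route to `retryTarget`
// (the failing node itself by default) while the failing node's own counter is
// below maxRetries; after that, end the run.
const getNextNodeOrRetryTarget = (
  state: WorkflowState,
  nodeName: string,
  nextNode: string,
  options: { maxRetries?: number; retryTarget?: string } = {},
): string => {
  const { maxRetries = 3, retryTarget = nodeName } = options
  const retryCount = state.retryCount[nodeName] ?? 0
  if (state.error && retryCount < maxRetries) return retryTarget
  if (state.error) return END
  return nextNode
}

// validateSchema failures go back to designSchema, as the review describes for the original workflow.
const afterValidate = (state: WorkflowState) =>
  getNextNodeOrRetryTarget(state, 'validateSchema', 'reviewDeliverables', {
    retryTarget: 'designSchema',
  })
```

Because the counter stays keyed by validateSchema, each failed validation consumes one attempt even though the retry goes through designSchema, so the loop stays bounded. Routing to designSchema only after validateSchema's counter is exhausted, as in the diff above, would need a separate bound, since that counter stays at or above the limit on every later failure.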
    Possible issue
    Prevent infinite retry loops

    Throwing errors for database save failures can cause infinite retry loops since
    the node will retry on any error. Consider logging the error and continuing with
    processing instead of throwing.

    frontend/internal-packages/agent/src/chat/workflow/nodes/finalizeArtifactsNode.ts [32-34]

     if (!saveResult.success) {
    -  throw new Error(`Failed to save error message: ${saveResult.error}`)
    +  console.error('Failed to save error message:', saveResult.error)
    +  // Continue processing even if message saving fails
     }
    Suggestion importance[1-10]: 7


    Why: The suggestion correctly identifies that throwing an error on a database save failure will trigger the new retry mechanism. It proposes reverting to the previous behavior of logging the error and continuing. This is a valid design consideration, as failing to save a timeline item might not be critical enough to warrant retrying the entire node.

    Medium

    Previous suggestions

    Suggestions up to commit 7e7fd8b
    Category / Suggestion / Impact
    Possible issue
    Fix validation retry logic

    The validateSchema node should have special logic to retry from designSchema on
    validation failures, not just retry itself. This maintains the original workflow
    behavior where validation errors trigger schema redesign.

    frontend/internal-packages/agent/src/chat/workflow/services/workflow.ts [91-93]

     .addConditionalEdges('validateSchema', (state) => {
    -  return getNextNodeOrEnd(state, 'validateSchema', 'reviewDeliverables')
    +  const retryCount = state.retryCount['validateSchema'] ?? 0
    +  
    +  // If there's an error and retry count hasn't exceeded max, go back to designSchema
    +  if (state.error && retryCount < 3) {
    +    return 'designSchema'
    +  }
    +  
    +  // If retry is exhausted but there's still an error, go to END
    +  if (state.error) {
    +    return END
    +  }
    +  
    +  // Normal flow: proceed to next node
    +  return 'reviewDeliverables'
     })
    Suggestion importance[1-10]: 9


    Why: This suggestion correctly identifies a significant logic regression. The PR's refactoring to a generic retry mechanism removed the specific, crucial workflow path where a validateSchema failure would loop back to designSchema. Restoring this behavior is critical for the workflow's correctness.

    High
    Fix review retry logic

    The reviewDeliverables node should retry from analyzeRequirements on review
    failures, not just retry itself. This preserves the original workflow behavior
    where review issues trigger requirement re-analysis.

    frontend/internal-packages/agent/src/chat/workflow/services/workflow.ts [94-96]

     .addConditionalEdges('reviewDeliverables', (state) => {
    -  return getNextNodeOrEnd(state, 'reviewDeliverables', 'finalizeArtifacts')
    +  const retryCount = state.retryCount['reviewDeliverables'] ?? 0
    +  
    +  // If there's an error and retry count hasn't exceeded max, go back to analyzeRequirements
    +  if (state.error && retryCount < 3) {
    +    return 'analyzeRequirements'
    +  }
    +  
    +  // If retry is exhausted but there's still an error, go to END
    +  if (state.error) {
    +    return END
    +  }
    +  
    +  // Normal flow: proceed to next node
    +  return 'finalizeArtifacts'
     })
    Suggestion importance[1-10]: 9


    Why: Similar to the previous suggestion, this correctly points out a logic regression. The original workflow correctly sent the process back to analyzeRequirements if the reviewDeliverables step failed. The PR's generic retry logic broke this. The suggestion correctly proposes to restore this essential workflow loop.

    High
    General
    Clear stale DDL data

    When DDL generation fails, the ddlStatements field should be cleared or set to
    undefined to avoid using stale data from previous attempts.

    frontend/internal-packages/agent/src/chat/workflow/nodes/generateDDLNode.ts [66-73]

     return {
       ...state,
       error: errorMessage,
    +  ddlStatements: undefined,
       retryCount: {
         ...state.retryCount,
         [NODE_NAME]: retryCount + 1,
       },
     }
    Suggestion importance[1-10]: 7


    Why: The suggestion correctly identifies that on failure, ddlStatements could contain stale data from a previous successful run within the same session. Clearing it by setting it to undefined in the error handling block is a good practice to prevent subsequent nodes from accidentally using incorrect data.

    Medium
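The stale-data risk comes from the object spread itself, which can be checked in isolation. This sketch uses a trimmed-down state and two hypothetical functions for the error branch with and without the suggested line. It only covers the plain spread; whether an explicit undefined is written through to a LangGraph channel is not shown here.

```typescript
// Trimmed-down state for illustration; only the fields this case needs.
type WorkflowState = {
  ddlStatements?: string
  error?: string
  retryCount: Record<string, number>
}

const NODE_NAME = 'generateDDL'

// Error branch without the suggestion: spreading `state` carries the previous DDL forward.
function failKeepingStale(state: WorkflowState, errorMessage: string): WorkflowState {
  const retryCount = state.retryCount[NODE_NAME] ?? 0
  return {
    ...state,
    error: errorMessage,
    retryCount: { ...state.retryCount, [NODE_NAME]: retryCount + 1 },
  }
}

// Error branch as suggested: overwrite ddlStatements after the spread.
function failClearingStale(state: WorkflowState, errorMessage: string): WorkflowState {
  return { ...failKeepingStale(state, errorMessage), ddlStatements: undefined }
}
```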
    Learned best practice
    Add input parameter validation

    Add input validation at the beginning of the function to check that required
    parameters are properly defined. This prevents runtime errors when accessing
    properties of potentially undefined or null values.

    frontend/internal-packages/agent/src/chat/workflow/nodes/designSchemaNode.ts [17-25]

     const applySchemaChanges = async (
       schemaChanges: BuildAgentResponse['schemaChanges'],
       buildingSchemaId: string,
       latestVersionNumber: number,
       message: string,
       state: WorkflowState,
       retryCount: number,
     ): Promise<WorkflowState> => {
    +  if (!schemaChanges || !buildingSchemaId || !state?.repositories?.schema) {
    +    throw new Error('Invalid parameters: missing required data for schema changes');
    +  }
    +  
       const result = await state.repositories.schema.createVersion({
         buildingSchemaId,
         latestVersionNumber,
         ...
    Suggestion importance[1-10]: 6


    Why:
    Relevant best practice - Add explicit validation for input parameters at the beginning of functions to ensure they are properly defined and in the expected format before processing. This prevents runtime errors from null, undefined, or malformed data.

    Low

    @FunamaYukina requested a review from Copilot June 24, 2025 06:33

    Copilot AI left a comment


    Pull Request Overview

    This pull request fixes a failing workflow test and implements comprehensive retry handling across workflow nodes to improve error management and prevent infinite retry loops.

    • Reactivates a skipped test and increases the recursion limit.
    • Standardizes node naming and error handling across multiple workflow nodes.
    • Updates retry logic and conditional edge transitions in the workflow service.

    Reviewed Changes

    Copilot reviewed 12 out of 12 changed files in this pull request and generated 1 comment.

    • workflow.test.ts: Reactivated the failing test, updated recursion limit, and modified error assertions to use regex matching.
    • langGraphUtils.ts: Adjusted the default recursion limit and enhanced annotation definitions.
    • services/workflow.ts: Refactored retry logic into getNextNodeOrEnd to direct flow based on error state and retry count.
    • validateSchemaNode.ts, reviewDeliverablesNode.ts, prepareDMLNode.ts, generateDDLNode.ts, finalizeArtifactsNode.ts, executeDDLNode.ts, designSchemaNode.ts, analyzeRequirementsNode.ts: Standardized NODE_NAME constants and integrated consistent try-catch and retry count handling across all nodes.
    Comments suppressed due to low confidence (2)

    frontend/internal-packages/agent/src/chat/workflow/nodes/finalizeArtifactsNode.ts:34

    • Review the decision to throw an error on save failures and ensure this behavior aligns with the overall workflow error handling strategy for consistent failure management.
            throw new Error(`Failed to save error message: ${saveResult.error}`)
    

    frontend/internal-packages/agent/src/chat/workflow/services/workflow.ts:38

    • Verify that the constant 'END' is defined or imported within this module to prevent potential reference errors during workflow execution.
        return END
    

    Comment on lines 31 to 41
        // Increment retry count and set error
        return {
          ...state,
          error: errorMessage,
          retryCount: {
            ...state.retryCount,
            [NODE_NAME]: retryCount + 1,
          },
        }
      }
    }

    Copilot AI Jun 24, 2025


    Consider extracting the retry count increment logic into a shared helper to reduce repetition and improve maintainability across workflow nodes.

    Suggested change
    -    // Increment retry count and set error
    -    return {
    -      ...state,
    -      error: errorMessage,
    -      retryCount: {
    -        ...state.retryCount,
    -        [NODE_NAME]: retryCount + 1,
    -      },
    -    }
    -  }
    -}
    +    // Increment retry count and set error using helper
    +    return incrementRetryCount(state, NODE_NAME, errorMessage)
    +  }
    +}
    +/**
    + * Helper function to increment retry count and set error in state.
    + * @param state - The current workflow state.
    + * @param nodeName - The name of the node.
    + * @param errorMessage - The error message to set.
    + * @returns Updated workflow state with incremented retry count and error.
    + */
    +function incrementRetryCount(
    +  state: WorkflowState,
    +  nodeName: string,
    +  errorMessage: string,
    +): WorkflowState {
    +  const retryCount = state.retryCount[nodeName] ?? 0
    +  return {
    +    ...state,
    +    error: errorMessage,
    +    retryCount: {
    +      ...state.retryCount,
    +      [nodeName]: retryCount + 1,
    +    },
    +  }
    +}


    Member Author


    Thank you, I will fix it.


    @FunamaYukina marked this pull request as draft June 24, 2025 06:35
    - Introduced a new utility function `incrementRetryCount` to streamline error handling and retry count management in multiple workflow nodes.
    - Removed redundant retry count logic from `analyzeRequirementsNode`, `designSchemaNode`, `executeDDLNode`, `finalizeArtifactsNode`, `generateDDLNode`, `generateUsecaseNode`, `prepareDMLNode`, `reviewDeliverablesNode`, and `validateSchemaNode`, replacing it with the new utility function for improved code clarity and maintainability.
    - This refactor enhances the error handling mechanism across the workflow, ensuring consistent behavior when errors occur.
    
    Member

    @MH4GF left a comment


    There were a few things I was concerned about, but I don't think any of them block merging. Thank you!

    Comment on lines +71 to +72
    // Conditional edges with retry logic - each node will retry up to maxRetries times on error
    // If maxRetries is exceeded and error persists, workflow goes to END
    Member


    https://langchain-ai.github.io/langgraphjs/how-tos/node-retry-policies/

    I found it a bit hard to read, so I checked whether LangGraph has a built-in retry mechanism, and it seems you can set a retryPolicy on addNode. It may be better to align with that.

    Migrating to it might also solve this problem: https://github.com/liam-hq/liam/pull/2167/files#r2163037269

    That said, it's fine to merge this as-is for now to fix the issue!
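For comparison with the hand-written retry edges, the linked how-to configures retries per node through a retryPolicy option on addNode. The sketch below is not LangGraph code: withRetryPolicy is a hypothetical, hand-rolled helper whose option names (maxAttempts, initialInterval, backoffFactor) are only loosely modeled on that idea. It retries a function in place with exponential backoff instead of routing back through the graph.

```typescript
type RetryOptions = {
  maxAttempts: number // total attempts, including the first
  initialInterval: number // ms to wait before the first retry
  backoffFactor: number // multiplier applied to the wait after each retry
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms))

// Hypothetical helper: re-run `fn` until it resolves or maxAttempts is reached.
// The last error is rethrown so the caller still sees the real failure.
async function withRetryPolicy<T>(fn: () => Promise<T>, opts: RetryOptions): Promise<T> {
  let delay = opts.initialInterval
  let lastError: unknown
  for (let attempt = 1; attempt <= opts.maxAttempts; attempt++) {
    try {
      return await fn()
    } catch (err) {
      lastError = err
      // Wait before the next attempt, growing the delay each time.
      if (attempt < opts.maxAttempts) {
        await sleep(delay)
        delay *= opts.backoffFactor
      }
    }
  }
  throw lastError
}
```

One design difference: attempts are counted locally inside the call rather than in a shared retryCount map, so routing no longer depends on node-name keys.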

    Member Author


    Oh wow, thank you! Let me think for a bit about whether I should recreate the pull request. 🤔

    @FunamaYukina marked this pull request as draft June 24, 2025 09:58
    @FunamaYukina
    Member Author

    I will recreate the pull request. 🙏
