
Releases: takahirom/arbigent

0.20.0

11 Feb 04:25
c55630a

Fixed Focus Logic

The previous focus logic stopped moving prematurely, even when the target had not been reached, which confused the AI. This has been resolved by improving the loop termination conditions.
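To picture the fix, here is a hypothetical Python sketch; it is not Arbigent's code, and the release notes do not show the actual focus logic. Focusable items are modeled as a 1-D list, and `move_focus`, `enabled`, and `max_steps` are illustrative names. The loop's normal exit is reaching the target; it gives up only when nothing focusable remains in the required direction or a step budget runs out.

```python
from typing import Sequence, Tuple


def move_focus(enabled: Sequence[bool], start: int, target: int,
               max_steps: int = 50) -> Tuple[int, bool]:
    """Move focus toward `target`, one focusable item per step.

    Hypothetical model: `enabled[i]` says whether item i can take focus.
    Returns (final_index, reached_target).
    """
    current = start
    for _ in range(max_steps):
        if current == target:
            return current, True  # reaching the target is the normal exit
        step = 1 if target > current else -1
        nxt = current + step
        # Skip items that cannot take focus instead of stopping early.
        while 0 <= nxt < len(enabled) and not enabled[nxt]:
            nxt += step
        if not 0 <= nxt < len(enabled):
            break  # nothing focusable remains in this direction
        current = nxt
    return current, current == target
```

For example, `move_focus([True, False, False, True], 0, 3)` skips the two disabled items and returns `(3, True)`.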

Enhanced Multi-Image Assertion

Verifying video playback status was difficult with a single screenshot, so we implemented multi-image AI assertions. They compare a sequence of images, which makes it possible to validate dynamic content states.
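The assertion itself asks an AI model to compare the images, but the underlying idea can be illustrated with a non-AI analogue: if consecutive screenshots keep changing, the content is probably playing. In this hypothetical sketch, frames are flattened grayscale pixel lists, and `looks_like_playing`, `mean_abs_diff`, and `threshold` are illustrative names, not Arbigent APIs.

```python
from typing import List, Sequence

Frame = Sequence[int]  # hypothetical: a flattened grayscale screenshot


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Average per-pixel difference between two same-sized frames."""
    if len(a) != len(b):
        raise ValueError("frames must have the same size")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def looks_like_playing(frames: List[Frame], threshold: float = 1.0) -> bool:
    """True if every consecutive pair of frames differs by more than threshold."""
    if len(frames) < 2:
        raise ValueError("need at least two frames to judge motion")
    return all(mean_abs_diff(a, b) > threshold
               for a, b in zip(frames, frames[1:]))
```

A single screenshot cannot distinguish a playing video from a paused one; a sequence can, which is why the assertion takes several images in order.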

What's Changed

Full Changelog: 0.19.0...0.20.0

0.19.0

05 Feb 08:53
9df7e9a

Experiment to Optimize System Prompt

We will store JSONL files in arbigent-result/ containing requestBody and responseBody data. User feedback from the interface can also be recorded in arbigent-result/ to enable AI-driven optimization.
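A minimal sketch of the JSONL format, assuming one JSON object per request-response pair. Only `requestBody`, `responseBody`, and the `arbigent-result/` directory come from these notes; the `feedback` field, the function names, and any file name you pass in are hypothetical.

```python
import json
from pathlib import Path
from typing import Any, Dict, List, Optional


def format_record(request_body: Dict[str, Any], response_body: Dict[str, Any],
                  feedback: Optional[str] = None) -> str:
    """Serialize one request/response pair as a single JSONL line."""
    record: Dict[str, Any] = {"requestBody": request_body,
                              "responseBody": response_body}
    if feedback is not None:
        record["feedback"] = feedback  # hypothetical field for user feedback
    # json.dumps without indent escapes newlines, so one record = one line.
    return json.dumps(record) + "\n"


def append_record(path: Path, request_body: Dict[str, Any],
                  response_body: Dict[str, Any],
                  feedback: Optional[str] = None) -> None:
    """Append a record to a JSONL file, e.g. one under arbigent-result/."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as f:
        f.write(format_record(request_body, response_body, feedback))


def parse_jsonl(text: str) -> List[Dict[str, Any]]:
    """Read records back, skipping blank lines."""
    return [json.loads(line) for line in text.split("\n") if line.strip()]
```

Because each record occupies exactly one line, training data can be streamed line by line, and appending a new record never requires rewriting the file.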

The current system prompt was developed through trial and error. We plan to apply the COPRO optimization method, using the collected request-response pairs as training data. This is not implemented yet, but we have sample code for prompt optimization.

Failed Cache Removal

AI decision caches were preserved after unsuccessful tests, which caused the same failures to recur. Corresponding cache entries are now removed automatically when a test fails.
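One way to implement this, sketched hypothetically: track which cache entries a test run read or wrote, and drop them if the run fails so that the next run asks the AI again. `RunScopedCache` and `finish_run` are illustrative names, and removing entries that were only read (not just written) during the failed run is an assumption.

```python
from typing import Dict, Hashable, Optional, Set


class RunScopedCache:
    """Hypothetical decision cache that forgets entries used by a failed run."""

    def __init__(self) -> None:
        self.entries: Dict[Hashable, str] = {}
        self._used: Set[Hashable] = set()

    def get(self, key: Hashable) -> Optional[str]:
        if key in self.entries:
            self._used.add(key)  # a replayed decision may be the cause of failure
        return self.entries.get(key)

    def put(self, key: Hashable, decision: str) -> None:
        self.entries[key] = decision
        self._used.add(key)

    def finish_run(self, passed: bool) -> None:
        """Call once per test; on failure, evict every entry the run touched."""
        if not passed:
            for key in self._used:
                self.entries.pop(key, None)  # force fresh AI decisions next time
        self._used.clear()
```

Entries from earlier passing runs survive unless the failing run touched them, so one bad run does not wipe the whole cache.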

What's Changed

Full Changelog: 0.18.0...0.19.0

0.18.0

29 Jan 05:46
dc159a8

New Feature

You can add notes for other team members and rename scenario IDs to make the YAML more readable.

What's Changed

Full Changelog: 0.17.0...0.18.0

0.17.0

28 Jan 10:06
77ed913

Fix for Windows Compatibility Issues

I don't have a Windows environment for testing, but the issue may be resolved now. Thank you for reporting this, @anunay1!
If you're still experiencing issues, please let me know.
#118

Bugfix

Fixed image assertions that ran even when none were explicitly set. These assertions were triggered on Agent task completion, resulting in redundant executions. This behavior is now fixed, and test cases have been added to verify the fix.

What's Changed

Full Changelog: 0.16.0...0.17.0

0.16.0

27 Jan 07:39
90a29ec

New Feature: AI Decision-Making Cache

When the ViewTree structure and the goal (prompt) are identical, Arbigent can now reuse cached AI decisions. This addresses a major performance bottleneck, since AI decision-making is typically the most resource-intensive and time-consuming step. This feature is experimental and disabled by default.

- Launch app  // ← This task can be cached when executing the "Open Member Page" scenario  
  - Open search  
  - Open member page  
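A hypothetical sketch of such a cache: the key is a hash of the ViewTree and the goal, and the expensive AI call runs only on a cache miss. `DecisionCache`, `cache_key`, and `ask_ai` are illustrative names; the notes do not specify how Arbigent serializes the ViewTree or builds its key.

```python
import hashlib
from typing import Callable, Dict


def cache_key(view_tree: str, goal: str) -> str:
    """Hash the ViewTree and goal together into a stable key."""
    # Length-prefixing the view tree keeps ("ab", "c") and ("a", "bc") distinct.
    payload = f"{len(view_tree)}:{view_tree}{goal}".encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


class DecisionCache:
    def __init__(self, enabled: bool = False) -> None:
        self.enabled = enabled  # off by default, as the feature is experimental
        self._store: Dict[str, str] = {}

    def decide(self, view_tree: str, goal: str,
               ask_ai: Callable[[str, str], str]) -> str:
        if not self.enabled:
            return ask_ai(view_tree, goal)
        key = cache_key(view_tree, goal)
        if key not in self._store:
            self._store[key] = ask_ai(view_tree, goal)  # only on a cache miss
        return self._store[key]
```

In the scenario tree above, "Launch app" runs with the same goal as a dependency of each later scenario, so when it also sees the same ViewTree its decisions can be served from the cache.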

Important bug fix

In version 0.15.0, the API key could be written to the log file unintentionally.
We've introduced a mechanism that replaces the API key with a placeholder, and added a CI check to catch potential API key leaks.
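A minimal sketch of placeholder replacement plus a CI-style check, assuming the API key is available as a plain string when logging. The `****` placeholder and the function names are hypothetical; the notes do not say which placeholder Arbigent uses.

```python
from typing import Iterable


def redact(text: str, secrets: Iterable[str], placeholder: str = "****") -> str:
    """Replace every occurrence of each secret with a placeholder."""
    # Longest first, so a secret that contains another is masked in full.
    for secret in sorted((s for s in secrets if s), key=len, reverse=True):
        text = text.replace(secret, placeholder)
    return text


def check_no_leak(text: str, secrets: Iterable[str]) -> bool:
    """CI-style guard: True when no non-empty secret appears in the text."""
    return not any(s and s in text for s in secrets)
```

Redacting at write time protects new logs, while a separate check such as `check_no_leak` in CI catches code paths that bypass the redaction.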

What's Changed

Full Changelog: 0.15.0...0.16.0

[Deprecated] 0.15.0

26 Jan 08:15
f4fb567

In version 0.15.0, log files may contain an API key. We are actively addressing this issue.

UI Updates

We have several updates to the UI:

  • Added launch app arguments
  • Added a console for debugging

The scenario.cleanupData parameter in YAML is no longer used (it currently does not cause errors, but it has no effect). Use the initialization methods instead.

CLI Updates

You can now use --scenario-id=foo to filter scenarios. To specify multiple scenarios, use --scenario-id=foo,bar or --scenario-id=foo --scenario-id=bar.
Use --dry-run to preview which scenarios will run. This is particularly useful with the --shard option, since it can otherwise be difficult to tell which scenarios a shard targets. (Arbigent's own tests also use this option.)
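Both ways of passing multiple IDs can be merged into a single list. This Python `argparse` sketch only mirrors the flag names from these notes; it is not Arbigent's CLI code, and deduplicating IDs while preserving their order is an assumption.

```python
import argparse
from typing import List, Sequence, Tuple


def parse_cli(argv: Sequence[str]) -> Tuple[List[str], bool]:
    """Return (scenario_ids, dry_run) from a hypothetical argument list."""
    parser = argparse.ArgumentParser(prog="arbigent-sketch")
    # action="append" collects repeated --scenario-id flags into a list.
    parser.add_argument("--scenario-id", action="append", default=None)
    parser.add_argument("--dry-run", action="store_true")
    args = parser.parse_args(list(argv))
    ids: List[str] = []
    for value in args.scenario_id or []:
        for part in value.split(","):  # each flag may carry a comma list
            part = part.strip()
            if part and part not in ids:
                ids.append(part)
    return ids, args.dry_run
```

With this scheme, `--scenario-id=foo,bar` and `--scenario-id=foo --scenario-id=bar` produce the same list, `["foo", "bar"]`.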

What's Changed

Full Changelog: 0.14.0...0.15.0

0.14.0

24 Jan 02:39
2912492

Fix critical issues in UI

We encountered issues in Arbigent where device connection settings and API key input occasionally failed to save. These issues have now been resolved. Please try the updated version.

What's Changed

Full Changelog: 0.13.0...0.14.0

0.13.0

23 Jan 04:44
c23024b

Add shard option to enable parallel tests

You can split tests into shards and run them in parallel with the --shard option (format: --shard=<index>/<total>, with the index starting at 1).

arbigent --shard=1/4

In GitHub Actions, a matrix strategy can run each shard as a separate job:

  cli-e2e-android:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        shardIndex: [ 1, 2, 3, 4 ]
        shardTotal: [ 4 ]
    steps:
...
      - name: CLI E2E test
        uses: reactivecircus/android-emulator-runner@v2
...
          script: |
            arbigent --shard=${{ matrix.shardIndex }}/${{ matrix.shardTotal }} --os=android --project-file=sample-test/src/main/resources/projects/e2e-test-android.yaml --ai-type=gemini --gemini-model-name=gemini-2.0-flash-exp
...

      - uses: actions/upload-artifact@b4b15b8c7c6ac21ea08fcf65892d2ee8f75cf882 # v4
        if: ${{ always() }}
        with:
          name: cli-report-android-${{ matrix.shardIndex }}-${{ matrix.shardTotal }}
          path: |
            arbigent-result/*
          retention-days: 90
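The notes do not describe how scenarios are assigned to shards. A common approach, sketched hypothetically here, splits the scenario list into contiguous, near-equal chunks so that every scenario lands in exactly one shard; `select_shard` and `parse_shard` are illustrative names.

```python
from typing import List, Sequence, Tuple


def parse_shard(spec: str) -> Tuple[int, int]:
    """Parse '<index>/<total>' with a 1-based index, e.g. '1/4'."""
    index_s, total_s = spec.split("/")
    index, total = int(index_s), int(total_s)
    if not 1 <= index <= total:
        raise ValueError(f"shard index must be in 1..{total}, got {index}")
    return index, total


def select_shard(scenario_ids: Sequence[str], spec: str) -> List[str]:
    """Return the contiguous slice of scenarios that belongs to this shard."""
    index, total = parse_shard(spec)
    base, extra = divmod(len(scenario_ids), total)
    # The first `extra` shards take one additional scenario each.
    start = (index - 1) * base + min(index - 1, extra)
    size = base + (1 if index - 1 < extra else 0)
    return list(scenario_ids[start:start + size])
```

With 10 scenarios and `--shard=<index>/4`, shards 1 through 4 would receive 3, 3, 2, and 2 scenarios under this scheme; a dry run is a cheap way to confirm such an assignment before spending emulator time.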

What's Changed

Full Changelog: 0.12.0...0.13.0

0.12.0

22 Jan 10:01
11b7887
Compare
Choose a tag to compare

You can now see the Arbigent running status at the bottom of the screen.

What's Changed

Full Changelog: 0.11.0...0.12.0

0.11.0

19 Jan 07:34
2fe2c1b

New Feature: Screen Stuck Detection

Arbigent now detects when the AI agent gets stuck on the same screen and recovers by prompting it to reconsider its actions.
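A hypothetical sketch of the detection side: fingerprint each screen (here, a hash of its text, such as a serialized view tree) and, when the last few observations are identical, return a hint to add to the agent's prompt. `StuckDetector`, `threshold`, and the hint wording are illustrative assumptions, not Arbigent's implementation.

```python
import hashlib
from collections import deque
from typing import Deque, Optional


class StuckDetector:
    def __init__(self, threshold: int = 3) -> None:
        if threshold < 2:
            raise ValueError("threshold must be at least 2")
        self.threshold = threshold
        self._recent: Deque[str] = deque(maxlen=threshold)

    def observe(self, screen: str) -> Optional[str]:
        """Record one screen; return a hint if it repeated `threshold` times."""
        fingerprint = hashlib.sha256(screen.encode("utf-8")).hexdigest()
        self._recent.append(fingerprint)
        if len(self._recent) == self.threshold and len(set(self._recent)) == 1:
            self._recent.clear()  # start a fresh window after giving the hint
            return (f"The screen has not changed over the last {self.threshold} "
                    "observations. Try a different action.")
        return None
```

Clearing the window after a hint gives the agent a few steps to act on it before the detector can fire again.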

What's Changed

Full Changelog: 0.10.1...0.11.0