11 Feb 04:25

c55630a

0.20.0

Fixed Focus Logic

The previous focus logic could stop moving focus before it reached the target, which confused the AI. This has been resolved by correcting the loop termination conditions.
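A minimal sketch of the kind of termination condition described above, not Arbigent's actual code: the loop ends only when the target is reached, when focus stops moving, or when a step budget runs out, and the return value says whether the target was really reached. The function name, the `step` callback, and `max_steps` are all hypothetical.

```python
from typing import Callable


def move_focus_to(current: int, target: int,
                  step: Callable[[int], int], max_steps: int = 50) -> bool:
    """Move focus toward `target`; return True only if it was actually reached."""
    for _ in range(max_steps):
        if current == target:
            return True  # success: focus is on the target
        next_index = step(current)
        if next_index == current:
            return False  # focus did not move, so report failure explicitly
        current = next_index
    # Step budget exhausted: succeed only if the last move landed on the target.
    return current == target
```

Returning an explicit failure lets the caller tell the AI that focus never arrived, instead of leaving it to guess from an unchanged screen.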

Enhanced Multi-Image Assertion

We addressed challenges in verifying video playback status by implementing a multi-image AI assertion system. This allows sequential image comparisons to validate dynamic content states.

What's Changed

Fix focus by index by @takahirom in #129
Add multiple image assertion by @takahirom in #130

Full Changelog: 0.19.0...0.20.0

Contributors

takahirom


05 Feb 08:53

takahirom

0.19.0

9df7e9a

0.19.0

Experiment to Optimize System Prompt

We will store JSONL files in arbigent-result/ containing requestBody and responseBody data. User feedback from the interface can also be recorded in arbigent-result/ to enable AI-driven optimization.

The current system prompt was developed through trial and error. We plan to implement the COPRO optimization method using collected request-response pairs as training data. While not yet implemented, we have sample code for prompt optimization.
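As a rough illustration of how such logs could be turned into request-response pairs, the sketch below parses JSONL lines. Only the `requestBody` and `responseBody` keys come from the notes above; the helper name and the assumption that each line is a JSON object holding both keys are ours.

```python
import json
from typing import Iterable, List, Tuple


def parse_pairs(lines: Iterable[str]) -> List[Tuple[object, object]]:
    """Collect (requestBody, responseBody) pairs from JSONL lines."""
    pairs = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        # Assumption: each record is a JSON object; a missing key becomes None.
        pairs.append((record.get("requestBody"), record.get("responseBody")))
    return pairs
```

An open file handle for a JSONL file under arbigent-result/ can be passed directly as `lines`, since iterating a text file yields one line at a time.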

Failed Cache Removal

When a test failed, the cached AI decisions that led to the failure were kept, so the same failure recurred on later runs. The corresponding cache entries are now removed automatically when a test fails.
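The idea can be sketched as follows, assuming a plain dictionary as the cache. The function and parameter names are hypothetical and do not reflect Arbigent's internals.

```python
from typing import Callable, Dict


def run_with_cache(cache: Dict[str, str], key: str,
                   decide: Callable[[], str],
                   execute: Callable[[str], bool]) -> bool:
    """Reuse a cached decision if present; evict it when the run fails."""
    decision = cache.get(key)
    if decision is None:
        decision = decide()  # ask the AI only on a cache miss
        cache[key] = decision
    succeeded = execute(decision)
    if not succeeded:
        # Drop the entry so the next run asks the AI again
        # instead of replaying a decision that just failed.
        cache.pop(key, None)
    return succeeded
```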

What's Changed

Add experimental methods for maintain ssot by @takahirom in #123
Remove failed cache by @takahirom in #127
Save API call jsonl and add feedback feature by @takahirom in #124

Full Changelog: 0.18.0...0.19.0

Contributors

takahirom


29 Jan 05:46

takahirom

0.18.0

dc159a8

0.18.0

New Feature

You can add notes for other team members and rename scenario IDs to make the YAML more readable.

What's Changed

Add note for humans by @takahirom in #121
Make scenario id changeable by @takahirom in #122

Full Changelog: 0.17.0...0.18.0

Contributors

takahirom


28 Jan 10:06

takahirom

0.17.0

77ed913

0.17.0

Fix for Windows Compatibility Issues

Though I don't have a Windows environment for testing, the issue might be resolved now. Thank you for reporting this, @anunay1!
If you're still experiencing issues, please let me know.
#118

Bugfix

Fixed a bug where image assertions ran even when none were configured. They were triggered whenever the Agent completed a task, resulting in redundant executions. We've corrected this behavior and added test cases to verify the fix.

What's Changed

Fix issue where image assertion runs even if image assertions are not set by @takahirom in #116
Improve auto focus by @takahirom in #117
Fix windows adb connection by @takahirom in #119
Add perUserInstall for Windows by @takahirom in #120

Full Changelog: 0.16.0...0.17.0

Contributors

takahirom and anunay1


27 Jan 07:39

takahirom

0.16.0

90a29ec

0.16.0

New Feature: AI Decision-Making Cache

When the ViewTree structure and goal (prompt) are identical, the system can now utilize a cached memory of AI decisions. This addresses performance bottlenecks since AI decision-making is typically the most resource-intensive and time-consuming component. This is an experimental feature and is disabled by default.

- Launch app  // ← This task can be cached when executing the "Open Member Page" scenario  
  - Open search  
  - Open member page
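One way to picture the cache lookup is a key derived from both inputs, so a hit requires the ViewTree and the goal to match exactly. This is a sketch under that assumption; the hashing scheme and the function name are hypothetical, not Arbigent's implementation.

```python
import hashlib


def cache_key(view_tree: str, goal: str) -> str:
    """Build a key that changes if either the view tree or the goal changes."""
    digest = hashlib.sha256()
    digest.update(view_tree.encode("utf-8"))
    digest.update(b"\x00")  # separator so ("ab", "c") and ("a", "bc") differ
    digest.update(goal.encode("utf-8"))
    return digest.hexdigest()
```

A decision stored under this key is reused only when the same screen structure is seen with the same prompt, which matches the condition described above.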

Important bug fix

In version 0.15.0, the API key could be written to the log file unintentionally.
We've introduced a mechanism to replace the API key with placeholders and added a CI check to prevent potential API key leaks.
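A minimal sketch of placeholder substitution before text is written to a log. The function name and the placeholder string are hypothetical; the actual mechanism in Arbigent may differ.

```python
def mask_secret(text: str, secret: str, placeholder: str = "***") -> str:
    """Replace every occurrence of `secret` in `text` with `placeholder`."""
    if not secret:
        # Guard: replacing an empty string would insert the placeholder
        # between every character.
        return text
    return text.replace(secret, placeholder)
```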

What's Changed

Refactor: Extract detectStuckScreen by @takahirom in #112
Refactor: Make step suspend by @takahirom in #113
Refactor: Make execute interceptable by @takahirom in #114
Add AI decision cache by @takahirom in #115

Full Changelog: 0.15.0...0.16.0

Contributors

takahirom


26 Jan 08:15

takahirom

0.15.0

f4fb567

[Deprecated] 0.15.0

In version 0.15.0, log files may contain an API key. This was addressed in 0.16.0, which replaces the API key with placeholders.

UI Updates

We have several updates to the UI:

- Added launch app arguments
- Added a console for debugging

The scenario.cleanupData parameter in YAML is no longer used (though it currently does not cause errors). You can use the initialization methods instead.

CLI Updates

You can now use --scenario-id=foo to filter scenarios. You can also specify multiple scenarios using --scenario-id=foo,bar or --scenario-id=foo --scenario-id=bar.
Use --dry-run to preview which scenarios will run. This is particularly useful with the --shard option, as it might otherwise be difficult to determine target scenarios. (This is also used in Arbigent tests)
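Both forms of --scenario-id describe the same set of IDs. The sketch below shows one way the two forms could be normalized into a single list. It is an illustration only, and the helper name is hypothetical.

```python
from typing import Iterable, List


def collect_scenario_ids(values: Iterable[str]) -> List[str]:
    """Flatten repeated and comma-separated option values into unique IDs."""
    ids: List[str] = []
    for value in values:
        for part in value.split(","):
            part = part.strip()
            if part and part not in ids:
                ids.append(part)  # keep first-seen order, drop duplicates
    return ids
```

Here `values` stands for the raw strings given to each occurrence of the option, so `["foo,bar"]` and `["foo", "bar"]` yield the same result.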

What's Changed

Refactor initialization methods and add wait option and arguments for… by @takahirom in #106
Refactor console by @takahirom in #107
Add instructions for installing apps from unidentified developers by @takahirom in #108
Add scenario-id filter and dry-run by @takahirom in #109
Add test for initialization methods by @takahirom in #110
Remove confusing cleanup option by @takahirom in #111

Full Changelog: 0.14.0...0.15.0

Contributors

takahirom


24 Jan 02:39

takahirom

0.14.0

2912492

0.14.0

Fix critical issues in UI

The Arbigent UI had two issues: the device connection was occasionally unreliable, and the API key input was sometimes not saved properly. Both issues have now been resolved. Please try the updated version.

What's Changed

Connect device only once on UI by @takahirom in #102
Fix API key input bug by @takahirom in #103

Full Changelog: 0.13.0...0.14.0

Contributors

takahirom


23 Jan 04:44

takahirom

0.13.0

c23024b

0.13.0

Add shard option to enable parallel tests

You can split the tests across separate runs with the --shard option, given as index/total.

arbigent --shard=1/4

  cli-e2e-android:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        shardIndex: [ 1, 2, 3, 4 ]
        shardTotal: [ 4 ]
    steps:
...
      - name: CLI E2E test
        uses: reactivecircus/android-emulator-runner@v2
...
          script: |
            arbigent --shard=${{ matrix.shardIndex }}/${{ matrix.shardTotal }} --os=android --project-file=sample-test/src/main/resources/projects/e2e-test-android.yaml --ai-type=gemini --gemini-model-name=gemini-2.0-flash-exp
...

      - uses: actions/upload-artifact@b4b15b8c7c6ac21ea08fcf65892d2ee8f75cf882 # v4
        if: ${{ always() }}
        with:
          name: cli-report-android-${{ matrix.shardIndex }}-${{ matrix.shardTotal }}
          path: |
            arbigent-result/*
          retention-days: 90
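To make the index/total form concrete, here is a sketch of one common partitioning scheme: round-robin by position with a 1-based shard index. How Arbigent actually assigns scenarios to shards is not stated here, so treat the scheme and both function names as assumptions. The --dry-run option shows the real assignment.

```python
from typing import List, Sequence, Tuple


def parse_shard(spec: str) -> Tuple[int, int]:
    """Parse a spec such as "1/4" into (index, total), with a 1-based index."""
    index_text, total_text = spec.split("/")
    index, total = int(index_text), int(total_text)
    if total < 1 or not 1 <= index <= total:
        raise ValueError(f"invalid shard spec: {spec!r}")
    return index, total


def select_shard(items: Sequence[str], index: int, total: int) -> List[str]:
    """Pick every `total`-th item, starting at position index - 1."""
    return [item for position, item in enumerate(items)
            if position % total == index - 1]
```

With this scheme every scenario lands in exactly one shard, so the four matrix jobs together cover the whole project file.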

What's Changed

Remove UI tree from the result. Because UI tree is too big by @takahirom in #100
Add shard option to enable parallel execution by @takahirom in #99
Add shard to README by @takahirom in #101

Full Changelog: 0.12.0...0.13.0

Contributors

takahirom


22 Jan 10:01

takahirom

0.12.0

11b7887

0.12.0

You can now see the Arbigent running status at the bottom of the screen.

What's Changed

[Doc/Sample] Add now in android sample prompts by @takahirom in #95
[UI] Add GlobalStatus for UI by @takahirom in #96
[UI] Show device connecting message in UI by @takahirom in #97
[Doc] Update yaml file by @takahirom in #98

Full Changelog: 0.11.0...0.12.0

Contributors

takahirom


19 Jan 07:34

takahirom

0.11.0

2fe2c1b

0.11.0

New Feature: screen stuck detection

Identifies and recovers from situations where the AI agent gets stuck on the same screen, prompting it to reconsider its actions.
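A simple way to picture the detection step is to compare the most recent screens and flag the run as stuck when they are all identical. This is only a sketch. The function name, the threshold, and the use of a screen fingerprint are assumptions rather than Arbigent's actual logic.

```python
from typing import Hashable, Sequence


def is_stuck(screen_history: Sequence[Hashable], threshold: int = 3) -> bool:
    """Return True when the last `threshold` screens are all the same."""
    if threshold < 2 or len(screen_history) < threshold:
        return False  # not enough evidence yet
    recent = screen_history[-threshold:]
    return all(screen == recent[0] for screen in recent)
```

When this returns True, the caller could add a hint to the next prompt asking the agent to try a different action.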

What's Changed

[Web] Fix Web issue where Web can't find element that clicks by @takahirom in #89
[UI] Improve window close logic by @takahirom in #90
[Refactor] Refactor destructing by @takahirom in #91
[New Feature] Add screen stuck detection by @takahirom in #92
[Docs] Add SMURF and Stuck Screen Detection to README by @takahirom in #93
[Docs] Refactor features of README by @takahirom in #94

Full Changelog: 0.10.1...0.11.0

Contributors

takahirom


Releases: takahirom/arbigent
