feat: upgrade agent with playwright tool #3
… computer tool with latest functionality
- Add new base types for better type safety
- Add Playwright tool integration
- Update validator and collection utilities
- Enhance loop functionality
@matthewjmarangoni said: If the common query case is asking for the keyboard shortcut directly with the intent to navigate, the relevant keyboard shortcuts could be replaced via the system prompt. A potential tradeoff is that those shortcuts might then not be usable at all, or not reliably. I did a little exploring along that line of thought, which may prove fruitful.
It seems possible a more nuanced system instruction could be crafted. Perhaps functionally akin to:
The following is only mentioned due to the error encountered. It used the same system prompt modification.
@bigboateng @matthewjmarangoni - thanks for the interesting discussion. Perhaps some framing we could use to make a decision:
What Changed
This pull request significantly enhances the agent's capabilities by introducing a PlaywrightTool, which provides direct, programmatic access to browser functions. The initial implementation includes a goto method, allowing the agent to navigate to URLs in a single, efficient step.
To support this new functionality, a major refactoring was performed:
- A new, unified type system for tools was created in `tools/types/base.ts`, establishing a common `ComputerUseTool` interface.
- The `ToolCollection` was updated to manage and validate different tool types, specifically handling the new `PlaywrightTool`.
- The agent's `SYSTEM_PROMPT` in `loop.ts` was updated to guide the model in using the new `goto` tool, enabling it to handle more abstract requests like navigating to "hackernews" without a specific URL.
This change moves the agent from relying solely on simulated user input to a more powerful and efficient hybrid approach.
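As a rough illustration of the structure described above, a minimal sketch might look like the following. Only the names `ComputerUseTool`, `PlaywrightTool`, `ToolCollection`, and `goto` come from this PR; the `ToolResult` shape, the `call` and `run` signatures, and the `PageLike` stand-in (used here so the sketch runs without Playwright installed) are assumptions for illustration and may differ from the actual code in `tools/types/base.ts`.

```typescript
// Hypothetical result shape; the real type in tools/types/base.ts may differ.
interface ToolResult {
  output?: string;
  error?: string;
}

// Assumed common interface that every tool implements.
interface ComputerUseTool {
  name: string;
  call(params: Record<string, unknown>): Promise<ToolResult>;
}

// Minimal stand-in for the subset of Playwright's Page used by goto.
interface PageLike {
  goto(url: string): Promise<unknown>;
  url(): string;
}

class PlaywrightTool implements ComputerUseTool {
  readonly name = "playwright";
  private readonly page: PageLike;

  constructor(page: PageLike) {
    this.page = page;
  }

  async call(params: Record<string, unknown>): Promise<ToolResult> {
    const { method, url } = params;
    // Only goto is sketched here, matching the initial implementation.
    if (method !== "goto") {
      return { error: `Unsupported method: ${String(method)}` };
    }
    if (typeof url !== "string" || url.length === 0) {
      return { error: "goto requires a non-empty 'url' string" };
    }
    await this.page.goto(url);
    return { output: `Navigated to ${this.page.url()}` };
  }
}

class ToolCollection {
  private readonly tools = new Map<string, ComputerUseTool>();

  constructor(...tools: ComputerUseTool[]) {
    for (const tool of tools) {
      this.tools.set(tool.name, tool);
    }
  }

  // Look up the tool by name and forward the call, reporting unknown tools.
  async run(name: string, params: Record<string, unknown>): Promise<ToolResult> {
    const tool = this.tools.get(name);
    if (!tool) {
      return { error: `Unknown tool: ${name}` };
    }
    return tool.call(params);
  }
}
```

Because `PlaywrightTool` depends only on the small `PageLike` interface in this sketch, a real Playwright `Page` or a test double can be passed in interchangeably.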
Risks / Concerns
This is a well-structured and powerful enhancement to the agent. The refactoring of the tool system into a more generic and type-safe architecture is a great improvement for future extensibility. Excellent work!
8 files reviewed | 0 comments | Review on Mesa | Edit Reviewer Settings
Performed full review of b81e3bb...47d6a0f
This PR is ported from https://github.com/onkernel/create-kernel-app/pull/29
The general idea is to give the agent Playwright functionality, with direct access to methods like `goto` for URL changes, because the screenshot + keyboard approach has limitations.
Providing Playwright access through tool calling is also a good idea in general, since it gives the agent direct access to functions instead of waiting for it to work out the steps itself. For example, the agent needs 3 tool calls to figure out how to change the URL, versus 1 step with tool calling.
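To make the step count concrete, here is a small sketch comparing the two approaches as lists of tool calls. The action shapes and the specific keyboard sequence (focus the address bar, type, confirm) are assumptions chosen for illustration; the exact calls the agent makes are not given in this PR.

```typescript
// Hypothetical tool-call shapes, for illustration only.
type ToolCall =
  | { tool: "computer"; action: "key"; text: string }
  | { tool: "computer"; action: "type"; text: string }
  | { tool: "playwright"; method: "goto"; url: string };

// Simulated user input: one plausible three-step sequence (assumed).
function navigateViaInput(url: string): ToolCall[] {
  return [
    { tool: "computer", action: "key", text: "ctrl+l" }, // focus the address bar
    { tool: "computer", action: "type", text: url },     // type the URL
    { tool: "computer", action: "key", text: "Return" }, // confirm navigation
  ];
}

// Direct Playwright access: a single call.
function navigateViaGoto(url: string): ToolCall[] {
  return [{ tool: "playwright", method: "goto", url }];
}
```

Each entry corresponds to a separate round trip through the model, so the shorter list means fewer round trips and fewer chances for an intermediate step to go wrong.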
This PR also upgrades the system prompt so that ambiguous requests have an entry point through Google.
Test with the code below; we no longer need to navigate directly to a URL first.