Salesforce researchers have developed a technique that lets AI agents write and run code instead of relying only on clicks.
Salesforce has developed the CoAct-1 technique and described it in a paper. The technique allows AI agents to execute code in addition to navigating graphical interfaces, where they move the cursor and click. According to the paper, this lets AI agents work faster and make fewer mistakes.
Combining Three Agents
CoAct-1 consists of three components: an Orchestrator agent that distributes work in the back-end, a GUI Operator agent that navigates screens through the graphical user interface (GUI) in the front-end, and a Programmer agent that writes code in Python or Bash. The system decides whether a task is better performed through clicking or coding. CoAct-1 solves tasks in an average of ten steps, about a third fewer than the fifteen steps required by agents that don’t use the technique.
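The division of labor described above can be illustrated with a minimal Python sketch. It is not the implementation from the paper: the class names, the keyword list in `CODE_HINTS`, and the step counts are all hypothetical, and a simple keyword check stands in for whatever decision logic the real Orchestrator uses.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical keywords hinting that a subtask is easier to script than to click through.
CODE_HINTS = ("filter", "compress", "rename", "batch", "convert")


@dataclass
class Result:
    agent: str  # which agent handled the subtask
    steps: int  # number of actions that agent needed


class Programmer:
    """Stand-in for the agent that writes Python or Bash."""

    def run(self, subtask: str) -> Result:
        # Assumption: one script replaces a whole series of clicks.
        return Result(agent="programmer", steps=1)


class GuiOperator:
    """Stand-in for the agent that moves the cursor and clicks."""

    def run(self, subtask: str) -> Result:
        # Assumption: a made-up fixed cost of three UI actions per subtask.
        return Result(agent="gui_operator", steps=3)


class Orchestrator:
    """Routes each subtask to one of the two worker agents."""

    def __init__(self) -> None:
        self.programmer = Programmer()
        self.gui = GuiOperator()

    def route(self, subtask: str) -> str:
        # Toy heuristic: send the subtask to the Programmer if any hint matches.
        text = subtask.lower()
        return "programmer" if any(h in text for h in CODE_HINTS) else "gui_operator"

    def solve(self, subtasks: List[str]) -> List[Result]:
        results = []
        for subtask in subtasks:
            agent = self.programmer if self.route(subtask) == "programmer" else self.gui
            results.append(agent.run(subtask))
        return results
```

In this sketch, a subtask such as "Compress the reports folder" goes to the Programmer, while "Click the Save button" goes to the GUI Operator. Summing the `steps` fields shows, in toy form, how scripting can shorten a task.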
On the OSWorld benchmark, which contains 369 realistic computing tasks, CoAct-1 achieved a success rate of over 60 percent, setting a new record. Particularly complex tasks, such as filtering files or compressing folders, were handled more efficiently and with fewer errors.
Challenges Remain Significant
Although the technique scores well in benchmarks, business environments are often messier and frequently depend on legacy software with unclear interfaces. For now, human supervision is still needed to steer the work of AI agents in the right direction.
That’s why Salesforce has set a goal: “A system where the agent can observe how humans work, is further trained in a sandbox environment, and receives continuous guidance and protection after going live.” For now, this vision remains out of reach, and it will likely take years to become reality.