Skip to content
Back to full roadmap
topicadvanced

Anthropic Computer Use

Claude sees the screen, controls mouse/keyboard. GUI automation revolution.

4 hours2 resources1 prereqs

Anthropic Computer Use (Oct 2024) — gives Claude 4 tools: screenshot, click, type, key. The model at each step:

  1. Take screenshot
  2. Analyze the UI (vision)
  3. Pick the next action
  4. Execute the action
  5. Screenshot again — continue

Use cases: legacy software automation, form filling, screen scraping (no API), end-to-end testing, accessibility.

Limitations: slow (screenshot + vision per step), miss-click risk, anti-bot mechanisms (CAPTCHA). Needs sandbox VM in production.

Cost: vision tokens are expensive; ~1000-2000 tokens per screenshot.

Prerequisites

Resources(2)