Back to full roadmap
topicadvanced
Anthropic Computer Use
Claude sees the screen, controls mouse/keyboard. GUI automation revolution.
4 hours2 resources1 prereqs
Anthropic Computer Use (Oct 2024) — gives Claude 4 tools: screenshot, click, type, key. The model at each step:
- Take screenshot
- Analyze the UI (vision)
- Pick the next action
- Execute the action
- Screenshot again — continue
Use cases: legacy software automation, form filling, screen scraping (no API), end-to-end testing, accessibility.
Limitations: slow (screenshot + vision per step), miss-click risk, anti-bot mechanisms (CAPTCHA). Needs sandbox VM in production.
Cost: vision tokens are expensive; ~1000-2000 tokens per screenshot.