
Peekaboo
Lightning-fast macOS screenshot tool for AI agents with visual analysis capabilities
🎉 NEW in v3: Complete GUI automation framework with AI Agent! Click, type, scroll, and automate any macOS application using natural language. Plus comprehensive menu bar extraction without clicking! See the GUI Automation section and AI Agent section for details.
Peekaboo is a powerful macOS utility for capturing screenshots, analyzing them with AI vision models, and now automating GUI interactions. It works both as a standalone CLI tool (recommended) and as an MCP server for AI assistants like Claude Desktop and Cursor.
Perfect for:
Peekaboo bridges the gap between visual content on your screen and AI understanding. It provides:
Peekaboo uses a modern service-based architecture:
All components share the same core services, ensuring consistent behavior and optimal performance. See Service API Reference for detailed documentation.
# Option 1: Homebrew (Recommended)
brew tap steipete/tap
brew install peekaboo

# Option 2: Direct Download
curl -L https://github.com/steipete/peekaboo/releases/latest/download/peekaboo-macos-universal.tar.gz | tar xz
sudo mv peekaboo-macos-universal/peekaboo /usr/local/bin/

# Option 3: npm (includes MCP server)
npm install -g @steipete/peekaboo-mcp

# Option 4: Build from source
git clone https://github.com/steipete/peekaboo.git
cd peekaboo
./scripts/build-cli-standalone.sh --install
# Capture screenshots
peekaboo image --app Safari --path screenshot.png
peekaboo image --mode frontmost
peekaboo image --mode screen --screen-index 0

# List applications, windows, and screens
peekaboo list apps
peekaboo list windows --app "Visual Studio Code"
peekaboo list screens # List all displays with indices for --screen-index

# Analyze images with AI (use image command with --analyze)
peekaboo image --analyze "What error is shown?" --path screenshot.png
peekaboo image --analyze "Find all buttons" --app Safari
peekaboo see --analyze "Describe this UI" --app Chrome

# GUI Automation (v3)
peekaboo see --app Safari # Identify UI elements
peekaboo see --mode screen # Capture all screens (multi-screen)
peekaboo see --mode screen --screen-index 1 # Capture specific screen
peekaboo click "Submit" # Click button by text
peekaboo type "Hello world" # Type at current focus
peekaboo type "Line 1\nLine 2" # Type with newline (escape sequences)
peekaboo press return # Press Enter key
peekaboo press tab --count 3 # Press Tab 3 times
peekaboo scroll --direction down --amount 5 # Scroll down 5 ticks

# AI Agent - Natural language automation
peekaboo "Open Safari and search for weather"
peekaboo agent "Fill out the contact form" --verbose
peekaboo hotkey cmd,c # Press Cmd+C

# AI Agent Automation (v3) 🤖
peekaboo "Open TextEdit and write Hello World"
peekaboo agent "Take a screenshot of Safari and email it"
peekaboo agent --verbose "Find all Finder windows and close them"

# Window Management (v3)
peekaboo window close --app Safari # Close Safari window
peekaboo window minimize --app Finder # Minimize Finder window
peekaboo window move --app TextEdit --x 100 --y 100
peekaboo window resize --app Terminal --width 800 --height 600
peekaboo window focus --app "Visual Studio Code"

# Multi-Screen Support (v3)
peekaboo window resize --app Safari --target-screen 1 # Move to screen 1
peekaboo window move --app Terminal --screen-preset next # Move to next screen
peekaboo window resize --app Notes --preset left_half --target-screen 0

# Space (Virtual Desktop) Management
peekaboo space list # List all Spaces
peekaboo space switch --to 2 # Switch to Space 2
peekaboo space move-window --app Safari --to 3 # Move Safari to Space 3

# Menu Bar Interaction (v3)
peekaboo menu list --app Calculator # List all menus and items
peekaboo menu list-all # List menus for frontmost app
peekaboo menu click --app Safari --item "New Window"
peekaboo menu click --app TextEdit --path "Format > Font > Bold"
peekaboo menu click-extra --title "WiFi" # Click system menu extras

# Configure settings
peekaboo config init # Create config file
peekaboo config edit # Edit in your editor
peekaboo config show --effective # Show current settings
All Peekaboo commands support the --verbose or -v flag for detailed logging:
# See what's happening under the hood
peekaboo image --app Safari --verbose
peekaboo see --app Terminal -v
peekaboo click --on B1 --verbose

# Verbose output includes:
# - Application search details
# - Window discovery information
# - UI element detection progress
# - Timing information
# - Session management operations
Verbose logs are written to stderr with timestamps:
[2025-01-06T08:05:23Z] VERBOSE: Searching for application: Safari
[2025-01-06T08:05:23Z] VERBOSE: Found exact bundle ID match: Safari
[2025-01-06T08:05:23Z] VERBOSE: Capturing window for app: Safari
[2025-01-06T08:05:23Z] VERBOSE: Found 3 windows for application
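When the verbose stream is captured for later inspection, the lines shown above can be split into timestamp, level, and message. The following is only a sketch that assumes every line follows the `[timestamp] LEVEL: message` shape of the sample; the function name `parse_log_line` is hypothetical and not part of Peekaboo.

```python
import re
from datetime import datetime, timezone

# Matches lines shaped like: [2025-01-06T08:05:23Z] VERBOSE: message
LOG_LINE = re.compile(r"^\[(?P<ts>[^\]]+)\] (?P<level>[A-Z]+): (?P<message>.*)$")


def parse_log_line(line):
    """Return (timestamp, level, message), or None if the line has another shape."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        return None
    # Assumption: timestamps are always UTC with a trailing "Z", as in the sample.
    ts = datetime.strptime(match["ts"], "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    return ts, match["level"], match["message"]


parsed = parse_log_line("[2025-01-06T08:05:23Z] VERBOSE: Found 3 windows for application")
```

Lines that do not match (for example regular command output mixed into the same stream) come back as `None`, so they can be skipped without raising.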
This is invaluable for:
Peekaboo uses a unified configuration directory at ~/.peekaboo/ for better discoverability:
# Create default configuration
peekaboo config init

# Files created:
# ~/.peekaboo/config.json - Main configuration (JSONC format)
# ~/.peekaboo/credentials - API keys (chmod 600)

# Set API key securely (stored in ~/.peekaboo/credentials)
peekaboo config set-credential OPENAI_API_KEY sk-...

# View current configuration (keys shown as ***SET***)
peekaboo config show --effective
~/.peekaboo/config.json:

{
  // AI Provider Settings
  "aiProviders": {
    "providers": "openai/gpt-4.1,anthropic/claude-opus-4,grok/grok-4,ollama/llava:latest",
    // NOTE: API keys should be in ~/.peekaboo/credentials
    "ollamaBaseUrl": "http://localhost:11434"
  },
  // Default Settings
  "defaults": {
    "savePath": "~/Desktop/Screenshots",
    "imageFormat": "png",
    "captureMode": "window",
    "captureFocus": "auto"
  },
  // Logging
  "logging": {
    "level": "info",
    "path": "~/.peekaboo/logs/peekaboo.log"
  }
}
~/.peekaboo/credentials (auto-created with proper permissions):
# Peekaboo credentials file
# This file contains sensitive API keys and should not be shared
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
X_AI_API_KEY=xai-...
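For scripts that need to inspect this file, the layout shown above (one `KEY=value` per line, `#` for comments) is simple enough to read by hand. The sketch below assumes exactly that layout and nothing more; `parse_credentials` and `mask` are hypothetical helper names, and the masking mirrors the `***SET***` placeholder that `peekaboo config show --effective` is described as printing.

```python
def parse_credentials(text):
    """Parse KEY=value lines, skipping blank lines and # comments."""
    creds = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:  # ignore lines without an "=" rather than guessing
            creds[key.strip()] = value.strip()
    return creds


def mask(creds):
    """Hide secret values so the result is safe to print or log."""
    return {key: "***SET***" for key in creds}


sample = """# Peekaboo credentials file
OPENAI_API_KEY=sk-example
ANTHROPIC_API_KEY=sk-ant-example
"""
creds = parse_credentials(sample)
```

Printing `mask(creds)` instead of `creds` keeps key values out of terminals and CI logs.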
# Capture and analyze in one command
peekaboo image --app Safari --analyze "What's on this page?" --path /tmp/page.png

# Monitor active window changes
while true; do
  peekaboo image --mode frontmost --json-output | jq -r '.data.saved_files[0].window_title'
  sleep 5
done

# Batch analyze screenshots
for img in ~/Screenshots/*.png; do
  peekaboo image --analyze "Summarize this screenshot" --path "$img"
done

# Automated login workflow (v3 with automatic session resolution)
peekaboo see --app MyApp # Creates new session
peekaboo click --on T1 # Automatically uses session from 'see'
peekaboo type "[email protected]" # Still using same session
peekaboo press tab # Press Tab to move to next field
peekaboo type "password123"
peekaboo press return # Press Enter to submit
peekaboo sleep 2000 # Wait 2 seconds

# Multiple app automation with explicit sessions
SESSION_A=$(peekaboo see --app Safari --json-output | jq -r '.data.session_id')
SESSION_B=$(peekaboo see --app Notes --json-output | jq -r '.data.session_id')
peekaboo click --on B1 --session $SESSION_A # Click in Safari
peekaboo type "Hello" --session $SESSION_B # Type in Notes

# Run automation script
peekaboo run login.peekaboo.json
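The `jq` filters above read `.data.session_id` from the `--json-output` payload. The same extraction can be done in a script without `jq`; this is a minimal sketch in which only that one JSON path comes from the examples, while the sample payload and the name `session_id_from_output` are assumptions for illustration.

```python
import json


def session_id_from_output(raw):
    """Extract data.session_id from the JSON text printed by --json-output."""
    payload = json.loads(raw)
    return payload["data"]["session_id"]


# Hypothetical payload: only the data.session_id path is taken from the examples.
sample_output = '{"data": {"session_id": "session_456"}}'
session_id = session_id_from_output(sample_output)
```

In practice `raw` would be the captured standard output of a `peekaboo see ... --json-output` call, and the returned value would be passed to `--session`.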
For AI assistants like Claude Desktop and Cursor, Peekaboo provides a Model Context Protocol (MCP) server.
{ "mcpServers": { "peekaboo": { "command": "npx", "args": ["-y", "@steipete/peekaboo-mcp@beta"], "env": { "PEEKABOO_AI_PROVIDERS": "anthropic/claude-opus-4,openai/gpt-4.1,ollama/llava:latest", "OPENAI_API_KEY": "your-openai-api-key-here" } } } }
Run the following command:
claude mcp add-json peekaboo '{ "type": "stdio", "command": "npx", "args": ["-y", "@steipete/peekaboo-mcp"], "env": { "PEEKABOO_AI_PROVIDERS": "anthropic/claude-opus-4,openai/gpt-4.1,ollama/llava:latest", "OPENAI_API_KEY": "your-openai-api-key-here" } }'
Alternatively, if you've already installed the server via Claude Desktop, you can import it:
claude mcp add-from-claude-desktop
For local development, use the built MCP server directly:
{ "mcpServers": { "peekaboo": { "command": "node", "args": ["/path/to/peekaboo/Server/dist/index.js"], "env": { "PEEKABOO_AI_PROVIDERS": "anthropic/claude-opus-4" } } } }
Add to your Cursor settings:
{ "mcpServers": { "peekaboo": { "command": "npx", "args": ["-y", "@steipete/peekaboo-mcp@beta"], "env": { "PEEKABOO_AI_PROVIDERS": "openai/gpt-4.1,ollama/llava:latest", "OPENAI_API_KEY": "your-openai-api-key-here" } } } }
Peekaboo v3 now functions as both an MCP server (exposing its tools) and an MCP client (consuming external tools). This enables powerful workflows that combine Peekaboo's native automation with tools from the broader MCP ecosystem.
Peekaboo ships with BrowserMCP enabled by default, providing browser automation capabilities via Puppeteer:
# BrowserMCP tools are available immediately
peekaboo tools --mcp-only # List only external MCP tools
peekaboo tools --mcp browser # Show BrowserMCP tools specifically
peekaboo agent "Navigate to github.com and click the sign up button" # Uses browser:navigate and browser:click

# List all configured servers with health status
peekaboo mcp list

# Add popular MCP servers
peekaboo mcp add github -e GITHUB_TOKEN=ghp_xxx -- npx -y @modelcontextprotocol/server-github
peekaboo mcp add files -- npx -y @modelcontextprotocol/server-filesystem ~/Documents

# Test server connection
peekaboo mcp test github --show-tools

# Enable/disable servers
peekaboo mcp disable browser # Disable default BrowserMCP
peekaboo mcp enable github # Re-enable a server

External servers are configured in ~/.peekaboo/config.json. To disable BrowserMCP:
{ "mcpClients": { "browser": { "enabled": false } } }
All external tools are prefixed with their server name:
The AI agent automatically uses the best combination of native and external tools for each task.
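The `browser:navigate` and `browser:click` names mentioned earlier suggest a `server:tool` naming scheme for external tools. Under that assumption (the separator and the absence of a prefix on native tools are inferred from those two examples only), a tool name can be split like this; `split_tool_name` is a hypothetical helper.

```python
from typing import Optional, Tuple


def split_tool_name(name: str) -> Tuple[Optional[str], str]:
    """Split 'server:tool' into (server, tool); native tools have no server part."""
    server, sep, tool = name.partition(":")
    if sep:
        return server, tool
    return None, name


external = split_tool_name("browser:navigate")
native = split_tool_name("click")
```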
See docs/mcp-client.md for complete documentation.
image - Capture screenshots (with optional AI analysis via question parameter)
list - List applications, windows, or check server status
analyze - Analyze existing images with AI vision models (MCP-only tool, use peekaboo image --analyze in CLI)
see - Capture screen and identify UI elements
click - Click on UI elements or coordinates
type - Type text into UI elements (supports escape sequences)
press - Press individual keys (return, tab, escape, arrows, etc.)
scroll - Scroll content in any direction
hotkey - Press keyboard shortcuts
swipe - Perform swipe/drag gestures
move - Move mouse cursor to specific position or element
drag - Perform drag and drop operations
app - Launch, quit, focus, hide, and manage applications
window - Manipulate windows (close, minimize, maximize, move, resize, focus)
menu - Interact with application menus and system menu extras
dock - Launch apps from dock and manage dock items
dialog - Handle dialog windows (click buttons, input text)
space - Manage macOS Spaces (virtual desktops)
run - Execute automation scripts from .peekaboo.json files
sleep - Pause execution for specified duration
clean - Clean up session cache and temporary files
permissions - Check system permissions (screen recording, accessibility)
agent - Execute complex automation tasks using AI

Peekaboo v3 introduces powerful GUI automation capabilities, transforming it from a screenshot tool into a complete UI automation framework for macOS. This enables AI assistants to interact with any application through natural language commands.
The v3 automation system uses a see-then-interact workflow:
see Tool - UI Element Discovery

The see tool is the foundation of GUI automation. It captures a screenshot and identifies all interactive UI elements, assigning them unique Peekaboo IDs.

// Example: See what's on screen
await see({ app_target: "Safari" })

// Multi-screen support - capture all screens
await see({ app_target: "" }) // Empty string captures all screens

// Capture specific screen by index
await see({ app_target: "screen:0" }) // Primary screen
await see({ app_target: "screen:1" }) // Second screen

// Returns:
{
  screenshot_path: "/tmp/peekaboo_123.png",
  session_id: "session_456",
  elements: {
    buttons: [
      { id: "B1", label: "Submit", bounds: { x: 100, y: 200, width: 80, height: 30 } },
      { id: "B2", label: "Cancel", bounds: { x: 200, y: 200, width: 80, height: 30 } }
    ],
    text_fields: [
      { id: "T1", label: "Email", value: "", bounds: { x: 100, y: 100, width: 200, height: 30 } },
      { id: "T2", label: "Password", value: "", bounds: { x: 100, y: 150, width: 200, height: 30 } }
    ],
    links: [
      { id: "L1", label: "Forgot password?", bounds: { x: 100, y: 250, width: 120, height: 20 } }
    ],
    // ... other elements
  }
}
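Each element in the result above carries a `bounds` rectangle. When a script needs a point rather than an element ID, the center of that rectangle is a natural target. The sketch below works on a plain dictionary shaped like the sample result; it assumes the bounds share a coordinate space with the `"x,y"` strings that click accepts, which the sample does not confirm, and the helper names are hypothetical.

```python
def find_by_label(elements, label):
    """Return the first element with a matching label across all element groups."""
    for group in elements.values():
        for element in group:
            if element.get("label") == label:
                return element
    return None


def center_coords(element):
    """Format the center of an element's bounds as an 'x,y' string."""
    bounds = element["bounds"]
    x = bounds["x"] + bounds["width"] // 2
    y = bounds["y"] + bounds["height"] // 2
    return f"{x},{y}"


elements = {
    "buttons": [
        {"id": "B1", "label": "Submit", "bounds": {"x": 100, "y": 200, "width": 80, "height": 30}},
    ],
    "links": [
        {"id": "L1", "label": "Forgot password?", "bounds": {"x": 100, "y": 250, "width": 120, "height": 20}},
    ],
}
submit = find_by_label(elements, "Submit")
coords = center_coords(submit)  # "140,215"
```

Clicking by element ID remains the simpler route; a computed point is mainly useful for tools that only take coordinates.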
Before capturing specific screens, you can list all connected displays:
# List all screens with details
peekaboo list screens

# Example output:
# Screens (3 total):
#
# 0. Built-in Display (Primary)
#    Resolution: 3008×1692
#    Position: 0,0
#    Scale: 2.0x (Retina)
#    Visible Area: 3008×1612
#
# 1. External Display
#    Resolution: 3840×2160
#    Position: 3008,0
#    Scale: 2.0x (Retina)
#
# 2. Studio Display
#    Resolution: 5120×2880
#    Position: -5120,0
#    Scale: 2.0x (Retina)
#
# 💡 Use 'peekaboo see --screen-index N' to capture a specific screen

# Get JSON output for scripting
peekaboo list screens --json-output
This command shows:
see --screen-index or image --screen-index

When capturing multiple screens, Peekaboo automatically saves each screen as a separate file:
screenshot.png
screenshot_screen1.png, screenshot_screen2.png, etc.

Display information (name, resolution) is shown for each captured screen:
📸 Captured 3 screens:
🖥️ Display 0: Built-in Retina Display (2880×1800) → screenshot.png
🖥️ Display 1: LG Ultra HD (3840×2160) → screenshot_screen1.png
🖥️ Display 2: Studio Display (5120×2880) → screenshot_screen2.png
Note: Annotation is automatically disabled for full screen captures due to performance constraints.
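A script that post-processes a multi-screen capture needs to know which files to expect. Based only on the names in the sample output above (the first screen keeps the requested name, later screens get a `_screenN` suffix), the expected paths could be derived as follows; `screen_paths` is a hypothetical helper, not a Peekaboo function.

```python
from pathlib import Path


def screen_paths(base, screen_count):
    """List the file names expected for a capture of screen_count screens."""
    path = Path(base)
    names = []
    for index in range(screen_count):
        if index == 0:
            names.append(str(path))  # first screen keeps the requested name
        else:
            names.append(str(path.with_name(f"{path.stem}_screen{index}{path.suffix}")))
    return names


expected = screen_paths("screenshot.png", 3)
# ['screenshot.png', 'screenshot_screen1.png', 'screenshot_screen2.png']
```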
click Tool

Click on UI elements using various targeting methods:

// Click by element ID from see command
await click({ on: "B1" })

// Click by query (searches button labels)
await click({ query: "Submit" })

// Click by coordinates
await click({ coords: "450,300" })

// Double-click
await click({ on: "I1", double: true })

// Right-click
await click({ query: "File", right: true })

// With custom wait timeout
await click({ query: "Save", wait_for: 10000 })
type Tool

Type text with support for escape sequences:

// Type into a specific field
await type({ text: "[email protected]", on: "T1" })

// Type at current focus
await type({ text: "Hello world" })

// Clear existing text first
await type({ text: "New text", on: "T2", clear: true })

// Use escape sequences
await type({ text: "Line 1\nLine 2\nLine 3" }) // Newlines
await type({ text: "Name:\tJohn\tDoe" }) // Tabs
await type({ text: "Path: C:\\data\\file.txt" }) // Literal backslash

// Press return after typing
await type({ text: "Submit", press_return: true })

// Adjust typing speed
await type({ text: "Slow typing", delay: 100 })
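To make the escape-sequence handling concrete, here is a rough sketch of how a string such as `Line 1\nLine 2` could be broken into literal text and key presses. It is not Peekaboo's implementation: the token names (`<return>`, `<tab>`, and so on) are invented for illustration, and only the five sequences the type tool documents (`\n`, `\t`, `\b`, `\e`, `\\`) are recognised.

```python
# Hypothetical key tokens for each documented escape; a literal backslash stays text.
ESCAPES = {"n": "<return>", "t": "<tab>", "b": "<backspace>", "e": "<escape>", "\\": "\\"}


def expand_escapes(text):
    """Split text into literal chunks and key tokens, in typing order."""
    out, buf, i = [], [], 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text) and text[i + 1] in ESCAPES:
            replacement = ESCAPES[text[i + 1]]
            if replacement.startswith("<"):
                if buf:  # flush pending literal text before the key token
                    out.append("".join(buf))
                    buf = []
                out.append(replacement)
            else:
                buf.append(replacement)  # escaped backslash becomes one literal backslash
            i += 2
        else:
            buf.append(ch)
            i += 1
    if buf:
        out.append("".join(buf))
    return out


steps = expand_escapes(r"Line 1\nLine 2")  # raw string: backslash followed by "n"
```

An unrecognised sequence such as `\x` is left as literal text in this sketch; whether Peekaboo does the same is not stated.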
\n - Newline/return
\t - Tab
\b - Backspace/delete
\e - Escape
\\ - Literal backslash

press Tool

Press individual keys or key sequences:
// Press single keys
await press({ key: "return" }) // Press Enter
await press({ key: "tab", count: 3 }) // Press Tab 3 times
await press({ key: "escape" }) // Press Escape

// Navigation keys
await press({ key: "up" }) // Arrow up
await press({ key: "down", count: 5 }) // Arrow down 5 times
await press({ key: "home" }) // Home key
await press({ key: "end" }) // End key

// Function keys
await press({ key: "f1" }) // F1 help key
await press({ key: "f11" }) // F11 full screen

// Special keys
await press({ key: "forward_delete" }) // Forward delete (fn+delete)
await press({ key: "caps_lock" }) // Caps Lock
scroll Tool

Scroll content in any direction:

// Scroll down 3 ticks (default)
await scroll({ direction: "down" })

// Scroll up 5 ticks
await scroll({ direction: "up", amount: 5 })

// Scroll on a specific element
await scroll({ direction: "down", on: "G1", amount: 10 })

// Smooth scrolling
await scroll({ direction: "down", smooth: true })

// Horizontal scrolling
await scroll({ direction: "right", amount: 3 })
hotkey Tool

Press keyboard shortcuts:

// Common shortcuts
await hotkey({ keys: "cmd,c" }) // Copy
await hotkey({ keys: "cmd,v" }) // Paste
await hotkey({ keys: "cmd,tab" }) // Switch apps
await hotkey({ keys: "cmd,shift,t" }) // Reopen closed tab

// Function keys
await hotkey({ keys: "f11" }) // Full screen

// Custom hold duration
await hotkey({ keys: "cmd,space", hold_duration: 100 })
swipe Tool

Perform swipe or drag gestures:

// Basic swipe
await swipe({ from: "100,200", to: "300,200" })

// Slow drag
await swipe({ from: "50,50", to: "200,200", duration: 2000 })

// Precise movement with more steps
await swipe({ from: "0,0", to: "100,100", steps: 50 })
move Tool

Move the mouse cursor to specific positions or UI elements:

// Move to absolute coordinates
await move({ coordinates: "500,300" })

// Move to center of screen
await move({ center: true })

// Move to a specific UI element
await move({ id: "B1" })

// Smooth movement with animation
await move({ coordinates: "100,200", smooth: true, duration: 1000 })
drag Tool

Perform drag and drop operations between UI elements or coordinates:

// Drag from one element to another
await drag({ from: "B1", to: "T1" })

// Drag using coordinates
await drag({ from_coords: "100,100", to_coords: "500,500" })

// Drag with modifiers (e.g., holding shift)
await drag({ from: "I1", to: "G2", modifiers: "shift" })

// Cross-application drag
await drag({ from: "T1", to_app: "Finder", to_coords: "300,400" })
permissions Tool

Check macOS system permissions required for automation:

// Check all permissions
await permissions({})

// Returns permission status for:
// - Screen Recording (required for screenshots)
// - Accessibility (required for UI automation)
run Tool - Automation Scripts

Execute complex automation workflows from JSON script files:

// Run a script
await run({ script_path: "/path/to/login.peekaboo.json" })

// Continue on error
await run({ script_path: "test.peekaboo.json", no_fail_fast: true })
{ "name": "Login to Website", "description": "Automated login workflow", "commands": [ { "command": "see", "args": { "app_target": "Safari" }, "comment": "Capture current state" }, { "command": "click", "args": { "query": "Email" }, "comment": "Click email field" }, { "command": "type", "args": { "text": "[email protected]" } }, { "command": "click", "args": { "query": "Password" } }, { "command": "type", "args": { "text": "secure_password" } }, { "command": "click", "args": { "query": "Sign In" } }, { "command": "sleep", "args": { "duration": 2000 }, "comment": "Wait for login" } ] }
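Before handing a script like the one above to `run`, it can be worth checking its shape. The following sketch validates only what the sample shows: a top-level `name`, and a `commands` list whose entries each have a `command` string and an `args` object. Any further rules Peekaboo applies are unknown here, and `validate_script` is a hypothetical helper.

```python
import json


def validate_script(script):
    """Return a list of problems found in a .peekaboo.json-style script dict."""
    problems = []
    if not isinstance(script.get("name"), str):
        problems.append("missing or non-string 'name'")
    commands = script.get("commands")
    if not isinstance(commands, list) or not commands:
        problems.append("'commands' must be a non-empty list")
        return problems
    for index, entry in enumerate(commands):
        if not isinstance(entry.get("command"), str):
            problems.append(f"commands[{index}]: missing 'command'")
        if not isinstance(entry.get("args", {}), dict):
            problems.append(f"commands[{index}]: 'args' must be an object")
    return problems


script = json.loads("""
{
  "name": "Login to Website",
  "commands": [
    { "command": "see", "args": { "app_target": "Safari" } },
    { "command": "sleep", "args": { "duration": 2000 }, "comment": "Wait for login" }
  ]
}
""")
issues = validate_script(script)  # empty list when the shape matches
```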
Peekaboo v3 includes intelligent window focus management that ensures your automation commands target the correct window, even across different macOS Spaces (virtual desktops).
All interaction commands (click, type, scroll, menu, hotkey, drag) automatically:
All interaction commands support these focus-related flags:
# Disable automatic focus (not recommended)
peekaboo click "Submit" --no-auto-focus

# Set custom focus timeout (default: 5 seconds)
peekaboo type "Hello" --focus-timeout 10

# Set retry count for focus operations (default: 3)
peekaboo menu click --app Safari --item "New Tab" --focus-retry-count 5

# Control Space switching behavior
peekaboo click "Login" --space-switch # Force Space switch
peekaboo type "text" --bring-to-current-space # Move window to current Space
Peekaboo provides dedicated commands for managing macOS Spaces:
# List all Spaces
peekaboo space list

# Switch to a specific Space
peekaboo space switch --to 2

# Move windows between Spaces
peekaboo space move-window --app Safari --to 3

# Use list to see which Space contains windows
peekaboo space list # Shows all Spaces and their windows
For explicit window focus control:
# Focus a window (switches Space if needed)
peekaboo window focus --app Safari

# Focus without switching Spaces (space-switch is a flag, not an option with value)
peekaboo window focus --app Terminal # Default is to not switch spaces unless needed

# Move window to current Space and focus
peekaboo window focus --app "VS Code" --bring-to-current-space
By default, Peekaboo:
This ensures reliable automation across complex multi-window, multi-Space workflows without manual window management.
Peekaboo v3 introduces an AI-powered agent that can understand and execute complex automation tasks using natural language. The agent uses OpenAI's Chat Completions API with streaming support to break down your instructions into specific Peekaboo commands.
# Set your API key (OpenAI, Anthropic, or Grok)
export OPENAI_API_KEY="your-openai-key-here"
# OR
export ANTHROPIC_API_KEY="your-anthropic-key-here"
# OR
export X_AI_API_KEY="your-grok-key-here"

# Or save it securely in Peekaboo's config
peekaboo config set-credential OPENAI_API_KEY your-api-key-here
peekaboo config set-credential ANTHROPIC_API_KEY your-anthropic-key-here
peekaboo config set-credential X_AI_API_KEY your-grok-key-here

# Now you can use natural language automation!
peekaboo "Open Safari and search for weather"
peekaboo agent "Fill out the form" --model grok-4-0709
peekaboo agent "Create a document" --model claude-opus-4
When you provide a text argument without a subcommand, Peekaboo automatically uses the agent:
# These all invoke the agent directly
peekaboo "Click the Submit button"
peekaboo "Open TextEdit and write Hello"
peekaboo "Take a screenshot of Safari"

Use the agent subcommand for more control and options:

# With options and flags
peekaboo agent "Fill out the contact form" --verbose
peekaboo agent "Close all Finder windows" --dry-run
peekaboo agent "Install this app" --max-steps 30 --json-output
# Web Automation
peekaboo "Go to github.com and search for peekaboo"
peekaboo "Click the first search result"
peekaboo "Star this repository"

# Document Creation
peekaboo "Open Pages and create a new blank document"
peekaboo "Type 'Meeting Agenda' as the title and make it bold"
peekaboo "Add bullet points for Introduction, Main Topics, and Action Items"

# File Management
peekaboo "Open Finder and navigate to Downloads"
peekaboo "Select all PDF files and move them to Documents"
peekaboo "Create a new folder called 'Archived PDFs'"

# Application Testing
peekaboo "Launch Calculator and calculate 42 * 17"
peekaboo "Take a screenshot of the result"
peekaboo "Clear the calculator and close it"

# System Tasks
peekaboo "Open System Settings and go to Display settings"
peekaboo "Change the display resolution to 1920x1080"
peekaboo "Take a screenshot to confirm the change"
--verbose - See the agent's reasoning and planning process
--dry-run - Preview what the agent would do without executing
--max-steps <n> - Limit the number of actions (default: 20)
--model <model> - Choose OpenAI model (default: gpt-4-turbo)
--json-output - Get structured JSON output
--resume - Resume the latest unfinished agent session
--resume <session-id> - Resume a specific session by ID

The agent has access to all Peekaboo commands:
When you run an agent command, here's what happens behind the scenes:
# Your command:
peekaboo "Click the Submit button"

# Agent breaks it down into:
peekaboo see # Capture screen and identify elements
peekaboo click "Submit" # Click the identified button

# Complex multi-step task
peekaboo agent --verbose "Create a new document in Pages with the title 'Meeting Notes' and add today's date"

# Agent will execute commands like:
# 1. peekaboo see --app Pages # Check if Pages is open
# 2. peekaboo app launch Pages # Launch if needed
# 3. peekaboo sleep --duration 2000 # Wait for app to load
# 4. peekaboo click "Create Document" # Click new document
# 5. peekaboo type "Meeting Notes" # Enter title
# 6. peekaboo hotkey cmd+b # Make text bold
# 7. peekaboo hotkey return # New line
# 8. peekaboo type "Date: $(date)" # Add current date

# Relaunch an application (useful for applying settings or fixing issues)
peekaboo app relaunch Safari # Quit and restart Safari
peekaboo app relaunch "Visual Studio Code" --wait 3 --wait-until-ready
Use --verbose to see exactly what the agent is doing:

peekaboo agent --verbose "Find and click the login button"

# Output will show:
# [Agent] Analyzing request...
# [Agent] Planning steps:
#   1. Capture current screen
#   2. Identify login button
#   3. Click on the button
# [Agent] Executing: peekaboo see
# [Agent] Found elements: button "Login" at (834, 423)
# [Agent] Executing: peekaboo click "Login"
# [Agent] Action completed successfully
Use --verbose to understand what the agent is doing
Use --max-steps to prevent runaway automation

The agent supports resuming interrupted or incomplete sessions, maintaining full conversation context:
# Start a complex task
peekaboo agent "Help me write a document about automation"
# Agent creates document, starts writing...
# <Interrupted by Ctrl+C or error>

# Resume the latest session with context
peekaboo agent --resume "Continue where we left off"

# Or resume a specific session
peekaboo agent --resume session_abc123 "Add a conclusion section"

# List available sessions
peekaboo agent --list-sessions
# Note: There is no show-session command, use list-sessions to see all sessions

# Scenario 1: Continue an interrupted task
peekaboo agent "Create a presentation about AI"
# <Interrupted after creating first slide>
peekaboo agent --resume "Add more slides about machine learning"

# Scenario 2: Iterative refinement
peekaboo agent "Fill out this form with test data"
# <Agent completes task>
peekaboo agent --resume "Actually, change the email to [email protected]"

# Scenario 3: Debugging automation
peekaboo agent --verbose "Login to the portal"
# <Login fails>
peekaboo agent --resume --verbose "Try clicking the other login button"
sleep Tool

Pause execution between actions:

// Sleep for 1 second
await sleep({ duration: 1000 })

// Sleep for 500ms
await sleep({ duration: 500 })
window Tool

Comprehensive window manipulation for any application:

// Close window
await window({ action: "close", app: "Safari" })
await window({ action: "close", app: "Safari", title: "Downloads" })

// Minimize/Maximize
await window({ action: "minimize", app: "Finder" })
await window({ action: "maximize", app: "Terminal" })

// Move window
await window({ action: "move", app: "TextEdit", x: 100, y: 100 })

// Resize window
await window({ action: "resize", app: "Notes", width: 800, height: 600 })

// Set exact bounds (move + resize)
await window({ action: "set-bounds", app: "Safari", x: 50, y: 50, width: 1200, height: 800 })

// Focus window
await window({ action: "focus", app: "Visual Studio Code" })
await window({ action: "focus", app: "Safari", index: 0 }) // Focus first window

// List all windows (Note: window tool doesn't have a list action)
// Use the list tool instead:
await list({ item_type: "application_windows", app: "Finder" })
Peekaboo v3 includes comprehensive multi-screen support for window management across multiple displays. When listing windows, Peekaboo shows which screen each window is on, and provides powerful options for moving windows between screens.
When listing windows, each window shows its screen location:
# Windows now show their screen in the output
peekaboo list windows --app Safari
# Output includes: "Screen: Built-in Display" or "Screen: External Display"
Using Screen Index (0-based):
# Move window to specific screen by index
peekaboo window resize --app Safari --target-screen 0 # Primary screen
peekaboo window resize --app Terminal --target-screen 1 # Second screen
peekaboo window resize --app Notes --target-screen 2 # Third screen
Using Screen Presets:
# Move to next/previous screen
peekaboo window resize --app Safari --screen-preset next
peekaboo window resize --app Terminal --screen-preset previous

# Move to primary screen (with menu bar)
peekaboo window resize --app Notes --screen-preset primary

# Keep on same screen (useful with other resize options)
peekaboo window resize --app TextEdit --screen-preset same --preset left_half
You can combine screen movement with window positioning:
# Move to screen 1 and maximize
peekaboo window resize --app Safari --target-screen 1 --preset maximize

# Move to next screen and position on left half
peekaboo window resize --app Terminal --screen-preset next --preset left_half

# Move to screen 0 at specific coordinates
peekaboo window resize --app Notes --target-screen 0 --x 100 --y 100

# Move to primary screen with custom size
peekaboo window resize --app TextEdit --screen-preset primary --width 1200 --height 800
The AI agent understands multi-screen commands:
peekaboo agent "Move all Safari windows to the external display"
peekaboo agent "Put Terminal on my second screen"
peekaboo agent "Arrange windows with Safari on the left screen and Notes on the right"
menu Tool

Interact with application menu bars and system menu extras:

// List all menus and items for an app
await menu({ action: "list", app: "Calculator" })

// Click a simple menu item
await menu({ action: "click", app: "Safari", item: "New Window" })

// Navigate nested menus with path
await menu({ action: "click", app: "TextEdit", path: "Format > Font > Bold" })

// Click system menu extras (WiFi, Bluetooth, etc.)
await menu({ action: "click-extra", title: "WiFi" })
app Tool

Control applications - launch, quit, focus, hide, and switch between apps:

// Launch an application
await app({ action: "launch", name: "Safari" })

// Quit an application
await app({ action: "quit", name: "TextEdit" })

// Force quit
await app({ action: "quit", name: "Notes", force: true })

// Focus/switch to app
await app({ action: "focus", name: "Google Chrome" })

// Hide/unhide apps
await app({ action: "hide", name: "Finder" })
await app({ action: "unhide", name: "Finder" })
dock Tool

Interact with the macOS Dock:

// List all dock items
await dock({ action: "list" })

// Launch app from dock
await dock({ action: "launch", app: "Safari" })

// Right-click on dock item
await dock({ action: "right-click", app: "Finder" })

// Show/hide dock
await dock({ action: "hide" })
await dock({ action: "show" })
dialog Tool

Handle system dialogs and alerts:

// List open dialogs
await dialog({ action: "list" })

// Click dialog button
await dialog({ action: "click", button: "OK" })

// Input text in dialog field
await dialog({ action: "input", text: "filename.txt" })

// Select file in open/save dialog
await dialog({ action: "file", path: "/Users/me/Documents/file.pdf" })

// Dismiss dialog
await dialog({ action: "dismiss" })
clean Tool

Clean up session cache and temporary files:

// Clean all sessions
await clean({})

// Clean sessions older than 7 hours
await clean({ older_than: 7 })

// Clean specific session
await clean({ session: "session_123" })

// Dry run to see what would be cleaned
await clean({ dry_run: true })
Peekaboo v3 uses sessions to maintain UI state across commands:
Sessions are created by the see tool
Session data is stored under ~/.peekaboo/session/<PID>/

Start with see - Capture the current UI state before interacting
Use sleep after actions that trigger animations
Run see again to confirm actions succeeded
Run the clean tool periodically

// 1. See the login form
const { elements } = await see({ app_target: "MyApp" })

// 2. Fill in credentials
await click({ on: "T1" }) // Click email field
await type({ text: "[email protected]" })
await click({ on: "T2" }) // Click password field
await type({ text: "password123" })

// 3. Submit
await click({ query: "Sign In" })

// 4. Wait and verify
await sleep({ duration: 2000 })
await see({ app_target: "MyApp" }) // Verify logged in
// 1. Focus browser
await see({ app_target: "Safari" })

// 2. Open new tab
await hotkey({ keys: "cmd,t" })

// 3. Type search
await type({ text: "Peekaboo MCP automation" })
await type({ text: "{return}" })

// 4. Wait for results
await sleep({ duration: 3000 })

// 5. Click first result
await see({ app_target: "Safari" })
await click({ on: "L1" })
// 1. Capture form
const { elements } = await see({ app_target: "Forms" })

// 2. Fill each field
for (const field of elements.text_fields) {
  await click({ on: field.id })
  await type({ text: "Test data", clear: true })
}

// 3. Check all checkboxes
for (const checkbox of elements.checkboxes) {
  if (!checkbox.checked) {
    await click({ on: checkbox.id })
  }
}

// 4. Submit
await click({ query: "Submit" })
- wait_for
- timeout
- Use the clean tool to clear corrupted sessions

Peekaboo uses macOS's unified logging system. Use pblog to monitor logs:
# View recent logs
./scripts/pblog.sh

# Stream logs continuously
./scripts/pblog.sh -f

# Debug specific issues
./scripts/pblog.sh -c ClickService -d
Note: macOS redacts log values by default, showing <private>. See docs/pblog-guide.md and docs/logging-profiles/README.md for solutions.
Settings follow this precedence (highest to lowest):
- Credentials file (~/.peekaboo/credentials)
- Config file (~/.peekaboo/config.json)

Setting | Config File | Environment Variable | Description |
---|---|---|---|
AI Providers | aiProviders.providers | PEEKABOO_AI_PROVIDERS | Comma-separated list (e.g., "openai/gpt-4.1,anthropic/claude,grok/grok-4,ollama/llava:latest") |
OpenAI API Key | Use credentials file | OPENAI_API_KEY | Required for OpenAI provider |
Anthropic API Key | Use credentials file | ANTHROPIC_API_KEY | Required for Claude models |
Grok API Key | Use credentials file | X_AI_API_KEY or XAI_API_KEY | Required for Grok (xAI) models |
Ollama URL | aiProviders.ollamaBaseUrl | PEEKABOO_OLLAMA_BASE_URL | Default: http://localhost:11434 |
Default Save Path | defaults.savePath | PEEKABOO_DEFAULT_SAVE_PATH | Where screenshots are saved (default: current directory) |
Log Level | logging.level | PEEKABOO_LOG_LEVEL | trace, debug, info, warn, error, fatal |
Log Path | logging.path | PEEKABOO_LOG_FILE | Log file location |
CLI Binary Path | - | PEEKABOO_CLI_PATH | Override bundled Swift CLI path (advanced usage) |
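The precedence rule stated before the table amounts to "the first source that defines a setting wins". A minimal sketch of that lookup follows, assuming each source has already been loaded into a plain key/value map. The `Layer` type and `resolveSetting` function are hypothetical names for illustration, not Peekaboo's internal API, and the only ordering taken from this document is that the credentials file ranks above the config file.

```typescript
// One configuration source: a label for diagnostics plus its key/value pairs.
type Layer = { name: string; values: Record<string, string | undefined> };

// Walk the layers from highest to lowest precedence and return the first
// non-empty value for `key`, together with the layer it came from.
function resolveSetting(
  key: string,
  layers: Layer[],
): { value: string; source: string } | undefined {
  for (const layer of layers) {
    const value = layer.values[key];
    if (value !== undefined && value !== "") {
      return { value, source: layer.name };
    }
  }
  return undefined;
}

// Illustrative data only: the keys and values here are made up.
const exampleLayers: Layer[] = [
  { name: "credentials file", values: { OPENAI_API_KEY: "sk-from-credentials" } },
  { name: "config file", values: { OPENAI_API_KEY: "sk-from-config", LOG_LEVEL: "debug" } },
];

console.log(resolveSetting("OPENAI_API_KEY", exampleLayers));
console.log(resolveSetting("LOG_LEVEL", exampleLayers));
```

Returning the source alongside the value makes it easier to answer "where did this setting come from?" when two files disagree.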
For security, Peekaboo supports three methods for API key storage (in order of recommendation):
1. Environment Variables (Most secure for automation)
   export OPENAI_API_KEY="sk-..."
2. Credentials File (Best for interactive use)
   peekaboo config set-credential OPENAI_API_KEY sk-...
   # Stored in ~/.peekaboo/credentials with chmod 600
3. Config File (Not recommended - use credentials file instead)
- PEEKABOO_AI_PROVIDERS: Comma-separated list of AI providers to use for image analysis
  - Format: provider/model,provider/model
  - Examples: "openai/gpt-4.1,anthropic/claude-opus-4,grok/grok-4,ollama/llava:latest" or "openai/gpt-4.1,ollama/llava:latest"
  - Supported providers: openai, anthropic, grok, ollama
- OPENAI_API_KEY: Your OpenAI API key for GPT-4.1 Vision
  - Required for the openai provider
- ANTHROPIC_API_KEY: Your Anthropic API key for Claude models
  - Required for the anthropic provider
- X_AI_API_KEY or XAI_API_KEY: Your xAI API key for Grok models
  - Required for the grok provider
- PEEKABOO_OLLAMA_BASE_URL: Base URL for your Ollama server
  - Default: http://localhost:11434
- PEEKABOO_DEFAULT_SAVE_PATH: Default directory for saving screenshots
  - Default: current directory (example: ~/Desktop/Screenshots)
- PEEKABOO_LOG_LEVEL: Control logging verbosity
  - Options: trace, debug, info, warn, error, fatal
  - Default: info
  - Use debug or trace for troubleshooting
- PEEKABOO_LOG_FILE: Custom log file location
  - Default: /tmp/peekaboo-mcp.log (MCP server)
- PEEKABOO_CLI_PATH: Override the bundled Swift CLI binary path
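The provider/model,provider/model format for PEEKABOO_AI_PROVIDERS can be illustrated with a small parser. This is a sketch under stated assumptions, not Peekaboo's own parsing code: `parseProviders` and `ProviderSpec` are hypothetical names, and rejecting unknown providers is an assumption about how strict validation should be. Note that the split happens on the first slash only, so model tags such as llava:latest pass through untouched.

```typescript
type ProviderSpec = { provider: string; model: string };

// Provider names listed in this document.
const KNOWN_PROVIDERS = ["openai", "anthropic", "grok", "ollama"];

// Parse "provider/model,provider/model" into an ordered list of specs.
function parseProviders(raw: string): ProviderSpec[] {
  const specs: ProviderSpec[] = [];
  for (const entry of raw.split(",")) {
    const trimmed = entry.trim();
    if (trimmed === "") continue; // tolerate trailing commas and blank entries

    // Split on the first "/" so the model part may contain ":" or further "/".
    const slash = trimmed.indexOf("/");
    if (slash <= 0 || slash === trimmed.length - 1) {
      throw new Error('Expected provider/model, got "' + trimmed + '"');
    }
    const provider = trimmed.slice(0, slash);
    const model = trimmed.slice(slash + 1);

    // Assumption: unknown providers are treated as an error.
    if (KNOWN_PROVIDERS.indexOf(provider) === -1) {
      throw new Error('Unknown provider "' + provider + '"');
    }
    specs.push({ provider, model });
  }
  return specs;
}

console.log(parseProviders("openai/gpt-4.1,ollama/llava:latest"));
```

The list keeps the order in which providers were written, so a caller can walk it from first to last.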
Environment variables can be set in multiple ways:
# For a single command
PEEKABOO_AI_PROVIDERS="ollama/llava:latest" peekaboo image --analyze "What is this?" --path image.png

# Export for the current session
export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."
export X_AI_API_KEY="xai-..."
export PEEKABOO_DEFAULT_SAVE_PATH="~/Desktop/Screenshots"

# Add to your shell profile (~/.zshrc or ~/.bash_profile)
echo 'export OPENAI_API_KEY="sk-..."' >> ~/.zshrc
echo 'export ANTHROPIC_API_KEY="sk-ant-..."' >> ~/.zshrc
echo 'export X_AI_API_KEY="xai-..."' >> ~/.zshrc
For privacy-focused local AI analysis:
# Install Ollama
brew install ollama
ollama serve

# Download recommended models
ollama pull llama3.3       # RECOMMENDED for agent tasks (supports tool calling)
ollama pull llava:latest   # Vision model (no tool support)
ollama pull qwen2-vl:7b    # Lighter vision alternative

# Use with Peekaboo
PEEKABOO_AI_PROVIDERS="ollama/llama3.3" peekaboo agent "Click the Submit button"
PEEKABOO_AI_PROVIDERS="ollama/llama" peekaboo agent "Take a screenshot"  # Defaults to llama3.3

# Configure Peekaboo (optional)
peekaboo config edit
# Set providers to: "ollama/llama3.3" for agent tasks
# Or: "ollama/llava:latest" for image analysis only
Models with Tool Calling (✅ Recommended for automation):
- llama3.3 - Best overall for agent tasks
- llama3.2 - Good alternative

Vision Models (❌ No tool calling):
- llava - Image analysis only
- bakllava - Alternative vision model

Note: For agent automation tasks, use llama3.3. Vision models like llava can analyze images but cannot perform GUI automation.
Screen Recording (Required):
Accessibility (Optional):
Check permissions status:
peekaboo permissions check
peekaboo permissions request screen-recording
peekaboo permissions request accessibility
Peekaboo v3 includes significant performance improvements:
These optimizations ensure that operations that previously could hang for 2+ minutes now complete in seconds.
- Xcode Command Line Tools (xcode-select --install)

# Clone the repository
git clone https://github.com/steipete/peekaboo.git
cd peekaboo

# Install dependencies
npm install

# Build everything (CLI + MCP server)
npm run build:all

# Build options:
npm run build                                 # TypeScript only
npm run build:swift                           # Swift CLI only (universal binary)
./scripts/build-cli-standalone.sh             # Quick CLI build
./scripts/build-cli-standalone.sh --install   # Build and install to /usr/local/bin
# Run all pre-release checks and create release artifacts
./scripts/release-binaries.sh

# Skip checks (if you've already run them)
./scripts/release-binaries.sh --skip-checks

# Create GitHub release draft
./scripts/release-binaries.sh --create-github-release

# Full release with npm publish
./scripts/release-binaries.sh --create-github-release --publish-npm
The release script creates:
- peekaboo-macos-universal.tar.gz - Standalone CLI binary (universal)
- @steipete-peekaboo-mcp-{version}.tgz - npm package
- checksums.txt - SHA256 checksums for verification

For development, enable automatic staleness detection to ensure you're always using the latest built CLI version: git config peekaboo.check-build-staleness true. This is recommended when working with AI assistants that frequently modify source code, as it prevents using outdated binaries.
Poltergeist is a helpful ghost that watches your Swift files and automatically rebuilds the CLI when they change. Perfect for development workflows!
First, install Watchman (required):
brew install watchman
Run these commands from the project root:
# Start the watcher
npm run poltergeist:start
# or the more thematic:
npm run poltergeist:haunt

# Check status
npm run poltergeist:status

# View activity logs
npm run poltergeist:logs

# Stop watching
npm run poltergeist:stop
# or the more thematic:
npm run poltergeist:rest
Poltergeist monitors:
- Core/PeekabooCore/**/*.swift
- Core/AXorcist/**/*.swift
- Apps/CLI/**/*.swift
- Package.swift and Package.resolved files

When changes are detected, it automatically:
- Runs npm run build:swift
- Logs activity to .poltergeist.log
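To make the watch patterns above concrete, here is a rough sketch of how such globs decide whether a changed path would trigger a rebuild. It is an approximation written for illustration: Poltergeist delegates file watching to Watchman, whose matching rules may differ in edge cases, and `globToRegExp`, `isWatched`, and the sample paths used with them are hypothetical.

```typescript
// Patterns copied from the list above.
const WATCH_PATTERNS = [
  "Core/PeekabooCore/**/*.swift",
  "Core/AXorcist/**/*.swift",
  "Apps/CLI/**/*.swift",
  "Package.swift",
  "Package.resolved",
];

// Convert a glob into a RegExp.
// "**/" matches zero or more directories; "*" matches within one path segment.
function globToRegExp(glob: string): RegExp {
  let pattern = "";
  let i = 0;
  while (i < glob.length) {
    if (glob.slice(i, i + 3) === "**/") {
      pattern += "(?:[^/]+/)*";
      i += 3;
    } else if (glob.charAt(i) === "*") {
      pattern += "[^/]*";
      i += 1;
    } else {
      // Escape characters that are special in regular expressions.
      pattern += glob.charAt(i).replace(/[.+?^${}()|[\]\\]/g, "\\$&");
      i += 1;
    }
  }
  return new RegExp("^" + pattern + "$");
}

// True when a repository-relative path matches any watch pattern.
function isWatched(path: string): boolean {
  return WATCH_PATTERNS.some((glob) => globToRegExp(glob).test(path));
}

console.log(isWatched("Apps/CLI/Sources/main.swift")); // hypothetical path
```

Swift files outside the three listed directories, and non-Swift files inside them, do not match under this reading of the patterns.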
Peekaboo uses the Swift Testing framework (Swift 6.0+) for all test suites:
# Run all tests
swift test

# Run specific test target
swift test --filter PeekabooTests

# Run tests with verbose output
swift test --verbose
# Test CLI directly
peekaboo list server_status
peekaboo image --mode screen --path test.png
peekaboo image --analyze "What is shown?" --path test.png

# Test MCP server
npx @modelcontextprotocol/inspector npx -y @steipete/peekaboo-mcp
Issue | Solution |
---|---|
Permission denied | Grant Screen Recording permission in System Settings |
Window not found | Try using fuzzy matching or list windows first |
AI analysis failed | Check API keys and provider configuration |
Command not found | Ensure Peekaboo is in your PATH or use full path |
Enable debug logging for more details:
export PEEKABOO_LOG_LEVEL=debug
peekaboo list server_status
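Why lowering the level to debug produces more output can be pictured with a small threshold check. This sketch assumes the conventional meaning of the documented level names, ordered from least to most severe, where a message is emitted only if its level is at or above the configured threshold. `shouldLog` is a hypothetical helper, not part of Peekaboo.

```typescript
// Levels in the order the documentation lists them, least to most severe.
const LEVELS = ["trace", "debug", "info", "warn", "error", "fatal"] as const;
type Level = (typeof LEVELS)[number];

// A message is emitted when its level is at or above the threshold.
function shouldLog(messageLevel: Level, threshold: Level): boolean {
  return LEVELS.indexOf(messageLevel) >= LEVELS.indexOf(threshold);
}

console.log(shouldLog("debug", "info"));
console.log(shouldLog("debug", "debug"));
```

Under this model, debug messages are suppressed at the info threshold and appear once the threshold is set to debug or trace.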
For step-by-step debugging, use the verbose flag:
peekaboo image --app Safari --verbose 2>&1 | less
Peekaboo includes Poltergeist, an automatic build system that watches Swift source files and rebuilds the CLI in the background. This ensures your CLI binary is always up-to-date during development.
# Start Poltergeist (runs in background)
npm run poltergeist:haunt

# Check status
npm run poltergeist:status

# Stop Poltergeist
npm run poltergeist:rest
Key features:
- A script (./scripts/peekaboo-wait.sh) handles build coordination

# Build everything
npm run build:all

# Build CLI only
npm run build:swift

# Build TypeScript server
npm run build
Contributions are welcome! Please:
MIT License - see LICENSE file for details.
Created by Peter Steinberger - @steipete