
全能MCP
STDIOAI界面自动化服务器
AI界面自动化服务器
OmniMCP provides rich UI context and interaction capabilities to AI models through Model Context Protocol (MCP) and microsoft/OmniParser. It focuses on enabling deep understanding of user interfaces through visual analysis, structured planning, and precise interaction execution.
omnimcp/agent_executor.py
).pynput
(omnimcp/input.py
).cli.py
) for running tasks.cli.py
uses AgentExecutor
to run a perceive-plan-act loop. It captures the screen (VisualState
), plans using an LLM (core.plan_action_for_ui
), and executes actions (InputController
).
python cli.py
opens Calculator and computes 5*9.
python demo_synthetic.py
uses generated images (no real I/O). (Note: Pending refactor to use AgentExecutor).
uv
installed (pip install uv
)pynput
. May need system libraries (libx11-dev
, etc.) - see pynput
docs.(macOS display scaling dependencies are handled automatically during installation).
Requires AWS credentials in .env
(see .env.example
). Warning: Creates AWS resources (EC2, Lambda, etc.) incurring costs. Use python -m omnimcp.omniparser.server stop
to clean up.
AWS_ACCESS_KEY_ID=YOUR_ACCESS_KEY
AWS_SECRET_ACCESS_KEY=YOUR_SECRET_KEY
ANTHROPIC_API_KEY=YOUR_ANTHROPIC_KEY
# OMNIPARSER_URL=http://... # Optional: Skip auto-deploy
git clone [https://github.com/OpenAdaptAI/OmniMCP.git](https://github.com/OpenAdaptAI/OmniMCP.git) cd OmniMCP ./install.sh # Creates .venv, installs deps incl. test extras cp .env.example .env # Edit .env with your keys # Activate: source .venv/bin/activate (Linux/macOS) or relevant Windows command
Ensure environment is activated and .env
is configured.
# Run default goal (Calculator task) python cli.py # Run custom goal python cli.py --goal "Your goal here" # See options python cli.py --help
Debug outputs are saved in runs/<timestamp>/
.
Note on MCP Server: An experimental MCP server (OmniMCP
class in omnimcp/mcp_server.py
) exists but is separate from the primary cli.py
/AgentExecutor
workflow.
cli.py
) - Entry point, setup, starts Executor.omnimcp/agent_executor.py
) - Orchestrates loop, manages state/artifacts.omnimcp/visual_state.py
) - Perception (screenshot, calls parser).omnimcp/omniparser/
) - Manages OmniParser server communication/deployment.omnimcp/core.py
) - Generates action plan.omnimcp/input.py
) - Executes actions (mouse/keyboard).omnimcp/mcp_server.py
) - Experimental MCP interface.# Setup (if not done): ./install.sh # Activate env: source .venv/bin/activate (or similar) # Format/Lint: uv run ruff format . && uv run ruff check . --fix # Run tests: uv run pytest tests/
Running python cli.py
saves timestamped runs in runs/
, including:
step_N_state_raw.png
step_N_state_parsed.png
(with element boxes)step_N_action_highlight.png
(with action highlight)final_state.png
Detailed logs are in logs/run_YYYY-MM-DD_HH-mm-ss.log
(LOG_LEVEL=DEBUG
in .env
recommended).
# --- Initialization & Auto-Deploy --- 2025-MM-DD HH:MM:SS | INFO | omnimcp.omniparser.client:... - No server_url provided, attempting discovery/deployment... 2025-MM-DD HH:MM:SS | INFO | omnimcp.omniparser.server:... - Creating new EC2 instance... 2025-MM-DD HH:MM:SS | SUCCESS | omnimcp.omniparser.server:... - Instance i-... is running. Public IP: ... 2025-MM-DD HH:MM:SS | INFO | omnimcp.omniparser.server:... - Setting up auto-shutdown infrastructure... 2025-MM-DD HH:MM:SS | SUCCESS | omnimcp.omniparser.server:... - Auto-shutdown infrastructure setup completed... ... (SSH connection, Docker setup) ... 2025-MM-DD HH:MM:SS | SUCCESS | omnimcp.omniparser.client:... - Auto-deployment successful. Server URL: http://... ... (Agent Executor Init) ... # --- Agent Execution Loop Example Step --- 2025-MM-DD HH:MM:SS | INFO | omnimcp.agent_executor:run:... - --- Step N/10 --- 2025-MM-DD HH:MM:SS | DEBUG | omnimcp.agent_executor:run:... - Perceiving current screen state... 2025-MM-DD HH:MM:SS | INFO | omnimcp.visual_state:update:... - VisualState update complete. Found X elements. Took Y.YYs. 2025-MM-DD HH:MM:SS | INFO | omnimcp.agent_executor:run:... - Perceived state with X elements. ... (Save artifacts) ... 2025-MM-DD HH:MM:SS | DEBUG | omnimcp.agent_executor:run:... - Planning next action... ... (LLM Call) ... 2025-MM-DD HH:MM:SS | INFO | omnimcp.agent_executor:run:... - LLM Plan: Action=..., TargetID=..., GoalComplete=False 2025-MM-DD HH:MM:SS | DEBUG | omnimcp.agent_executor:run:... - Added to history: Step N: Planned action ... 2025-MM-DD HH:MM:SS | INFO | omnimcp.agent_executor:run:... - Executing action: ... 2025-MM-DD HH:MM:SS | SUCCESS | omnimcp.agent_executor:run:... - Action executed successfully. 2025-MM-DD HH:MM:SS | DEBUG | omnimcp.agent_executor:run:... - Step N duration: Z.ZZs ... (Loop continues or finishes) ...
(Note: Details like timings, counts, IPs, instance IDs, and specific plans will vary)
Key limitations & future work areas:
@omni.publish
style) and potentially integrate loop logic with the experimental MCP Server (OmniMCP
class).demo_synthetic.py
to use AgentExecutor
.Core loop via cli.py
/AgentExecutor
is functional for basic tasks. Performance and robustness need significant improvement. MCP integration is experimental.
uv run ruff format .
, uv run ruff check . --fix
, uv run pytest tests/
)MIT License