
Visual Reasoning
MCP server (STDIO) enabling visual reasoning through diagram creation and manipulation for spatial thinking
Language models fundamentally operate on text, which limits their ability to reason through problems that humans typically solve using spatial, diagrammatic, or visual thinking. Current models struggle with:
The Visual Reasoning Server provides models with the ability to create, manipulate, and reason with explicit visual representations. By externalizing visual thinking, models can solve complex problems that benefit from diagrammatic reasoning, much like how mathematical notation extends human calculation abilities beyond plain text.
```typescript
interface VisualElement {
  id: string;
  type: "node" | "edge" | "container" | "annotation";
  label?: string;
  properties: {
    [key: string]: any; // Position, size, color, etc.
  };
  // For edges
  source?: string; // ID of source element
  target?: string; // ID of target element
  // For containers
  contains?: string[]; // IDs of contained elements
}

interface VisualOperationData {
  // Operation details
  operation: "create" | "update" | "delete" | "transform" | "observe";
  elements?: VisualElement[];
  transformationType?: "rotate" | "move" | "resize" | "recolor" | "regroup";

  // Visual diagram metadata
  diagramId: string;
  diagramType: "graph" | "flowchart" | "stateDiagram" | "conceptMap" | "treeDiagram" | "custom";
  iteration: number;

  // Reasoning about the diagram
  observation?: string;
  insight?: string;
  hypothesis?: string;

  // Next steps
  nextOperationNeeded: boolean;
}
```
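As an illustration of how a payload matching the interfaces above might be assembled, the sketch below builds a single `create` operation and checks it for edges that point at missing nodes. The `findDanglingEdges` helper, the `auth-flow` diagram ID, and the element IDs are hypothetical and are not part of the server. The interfaces are repeated in trimmed form only so the block compiles on its own.

```typescript
// Trimmed copies of the interfaces above so this sketch compiles standalone.
// `unknown` is used here instead of `any` for the property bag.
interface VisualElement {
  id: string;
  type: "node" | "edge" | "container" | "annotation";
  label?: string;
  properties: { [key: string]: unknown };
  source?: string;
  target?: string;
  contains?: string[];
}

interface VisualOperationData {
  operation: "create" | "update" | "delete" | "transform" | "observe";
  elements?: VisualElement[];
  diagramId: string;
  diagramType: "graph" | "flowchart" | "stateDiagram" | "conceptMap" | "treeDiagram" | "custom";
  iteration: number;
  nextOperationNeeded: boolean;
}

// Hypothetical helper (an assumption, not a server API): returns the IDs of
// edges whose source or target does not match any non-edge element in the list.
function findDanglingEdges(elements: VisualElement[]): string[] {
  const ids = new Set(elements.filter((e) => e.type !== "edge").map((e) => e.id));
  return elements
    .filter((e) => e.type === "edge")
    .filter((e) => !e.source || !e.target || !ids.has(e.source) || !ids.has(e.target))
    .map((e) => e.id);
}

const createOp: VisualOperationData = {
  operation: "create",
  diagramId: "auth-flow", // made-up identifier for illustration
  diagramType: "flowchart",
  iteration: 0,
  nextOperationNeeded: true,
  elements: [
    { id: "login", type: "node", label: "Login form", properties: { x: 0, y: 0 } },
    { id: "verify", type: "node", label: "Verify token", properties: { x: 200, y: 0 } },
    { id: "e1", type: "edge", source: "login", target: "verify", properties: {} },
    // This edge targets "dashboard", which is never defined as a node.
    { id: "e2", type: "edge", source: "verify", target: "dashboard", properties: {} },
  ],
};

const dangling = findDanglingEdges(createOp.elements ?? []);
console.log(dangling); // ["e2"]
```

A check like this is one way a client could catch malformed payloads before sending them. Whether the server performs similar validation is not stated here.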
```mermaid
sequenceDiagram
    participant Model
    participant VisServer as Visual Reasoning Server
    participant State as Visual State

    Model->>VisServer: Create initial nodes (operation=create)
    VisServer->>State: Initialize visual representation
    VisServer-->>Model: Return visual rendering + state

    Model->>VisServer: Add connections (operation=create, type=edge)
    VisServer->>State: Update with new edges
    VisServer-->>Model: Return updated visual + state

    Model->>VisServer: Group related elements (operation=transform, type=regroup)
    VisServer->>State: Update with new grouping
    VisServer-->>Model: Return updated visual + state

    Model->>VisServer: Make observation about pattern (operation=observe)
    VisServer->>State: Record observation with current state
    VisServer-->>Model: Return visual with observation

    Model->>VisServer: Update based on insight (operation=update)
    VisServer->>State: Modify visual elements
    VisServer-->>Model: Return final visual + state
```
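The five-step exchange in the sequence diagram can be approximated with a small state reducer. This is a sketch under stated assumptions, not the server's actual implementation. `DiagramState` and `applyOperation` are hypothetical names. A regroup is assumed to arrive as a container element that gets upserted. Updates are assumed to merge properties rather than replace them.

```typescript
interface VisualElement {
  id: string;
  type: "node" | "edge" | "container" | "annotation";
  label?: string;
  properties: { [key: string]: unknown };
  source?: string;
  target?: string;
  contains?: string[];
}

// Reduced operation shape: only the fields this sketch needs.
interface Operation {
  operation: "create" | "update" | "delete" | "transform" | "observe";
  elements?: VisualElement[];
  observation?: string;
}

// Hypothetical in-memory state; the real server's state layout is not documented here.
interface DiagramState {
  elements: Map<string, VisualElement>;
  observations: { iteration: number; text: string }[];
  iteration: number;
}

// Returns a new state rather than mutating the old one, so earlier iterations stay intact.
function applyOperation(state: DiagramState, op: Operation): DiagramState {
  const elements = new Map(state.elements);
  const observations = [...state.observations];
  const iteration = state.iteration + 1;
  for (const el of op.elements ?? []) {
    switch (op.operation) {
      case "create":
      case "transform": // assumption: a regroup is expressed as a container to upsert
        elements.set(el.id, el);
        break;
      case "update": {
        const prev = elements.get(el.id);
        if (prev) {
          elements.set(el.id, { ...prev, ...el, properties: { ...prev.properties, ...el.properties } });
        }
        break;
      }
      case "delete":
        elements.delete(el.id);
        break;
    }
  }
  if (op.operation === "observe" && op.observation) {
    observations.push({ iteration, text: op.observation });
  }
  return { elements, observations, iteration };
}

// The same order of calls as in the diagram: nodes, edges, regroup, observe, update.
const steps: Operation[] = [
  { operation: "create", elements: [
    { id: "a", type: "node", label: "A", properties: {} },
    { id: "b", type: "node", label: "B", properties: { x: 1 } },
  ] },
  { operation: "create", elements: [{ id: "ab", type: "edge", source: "a", target: "b", properties: {} }] },
  { operation: "transform", elements: [{ id: "g1", type: "container", contains: ["a", "b"], properties: {} }] },
  { operation: "observe", observation: "A feeds B directly" },
  { operation: "update", elements: [{ id: "b", type: "node", label: "B (sink)", properties: { color: "red" } }] },
];

const initial: DiagramState = { elements: new Map(), observations: [], iteration: 0 };
const finalState = steps.reduce(applyOperation, initial);
```

Because each call produces a fresh state object, keeping every intermediate result would give a simple iteration history for comparing or reverting diagram versions.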
The server supports different visual representation types:
Models can manipulate visual elements through operations:
The server tracks iteration history, allowing models to:
The server enables bidirectional translation between:
The server provides multiple representations:
Models can create and manipulate component diagrams showing data flow, dependencies, and interactions between system components.
When designing or explaining algorithms, models can create flowcharts, state diagrams, or visual traces of execution.
For organizing complex domains of knowledge, models can create and refine concept maps showing relationships between ideas.
When analyzing data, models can create visual representations to identify patterns that might be difficult to detect in text.
The server is implemented using TypeScript with:
The implementation relies on existing graph visualization tooling, such as Graphviz for DOT output, or on custom ASCII art generation, to provide rich visual feedback within the constraints of text-based interfaces.
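To make the DOT output path concrete, here is a minimal sketch of serializing nodes and edges to Graphviz DOT text. The `toDot` function and the sample element IDs are hypothetical. The sketch only emits text and does not call Graphviz. It escapes double quotes only and ignores containers and annotations, so the server's real renderer may differ.

```typescript
interface VisualElement {
  id: string;
  type: "node" | "edge" | "container" | "annotation";
  label?: string;
  properties: { [key: string]: unknown };
  source?: string;
  target?: string;
  contains?: string[];
}

// Wrap a value in a DOT quoted string, escaping embedded double quotes.
const quote = (s: string): string => `"${s.replace(/"/g, '\\"')}"`;

// Hypothetical serializer: emits one statement per node and per well-formed edge.
function toDot(name: string, elements: VisualElement[]): string {
  const lines: string[] = [`digraph ${quote(name)} {`];
  for (const el of elements) {
    if (el.type === "node") {
      lines.push(`  ${quote(el.id)} [label=${quote(el.label ?? el.id)}];`);
    } else if (el.type === "edge" && el.source && el.target) {
      const attr = el.label ? ` [label=${quote(el.label)}]` : "";
      lines.push(`  ${quote(el.source)} -> ${quote(el.target)}${attr};`);
    }
  }
  lines.push("}");
  return lines.join("\n");
}

const dot = toDot("demo", [
  { id: "a", type: "node", label: "Parse", properties: {} },
  { id: "b", type: "node", label: "Render", properties: {} },
  { id: "e1", type: "edge", source: "a", target: "b", label: "AST", properties: {} },
]);
console.log(dot);
```

The resulting text can be piped to the Graphviz `dot` command line tool (for example `dot -Tsvg`) when a rendered image is wanted. In a text-only interface, the DOT source itself can serve as the representation.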
This server significantly enhances model capabilities for domains where visual or spatial thinking provides a natural advantage over purely textual reasoning.
Facilitates visual thinking through creating and manipulating diagram elements.
Add this to your claude_desktop_config.json:
With npx:

```json
{
  "mcpServers": {
    "visual-reasoning": {
      "command": "npx",
      "args": [
        "-y",
        "@waldzellai/visual-reasoning"
      ]
    }
  }
}
```

With Docker:

```json
{
  "mcpServers": {
    "visual-reasoning": {
      "command": "docker",
      "args": [
        "run",
        "--rm",
        "-i",
        "cognitive-enhancement-mcp/visual-reasoning"
      ]
    }
  }
}
```

To build the image locally:
Docker:
```bash
docker build -t cognitive-enhancement-mcp/visual-reasoning -f packages/visual-reasoning/Dockerfile .
```
This MCP server is licensed under the MIT License.