VGGT MPS
3D Vision Agent for Apple Silicon with MPS acceleration for multi-view reconstruction
🍎 VGGT (Visual Geometry Grounded Transformer) optimized for Apple Silicon with Metal Performance Shaders (MPS)
Transform single or multi-view images into rich 3D reconstructions using Facebook Research's VGGT model, now accelerated on M1/M2/M3 Macs.
Major Update: Complete packaging overhaul with unified CLI, PyPI-ready distribution, and production-grade tooling!
What's new:

- Unified CLI: a single `vggt` command with subcommands for all operations
- Modern packaging: `pyproject.toml` and a proper `src/` layout
- Interactive web interface (`vggt web`)

VGGT reconstructs 3D scenes from images by predicting:

- Camera parameters (intrinsics and extrinsics)
- Depth maps
- Dense point maps
- 3D point tracks across views
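For orientation, here is a minimal inference sketch based on the upstream facebookresearch/vggt API (the `VGGT.from_pretrained` and `load_and_preprocess_images` names come from the upstream project and are assumptions here; this repo's own entry points are shown below):

```python
import torch
from vggt.models.vggt import VGGT
from vggt.utils.load_fn import load_and_preprocess_images

# Assumes the upstream facebookresearch/vggt package layout
device = "mps" if torch.backends.mps.is_available() else "cpu"
model = VGGT.from_pretrained("facebook/VGGT-1B").to(device)

images = load_and_preprocess_images(["data/img1.jpg", "data/img2.jpg"]).to(device)
with torch.no_grad():
    predictions = model(images)  # cameras, depth, point maps, point tracks
```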
```bash
# Install from PyPI (when published)
pip install vggt-mps

# Download model weights (5GB)
vggt download
```
```bash
git clone https://github.com/jmanhype/vggt-mps.git
cd vggt-mps

# Install with uv (10-100x faster than pip!)
make install

# Or manually with uv
uv pip install -e .
```
```bash
git clone https://github.com/jmanhype/vggt-mps.git
cd vggt-mps

# Create virtual environment
python -m venv vggt-env
source vggt-env/bin/activate

# Install dependencies
pip install -r requirements.txt
```
```bash
# Download the 5GB VGGT model
vggt download

# Or if running from source:
python main.py download
```
Or manually download from Hugging Face
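If you prefer to script the manual download, a hedged sketch using `huggingface_hub` (the `facebook/VGGT-1B` repo id and `model.pt` filename are assumptions; verify them on the model page):

```python
from huggingface_hub import hf_hub_download

# Assumption: weights are published as model.pt under facebook/VGGT-1B;
# check the Hugging Face model page for the exact repo id and filename
checkpoint_path = hf_hub_download(repo_id="facebook/VGGT-1B", filename="model.pt")
print(f"Checkpoint saved to {checkpoint_path}")
```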
```bash
# Test MPS acceleration
vggt test --suite mps

# Or from source:
python main.py test --suite mps
```
Expected output:
```
✅ MPS (Metal Performance Shaders) available!
   Running on Apple Silicon GPU
✅ Model weights loaded to mps
✅ MPS operations working correctly!
```
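In your own scripts, the same availability check is a one-liner in PyTorch; a minimal sketch:

```python
import torch

# Prefer the Apple Silicon GPU when available, otherwise fall back to CPU
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")
print(f"Using device: {device}")
```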
```bash
# Copy environment configuration
cp .env.example .env

# Edit .env with your settings
nano .env
```
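The exact keys depend on `.env.example`; purely as an illustration, a configuration might look like this (all variable names below are hypothetical):

```bash
# Hypothetical example values; consult .env.example for the real keys
VGGT_DEVICE=mps        # mps or cpu
VGGT_MODEL_DIR=models
VGGT_OUTPUT_DIR=outputs
```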
All functionality is accessible through the unified `vggt` command:
```bash
# Quick demo with sample images
vggt demo

# Demo with kitchen dataset (4 images)
vggt demo --kitchen --images 4

# Process your own images
vggt reconstruct data/*.jpg

# Use sparse attention for large scenes
vggt reconstruct --sparse data/*.jpg

# Export to specific format
vggt reconstruct --export ply data/*.jpg

# Launch interactive web interface
vggt web

# Open on specific port with public link
vggt web --port 8080 --share

# Run comprehensive tests
vggt test --suite all

# Test sparse attention specifically
vggt test --suite sparse

# Benchmark performance
vggt benchmark --compare

# Download model weights
vggt download
```
If running from source without installation:
```bash
python main.py demo
python main.py reconstruct data/*.jpg
python main.py web
python main.py test --suite mps
python main.py benchmark --compare
```
Add the server to `~/Library/Application Support/Claude/claude_desktop_config.json`:

```json
{
  "mcpServers": {
    "vggt-agent": {
      "command": "uv",
      "args": [
        "run",
        "--python", "/path/to/vggt-mps/vggt-env/bin/python",
        "--with", "fastmcp",
        "fastmcp", "run",
        "/path/to/vggt-mps/src/vggt_mps_mcp.py"
      ]
    }
  }
}
```
Available MCP tools:

- `vggt_quick_start_inference` - Quick 3D reconstruction from images
- `vggt_extract_video_frames` - Extract frames from video
- `vggt_process_images` - Full VGGT pipeline
- `vggt_create_3d_scene` - Generate GLB 3D files
- `vggt_reconstruct_3d_scene` - Multi-view reconstruction
- `vggt_visualize_reconstruction` - Create visualizations

Project structure:

```
vggt-mps/
├── main.py                      # Single entry point
├── setup.py                     # Package installation
├── requirements.txt             # Dependencies
├── .env.example                 # Environment configuration
│
├── src/                         # Source code
│   ├── config.py               # Centralized configuration
│   ├── vggt_core.py            # Core VGGT processing
│   ├── vggt_sparse_attention.py # Sparse attention (O(n) scaling)
│   ├── visualization.py        # 3D visualization utilities
│   │
│   ├── commands/               # CLI commands
│   │   ├── demo.py            # Demo command
│   │   ├── reconstruct.py     # Reconstruction command
│   │   ├── test_runner.py     # Test runner
│   │   ├── benchmark.py       # Performance benchmarking
│   │   └── web_interface.py   # Gradio web app
│   │
│   └── utils/                  # Utilities
│       ├── model_loader.py    # Model management
│       ├── image_utils.py     # Image processing
│       └── export.py          # Export to PLY/OBJ/GLB
│
├── tests/                       # Organized test suite
│   ├── test_mps.py            # MPS functionality tests
│   ├── test_sparse.py         # Sparse attention tests
│   └── test_integration.py    # End-to-end tests
│
├── data/                        # Input data directory
├── outputs/                     # Output directory
├── models/                      # Model storage
│
├── docs/                        # Documentation
│   ├── API.md                  # API documentation
│   ├── SPARSE_ATTENTION.md    # Technical details
│   └── BENCHMARKS.md          # Performance results
│
└── LICENSE                      # MIT License
```
```python
from src.tools.readme import vggt_quick_start_inference

result = vggt_quick_start_inference(
    image_directory="./tmp/inputs",
    device="mps",  # Use Apple Silicon GPU
    max_images=4,
    save_outputs=True
)
```
```python
from src.tools.demo_gradio import vggt_extract_video_frames

result = vggt_extract_video_frames(
    video_path="input_video.mp4",
    frame_interval_seconds=1.0
)
```
```python
from src.tools.demo_viser import vggt_reconstruct_3d_scene

result = vggt_reconstruct_3d_scene(
    images_dir="./tmp/inputs",
    device_type="mps",
    confidence_threshold=0.5
)
```
City-scale 3D reconstruction is now possible! We've implemented Gabriele Berton's research idea for O(n) memory scaling.
```python
from src.vggt_sparse_attention import make_vggt_sparse

# Convert any VGGT model to sparse in one line
sparse_vggt = make_vggt_sparse(regular_vggt, device="mps")

# Same usage, O(n) memory instead of O(n²)
output = sparse_vggt(images)  # Handles 1000+ images!
```
| Images | Regular attention (pairs) | Sparse attention (pairs) | Savings |
|---|---|---|---|
| 100 | ~10K | ~1K | 10x |
| 500 | ~250K | ~5K | 50x |
| 1000 | ~1M | ~10K | 100x |
See full results: docs/SPARSE_ATTENTION_RESULTS.md
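The table's orders of magnitude follow directly from the pair counts: dense attention touches n² frame pairs, while attending only to a local window of w neighbours touches roughly n·w. A quick sketch (the window size w = 10 is assumed here purely for illustration):

```python
# Reproduce the table's orders of magnitude: dense n^2 pairs vs. windowed n*w
def attention_pairs(n_images: int, window: int = 10) -> tuple[int, int]:
    dense = n_images ** 2        # every frame attends to every other frame
    sparse = n_images * window   # each frame attends to ~window neighbours
    return dense, sparse

for n in (100, 500, 1000):
    dense, sparse = attention_pairs(n)
    print(f"{n:>4} images: dense={dense:>9,}  sparse={sparse:>7,}  savings={dense // sparse}x")
```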
```bash
# Check PyTorch MPS support
python -c "import torch; print(torch.backends.mps.is_available())"
```
```bash
# Verify model file (should show a ~5GB file)
ls -lh repo/vggt/vggt_model.pt
```
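If MPS reports as available but reconstructions still fail, it can help to confirm that a simple tensor operation actually executes on the device; a minimal sketch:

```python
import torch

# Sanity-check that a basic matmul runs on the MPS device end to end
x = torch.ones(3, 3, device="mps")
y = (x @ x).cpu()
print("MPS matmul OK:", bool((y == 3.0).all()))
```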
Recent highlights include the new unified `vggt` command. See the full changelog for details.
We follow a lightweight Git Flow:
- `main` holds the latest stable release and is protected.
- `develop` is the default integration branch for day-to-day work.

When contributing:
- Create your branch from `develop` (`git switch develop && git switch -c feature/my-change`).
- Target pull requests at `develop`; maintainers will promote changes to `main` during releases.

Please open issues for bugs or feature requests before starting large efforts. Full details, testing expectations, and the release process live in CONTRIBUTING.md.
MIT License - See LICENSE file for details
Made with 🍎 for Apple Silicon by the AI community