Sui FAISS RAG
HTTP-SSE MCP server for querying a vector database and retrieving documents for Retrieval-Augmented Generation.
This project provides a proof-of-concept implementation of a Machine Conversation Protocol (MCP) server that allows an AI agent to query a vector database and retrieve relevant documents for Retrieval-Augmented Generation (RAG).
pipx is a tool to help you install and run Python applications in isolated environments.
# On macOS
brew install pipx
pipx ensurepath

# On Ubuntu/Debian
sudo apt update
sudo apt install python3-pip python3-venv
python3 -m pip install --user pipx
python3 -m pipx ensurepath

# On Windows with pip
pip install pipx
pipx ensurepath
# Navigate to the directory containing the mcp_server folder
cd /path/to/mcp-server-project
# Install in editable mode
pipx install -e .
Copy .env.example to .env and set the following values:
GITHUB_TOKEN=your_token_here
OPENAI_API_KEY=your_key_here
If you prefer not to use pipx:
cd mcp_server
pip install -r requirements.txt
After installing with pipx, you'll have access to the following commands:
# Download Move files with default settings
mcp-download --query "use sui" --output-dir docs/move_files
# Download with more options
mcp-download --query "module sui::coin" --max-results 50 --new-index --verbose
# Search GitHub and index files with default settings
mcp-search-index --keywords "sui move"
# Search multiple keywords and customize options
mcp-search-index --keywords "sui move,move framework" --max-repos 30 --output-results --verbose
# Save search results and use a custom index location
mcp-search-index --keywords "sui coin,sui::transfer" --index-file custom/path/index.bin --output-results
The mcp-search-index command provides enhanced GitHub repository search capabilities.
# Index files in the default location
mcp-index
# Index with custom options
mcp-index --docs-dir path/to/files --index-file path/to/index.bin --verbose
# Basic query
mcp-query "What is a module in Sui Move?"
# Advanced query with options
mcp-query "How do I define a struct in Sui Move?" -k 3 -f
# Basic RAG query (will use simulated LLM if no API key is provided)
mcp-rag "What is a module in Sui Move?"
# Using with a specific LLM API
mcp-rag "How do I define a struct in Sui Move?" --api-key your_api_key --top-k 3
# Output as JSON for further processing
mcp-rag "What are the benefits of sui::coin?" --output-json > rag_response.json
# Start the server with default settings
mcp-server
# Start with custom settings
mcp-server --host 127.0.0.1 --port 8080 --index-file custom/path/index.bin
cd mcp_server
python main.py
The server will start on http://localhost:8000
To download Move files from GitHub and populate your vector database:
# Download Move files with default query "use sui"
./run.sh --download-move
# Customize the search query
./run.sh --download-move --github-query "module sui::coin" --max-results 50
# Download, index, and start the server
./run.sh --download-move --index
You can also use the Python script directly:
python download_move_files.py --query "use sui" --output-dir docs/move_files
Before querying, you need to index your documents. Place your text files (.txt), Markdown files (.md), or Move files (.move) in the docs directory.
To index the documents, you can either use the run script with the --index flag:
./run.sh --index
or run the indexing script directly:
python index_move_files.py --docs-dir docs/move_files --index-file data/faiss_index.bin
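As a rough illustration of the first step of indexing, the sketch below gathers the files that would be candidates for the index. It is not taken from index_move_files.py: the function name `collect_documents`, the recursive search, and the case-insensitive extension match are all assumptions made for the example. Only the three file extensions come from the text above.

```python
from pathlib import Path

# Extensions named in the text above; adjust if the indexer accepts others.
SUPPORTED_EXTENSIONS = {".txt", ".md", ".move"}


def collect_documents(docs_dir):
    """Return sorted paths of candidate files found under docs_dir (recursively).

    Hypothetical helper: the real indexer may walk the directory differently.
    """
    root = Path(docs_dir)
    return sorted(
        path
        for path in root.rglob("*")
        if path.is_file() and path.suffix.lower() in SUPPORTED_EXTENSIONS
    )
```

Calling `collect_documents("docs")` would return every .txt, .md, and .move file below docs, including those in docs/move_files.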
You can use the local query script:
python local_query.py "What is RAG?"
# With more options
python local_query.py -k 3 -f "How to define a struct in Sui Move?"
# Direct RAG query with an LLM
python rag_integration.py "What is a module in Sui Move?" --index-file data/faiss_index.bin
# With API key (if you have one)
OPENAI_API_KEY=your_key_here python rag_integration.py "How do coins work in Sui?"
The MCP API endpoint is available at /mcp/action. You can use it to perform different actions:
retrieve_documents: Retrieve relevant documents for a query
index_documents: Index documents from a directory
Example:
curl -X POST "http://localhost:8000/mcp/action" -H "Content-Type: application/json" -d '{"action_type": "retrieve_documents", "payload": {"query": "What is RAG?", "top_k": 3}}'
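The same request can be issued from Python. The sketch below mirrors the curl call using only the standard library. The helper names `build_action_request` and `call_action` are hypothetical, and the assumption that the server replies with a JSON body is not confirmed by this README.

```python
import json
import urllib.request

# Default address taken from the server section of this README.
MCP_ACTION_URL = "http://localhost:8000/mcp/action"


def build_action_request(action_type, payload, url=MCP_ACTION_URL):
    """Build (but do not send) a POST request for the /mcp/action endpoint."""
    body = json.dumps({"action_type": action_type, "payload": payload}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_action(action_type, payload, url=MCP_ACTION_URL, timeout=30):
    """Send the request; assumes the server is running and answers with JSON."""
    request = build_action_request(action_type, payload, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage (requires a running server):
# result = call_action("retrieve_documents", {"query": "What is RAG?", "top_k": 3})
```

Keeping request construction separate from sending makes the payload easy to inspect or test without a live server.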
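The full RAG (Retrieval-Augmented Generation) pipeline retrieves the documents most relevant to a query from the vector index and passes them to an LLM as context for generating the answer.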
This workflow is fully implemented in the rag_integration.py module, which can be used either from the command line or as a library in your own applications.
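To make the retrieve-then-generate flow concrete, here is a self-contained sketch. It does not reproduce rag_integration.py: a toy word-overlap ranking stands in for the FAISS similarity search, `simulated_llm` is a stub in place of a real LLM call, and every function name and the prompt wording are assumptions made for illustration.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of word-like tokens."""
    return set(re.findall(r"[a-z0-9_:]+", text.lower()))


def retrieve(query, documents, top_k=3):
    """Rank documents by word overlap with the query (toy stand-in for FAISS search)."""
    query_tokens = tokenize(query)
    scored = [(len(query_tokens & tokenize(doc)), doc) for doc in documents]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in ranked[:top_k] if score > 0]


def build_prompt(query, context_docs):
    """Combine the retrieved documents and the question into a single prompt."""
    context = "\n\n".join(f"[{i + 1}] {doc}" for i, doc in enumerate(context_docs))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    )


def simulated_llm(prompt):
    """Placeholder for a real LLM call; it only reports the prompt size."""
    return f"(simulated answer based on a {len(prompt)}-character prompt)"


def answer(query, documents, top_k=3, llm=simulated_llm):
    """Run retrieval, prompt construction, and generation in sequence."""
    context_docs = retrieve(query, documents, top_k)
    prompt = build_prompt(query, context_docs)
    return {"query": query, "sources": context_docs, "answer": llm(prompt)}
```

Passing a different callable as `llm` swaps the stub for a real model without touching the retrieval or prompt steps.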
The system can extract Move files from GitHub based on search queries. It implements two methods:
To configure your GitHub token, set it in the .env file or as an environment variable:
GITHUB_TOKEN=your_github_token_here
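For orientation, the sketch below shows one way a token from the environment could be attached to a GitHub code-search request. It is not the code in utils/github_extractor.py: the helper name `build_code_search_request` and the choice of the REST code-search endpoint are assumptions. The sketch reads only the process environment, so values in .env must already have been loaded by whatever mechanism the project uses.

```python
import os
import urllib.parse
import urllib.request

# GitHub REST API code-search endpoint (assumed here; the project may use another).
GITHUB_CODE_SEARCH_URL = "https://api.github.com/search/code"


def build_code_search_request(query, token=None, per_page=30):
    """Build (but do not send) an authenticated GitHub code-search request."""
    token = token or os.environ.get("GITHUB_TOKEN")
    if not token:
        raise RuntimeError("GITHUB_TOKEN is not set; add it to .env or the environment")
    params = urllib.parse.urlencode({"q": query, "per_page": per_page})
    return urllib.request.Request(
        f"{GITHUB_CODE_SEARCH_URL}?{params}",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
        },
    )


# Usage (sends a real request, so it needs network access and a valid token):
# with urllib.request.urlopen(build_code_search_request("use sui")) as response:
#     print(response.status)
```

Failing early when the token is missing gives a clearer error than an authentication failure from the API.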
mcp_server/
├── __init__.py               # Package initialization
├── main.py                   # Main server file
├── mcp_api.py                # MCP API implementation
├── index_move_files.py       # File indexing utility
├── local_query.py            # Local query utility
├── download_move_files.py    # GitHub Move file extractor
├── rag_integration.py        # LLM integration for RAG
├── pyproject.toml            # Package configuration
├── requirements.txt          # Dependencies
├── .env.example              # Example environment variables
├── README.md                 # This file
├── data/                     # Storage for the FAISS index
├── docs/                     # Sample documents
│   └── move_files/           # Downloaded Move files
├── models/                   # Model implementations
│   └── vector_store.py       # FAISS vector store implementation
└── utils/
    ├── document_processor.py # Document processing utilities
    └── github_extractor.py   # GitHub file extraction utilities
To extend this proof-of-concept:
MIT