txtai Embedded Knowledge Base

A semantic search and knowledge graph server based on txtai
A Model Context Protocol (MCP) server implementation powered by txtai, providing semantic search, knowledge graph capabilities, and AI-driven text processing through a standardized interface.
This project is built on txtai, an all-in-one embeddings database for RAG that combines semantic search, knowledge graph construction, and language model workflows in a single, embeddable package.
The project contains a knowledge base builder tool and an MCP server. The knowledge base builder is a command-line interface for creating and managing knowledge bases; the MCP server provides a standardized interface for accessing them.
You are not required to use the knowledge base builder tool. You can always build a knowledge base with txtai's programming interface by writing a Python script or using a Jupyter notebook, as sketched below. As long as the knowledge base is built with txtai, the MCP server can load it, whether it is a folder on the file system or an exported .tar.gz archive; just point the server at it.
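For example, here is a minimal sketch of building a knowledge base programmatically with txtai. The model name, documents, and output paths are illustrative placeholders; any txtai-compatible configuration works.

```python
from txtai import Embeddings

# Minimal sketch: build a knowledge base with txtai's programming interface.
# Model name, documents, and output paths are illustrative placeholders.
embeddings = Embeddings(
    path="sentence-transformers/nli-mpnet-base-v2",
    content=True,  # store the document text alongside the vectors
)

documents = [
    (0, "Machine learning is a subset of artificial intelligence.", None),
    (1, "Knowledge graphs connect related concepts through edges.", None),
]

# Index the documents
embeddings.index(documents)

# Save as a folder or export as a .tar.gz archive; either form can be
# passed to the MCP server via --embeddings.
embeddings.save("./my_knowledge_base")
embeddings.save("./my_knowledge_base.tar.gz")
```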
The `kb_builder` module provides a command-line interface for creating and managing knowledge bases. Note that it is currently limited in functionality and is provided mainly for convenience.
The MCP server provides a standardized interface to access the knowledge base.
We recommend using uv with Python 3.10 or newer for the best experience. This provides better dependency management and ensures consistent behavior.
```bash
# Install uv if you don't have it already
pip install -U uv

# Create a virtual environment with Python 3.10 or newer
uv venv --python=3.10  # or 3.11, 3.12, etc.

# Activate the virtual environment (bash/zsh)
source .venv/bin/activate
# For fish shell
# source .venv/bin/activate.fish

# Install from PyPI
uv pip install kb-mcp-server
```
Note: We pin transformers to version 4.49.0 to avoid deprecation warnings about `transformers.agents.tools` that appear in version 4.50.0 and newer. If you use a newer version of transformers, you may see these warnings, but they don't affect functionality.
```bash
# Create a new conda environment (optional)
conda create -n embedding-mcp python=3.10
conda activate embedding-mcp

# Install from PyPI
pip install kb-mcp-server
```
```bash
# Create a new conda environment
conda create -n embedding-mcp python=3.10
conda activate embedding-mcp

# Clone the repository
git clone https://github.com/Geeksfino/kb-mcp-server.git
cd kb-mcp-server

# Install dependencies
pip install -e .
```
```bash
# Install uv if not already installed
pip install uv

# Create a new virtual environment
uv venv
source .venv/bin/activate

# Option 1: Install from PyPI
uv pip install kb-mcp-server

# Option 2: Install from source (for development)
uv pip install -e .
```
uvx allows you to run packages directly from PyPI without installing them:
```bash
# Run the MCP server
uvx --from kb-mcp-server kb-mcp-server --embeddings /path/to/knowledge_base

# Build a knowledge base
uvx --from kb-mcp-server kb-build --input /path/to/documents --config config.yml

# Search a knowledge base
uvx --from kb-mcp-server kb-search /path/to/knowledge_base "Your search query"
```
You can use the command-line tools installed from PyPI, the Python module directly, or the convenient shell scripts:
```bash
# Build a knowledge base from documents
kb-build --input /path/to/documents --config config.yml

# Update an existing knowledge base with new documents
kb-build --input /path/to/new_documents --update

# Export a knowledge base for portability
kb-build --input /path/to/documents --export my_knowledge_base.tar.gz

# Search a knowledge base
kb-search /path/to/knowledge_base "What is machine learning?"

# Search with graph enhancement
kb-search /path/to/knowledge_base "What is machine learning?" --graph --limit 10
```
```bash
# Build a knowledge base from documents
uvx --from kb-mcp-server kb-build --input /path/to/documents --config config.yml

# Update an existing knowledge base with new documents
uvx --from kb-mcp-server kb-build --input /path/to/new_documents --update

# Export a knowledge base for portability
uvx --from kb-mcp-server kb-build --input /path/to/documents --export my_knowledge_base.tar.gz

# Search a knowledge base
uvx --from kb-mcp-server kb-search /path/to/knowledge_base "What is machine learning?"

# Search with graph enhancement
uvx --from kb-mcp-server kb-search /path/to/knowledge_base "What is machine learning?" --graph --limit 10
```
```bash
# Build a knowledge base from documents
python -m kb_builder build --input /path/to/documents --config config.yml

# Update an existing knowledge base with new documents
python -m kb_builder build --input /path/to/new_documents --update

# Export a knowledge base for portability
python -m kb_builder build --input /path/to/documents --export my_knowledge_base.tar.gz
```
The repository includes convenient wrapper scripts that make it easier to build and search knowledge bases:
```bash
# Build a knowledge base using a template configuration
./scripts/kb_build.sh /path/to/documents technical_docs

# Build using a custom configuration file
./scripts/kb_build.sh /path/to/documents /path/to/my_config.yml

# Update an existing knowledge base
./scripts/kb_build.sh /path/to/documents technical_docs --update

# Search a knowledge base
./scripts/kb_search.sh /path/to/knowledge_base "What is machine learning?"

# Search with graph enhancement
./scripts/kb_search.sh /path/to/knowledge_base "What is machine learning?" --graph
```
Run `./scripts/kb_build.sh --help` or `./scripts/kb_search.sh --help` for more options.
```bash
# Start with a specific knowledge base folder
kb-mcp-server --embeddings /path/to/knowledge_base_folder

# Start with a given knowledge base archive
kb-mcp-server --embeddings /path/to/knowledge_base.tar.gz
```
```bash
# Start with a specific knowledge base folder
uvx kb-mcp-server --embeddings /path/to/knowledge_base_folder

# Start with a given knowledge base archive
uvx kb-mcp-server --embeddings /path/to/knowledge_base.tar.gz
```
```bash
# Start with a specific knowledge base folder
python -m txtai_mcp_server --embeddings /path/to/knowledge_base_folder

# Start with a given knowledge base archive
python -m txtai_mcp_server --embeddings /path/to/knowledge_base.tar.gz
```
The MCP server is configured using environment variables or command-line arguments, not YAML files. YAML files are only used for configuring txtai components during knowledge base building.
Here's how to configure the MCP server:
```bash
# Start the server with command-line arguments
kb-mcp-server --embeddings /path/to/knowledge_base --host 0.0.0.0 --port 8000

# Or using uvx (no installation required)
uvx kb-mcp-server --embeddings /path/to/knowledge_base --host 0.0.0.0 --port 8000

# Or using the Python module
python -m txtai_mcp_server --embeddings /path/to/knowledge_base --host 0.0.0.0 --port 8000

# Or use environment variables
export TXTAI_EMBEDDINGS=/path/to/knowledge_base
export MCP_SSE_HOST=0.0.0.0
export MCP_SSE_PORT=8000
python -m txtai_mcp_server
```
Common configuration options:

- `--embeddings`: Path to the knowledge base (required)
- `--host`: Host address to bind to (default: localhost)
- `--port`: Port to listen on (default: 8000)
- `--transport`: Transport to use, either 'sse' or 'stdio' (default: stdio)
- `--enable-causal-boost`: Enable the causal boost feature for enhanced relevance scoring
- `--causal-config`: Path to a custom causal boost configuration YAML file

To configure an LLM client to use the MCP server, you need to create an MCP configuration file. Here's an example `mcp_config.json`:
If you installed the server in a virtual Python environment, use the following configuration. Note that an MCP host such as Claude will not be able to connect to the server unless you give it the absolute path to the executable inside the virtual environment where you ran `pip install` or `uv pip install`, for example:
{ "mcpServers": { "kb-server": { "command": "/your/home/project/.venv/bin/kb-mcp-server", "args": [ "--embeddings", "/path/to/knowledge_base.tar.gz" ], "cwd": "/path/to/working/directory" } } }
If you use your system default Python, you can use the following configuration:
{ "rag-server": { "command": "python3", "args": [ "-m", "txtai_mcp_server", "--embeddings", "/path/to/knowledge_base.tar.gz", "--enable-causal-boost" ], "cwd": "/path/to/working/directory" } }
Alternatively, if you're using uvx, it must be globally accessible. uvx ships with uv (installed, for example, via `brew install uv`), or you can expose an existing user-level installation system-wide:
```bash
# Create a symlink to /usr/local/bin (which is typically in the system PATH)
sudo ln -s /Users/cliang/.local/bin/uvx /usr/local/bin/uvx
```
This creates a symbolic link from your user-specific installation to a system-wide location. For macOS applications like Claude Desktop, you can modify the system-wide PATH by creating or editing a launchd configuration file:
```bash
# Create a plist file to set environment variables for all GUI applications
sudo nano /Library/LaunchAgents/environment.plist
```
Add this content:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
  <dict>
    <key>Label</key>
    <string>my.startup</string>
    <key>ProgramArguments</key>
    <array>
      <string>sh</string>
      <string>-c</string>
      <string>launchctl setenv PATH $PATH:/Users/cliang/.local/bin</string>
    </array>
    <key>RunAtLoad</key>
    <true/>
  </dict>
</plist>
```
Then load it:
```bash
sudo launchctl load -w /Library/LaunchAgents/environment.plist
```
You'll need to restart your computer for this to take effect, though.
{ "mcpServers": { "kb-server": { "command": "uvx", "args": [ "[email protected]", "--embeddings", "/path/to/knowledge_base", "--host", "localhost", "--port", "8000" ], "cwd": "/path/to/working/directory" } } }
Place this configuration file in a location accessible to your LLM client and configure the client to use it. The exact configuration steps will depend on your specific LLM client.
Building a knowledge base with txtai requires a YAML configuration file that controls various aspects of the embedding process. This configuration is used by the `kb_builder` tool, not the MCP server itself.
You may need to tune segmentation/chunking strategies, embedding models, and scoring methods, as well as configure graph construction, causal boosting, hybrid search weights, and more.
Fortunately, txtai provides a powerful YAML configuration system that requires no coding. Here's an example of a comprehensive configuration for knowledge base building:
```yaml
# Path to save/load embeddings index
path: ~/.txtai/embeddings
writable: true

# Content storage in SQLite
content:
  path: sqlite:///~/.txtai/content.db

# Embeddings configuration
embeddings:
  # Model settings
  path: sentence-transformers/nli-mpnet-base-v2
  backend: faiss
  gpu: true
  batch: 32
  normalize: true
  # Scoring settings
  scoring: hybrid
  hybridalpha: 0.75

# Pipeline configuration
pipeline:
  workers: 2
  queue: 100
  timeout: 300

# Question-answering pipeline
extractor:
  path: distilbert-base-cased-distilled-squad
  maxlength: 512
  minscore: 0.3

# Graph configuration
graph:
  backend: sqlite
  path: ~/.txtai/graph.db
  similarity: 0.75  # Threshold for creating graph connections
  limit: 10         # Maximum connections per node
```
The `src/kb_builder/configs` directory contains configuration templates for different use cases and storage backends:

- `memory.yml`: In-memory vectors (fastest for development, no persistence)
- `sqlite-faiss.yml`: SQLite for content + FAISS for vectors (local file-based persistence)
- `postgres-pgvector.yml`: PostgreSQL + pgvector (production-ready with full persistence)
- `base.yml`: Base configuration template
- `code_repositories.yml`: Optimized for code repositories
- `data_science.yml`: Configured for data science documents
- `general_knowledge.yml`: General purpose knowledge base
- `research_papers.yml`: Optimized for academic papers
- `technical_docs.yml`: Configured for technical documentation

You can use these as starting points for your own configurations:
```bash
python -m kb_builder build --input /path/to/documents --config src/kb_builder/configs/technical_docs.yml

# Or use a storage-specific configuration
python -m kb_builder build --input /path/to/documents --config src/kb_builder/configs/postgres-pgvector.yml
```
The MCP server leverages txtai's built-in graph functionality to provide powerful knowledge graph capabilities.
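As a rough illustration, the sketch below loads a previously built knowledge base with txtai and runs both a plain semantic search and a graph search. The paths and query are placeholders, and graph search assumes the index was built with a `graph` section in its configuration.

```python
from txtai import Embeddings

# Load a knowledge base built earlier (folder or .tar.gz archive)
embeddings = Embeddings()
embeddings.load("./my_knowledge_base.tar.gz")

# Plain semantic search: returns a list of {"id", "text", "score"} results
for result in embeddings.search("What is machine learning?", limit=5):
    print(result["score"], result["text"])

# Graph search: returns a graph of related results instead of a flat list
# (requires the index to have been built with a graph configuration)
graph = embeddings.search("What is machine learning?", limit=10, graph=True)

# Rank nodes by centrality to surface the most connected concepts
for node in list(graph.centrality().keys())[:5]:
    print(graph.attribute(node, "text"))
```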
The MCP server also includes a causal boosting mechanism that enhances search relevance by identifying and prioritizing causal relationships.
This mechanism significantly improves responses to "why" and "how" questions by surfacing content that explains relationships between concepts. The causal boosting configuration is highly customizable through YAML files, allowing adaptation to different domains and languages.
MIT License - see LICENSE file for details