
OpenAlex Author Disambiguation
A streamlined Model Context Protocol (MCP) server for author disambiguation and academic research using the OpenAlex.org API. Specifically designed for AI agents with optimized data structures and enhanced functionality.
For detailed installation instructions, see INSTALL.md.
Clone the repository:
git clone https://github.com/drAbreu/alex-mcp.git
cd alex-mcp
Create a virtual environment:
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
Install the package:
pip install -e .
Configure environment:
export OPENALEX_MAILTO=[email protected]
Run the server:
./run_alex_mcp.sh
# Or, if installed as a CLI tool:
alex-mcp
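To confirm the server starts and exposes its tools before wiring it into a client, the sketch below uses the official mcp Python SDK (an extra dependency, installed separately with pip install mcp) to talk to the launch script over stdio. It assumes OPENALEX_MAILTO is already exported as in the previous step and that you run it from the repository root.

# Minimal stdio smoke test (sketch; requires the `mcp` package).
# Assumes OPENALEX_MAILTO is already exported in this shell.
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main():
    params = StdioServerParameters(command="./run_alex_mcp.sh")
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()
            print("Available tools:", [tool.name for tool in tools.tools])

asyncio.run(main())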
Add to your Claude Desktop configuration file:
{ "mcpServers": { "alex-mcp": { "command": "/path/to/alex-mcp/run_alex_mcp.sh", "env": { "OPENALEX_MAILTO": "[email protected]" } } } }
Replace /path/to/alex-mcp with the actual path to the repository on your system.
You can load this MCP server in your OpenAI agent workflow using the agents.mcp.MCPServerStdio interface:
from agents.mcp import MCPServerStdio

async with MCPServerStdio(
    name="OpenAlex MCP For Author disambiguation and works",
    cache_tools_list=True,
    params={
        "command": "uvx",
        "args": [
            "--from",
            "git+https://github.com/drAbreu/[email protected]",
            "alex-mcp"
        ],
        "env": {
            "OPENALEX_MAILTO": "[email protected]"
        }
    },
    client_session_timeout_seconds=10
) as alex_mcp:
    await alex_mcp.connect()
    tools = await alex_mcp.list_tools()
    print(f"Available tools: {[tool.name for tool in tools]}")
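Once connected, the server can be handed to an agent so that its tools become callable during a run. The snippet below is a sketch using the Agents SDK's Agent and Runner; the agent name, instructions, and prompt are illustrative, and it is meant to run inside the async with block above while alex_mcp is still connected.

# Sketch: use the connected server's tools from an Agents SDK agent.
# Run this inside the `async with` block above; the agent name,
# instructions, and prompt are illustrative placeholders.
from agents import Agent, Runner

agent = Agent(
    name="Author disambiguation assistant",
    instructions="Use the OpenAlex tools to disambiguate authors and retrieve their works.",
    mcp_servers=[alex_mcp],
)
result = await Runner.run(agent, "Which James Briscoe works at the Francis Crick Institute?")
print(result.final_output)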
This MCP server is specifically optimized for academic research workflows:
# Optimized for academic research workflows
from alex_agent import run_author_research

# Enhanced functionality with streamlined data
result = await run_author_research(
    "Find J. Abreu at EMBO with recent publications"
)

# Clean, structured output for AI processing
print(f"Success: {result['workflow_metadata']['success']}")
print(f"Quality: {result['research_result']['metadata']['result_analysis']['quality_score']}/100")
# Standard launch
uvx --from git+https://github.com/drAbreu/[email protected] alex-mcp

# With environment variables
OPENALEX_MAILTO=[email protected] uvx --from git+https://github.com/drAbreu/[email protected] alex-mcp
Get multiple author candidates using OpenAlex autocomplete API for intelligent disambiguation.
Parameters:
- name (required): Author name to search (e.g., "James Briscoe", "M. Ralser")
- context (optional): Context for disambiguation (e.g., "Francis Crick Institute developmental biology")
- limit (optional): Maximum candidates (1-10, default: 5)
Streamlined Output:
{ "query": "James Briscoe", "context": "Francis Crick Institute", "total_candidates": 3, "candidates": [ { "openalex_id": "https://openalex.org/A5019391436", "display_name": "James Briscoe", "institution_hint": "The Francis Crick Institute, UK", "works_count": 415, "cited_by_count": 24623, "external_id": "https://orcid.org/0000-0002-1020-5240" } ] }
Usage Pattern:
# Get multiple candidates for disambiguation
candidates = await autocomplete_authors(
    "James Briscoe",
    context="Francis Crick Institute developmental biology"
)

# AI selects the best match based on institutional context
# Much more accurate than a single search result!
Search for authors with streamlined output for AI agents.
Parameters:
- name (required): Author name to search
- institution (optional): Institution name filter
- topic (optional): Research topic filter
- country_code (optional): Country code filter (e.g., "US", "DE")
- limit (optional): Maximum results (1-25, default: 20)

Streamlined Output:
{ "query": "J. Abreu", "total_count": 3, "results": [ { "id": "https://openalex.org/A123456789", "display_name": "Jorge Abreu-Vicente", "orcid": "https://orcid.org/0000-0000-0000-0000", "display_name_alternatives": ["J. Abreu-Vicente", "Jorge Abreu Vicente"], "affiliations": [ { "institution": { "display_name": "European Molecular Biology Organization", "country_code": "DE" }, "years": [2023, 2024, 2025] } ], "cited_by_count": 316, "works_count": 25, "summary_stats": { "h_index": 9, "i10_index": 5 }, "x_concepts": [ { "display_name": "Astrophysics", "score": 0.8 }, { "display_name": "Machine Learning", "score": 0.6 } ] } ] }
Features: Clean structure optimized for AI reasoning and disambiguation
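Usage Pattern (a sketch mirroring the call style of the other examples in this README; the filter values are illustrative):

# Narrow a common name with institution, topic, and country filters
authors = await search_authors(
    name="J. Abreu",
    institution="EMBO",
    topic="Machine Learning",
    country_code="DE",
    limit=10
)

# Streamlined records are ready for AI-side disambiguation
for author in authors["results"]:
    print(author["display_name"], author["summary_stats"]["h_index"])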
Retrieve works for a given author with enhanced filtering capabilities.
Parameters:
- author_id (required): OpenAlex author ID
- limit (optional): Maximum results (1-50, default: 20)
- order_by (optional): "date" or "citations" (default: "date")
- publication_year (optional): Filter by specific year
- type (optional): Work type filter (e.g., "journal-article")
- authorships_institutions_id (optional): Filter by institution
- is_retracted (optional): Filter retracted works
- open_access_is_oa (optional): Filter by open access status

Enhanced Output:
{ "author_id": "https://openalex.org/A123456789", "total_count": 25, "results": [ { "id": "https://openalex.org/W123456789", "title": "A platform for the biomedical application of large language models", "doi": "10.1038/s41587-024-02534-3", "publication_year": 2025, "type": "journal-article", "cited_by_count": 42, "authorships": [ { "author": { "display_name": "Jorge Abreu-Vicente" }, "institutions": [ { "display_name": "European Molecular Biology Organization" } ] } ], "locations": [ { "source": { "display_name": "Nature Biotechnology", "type": "journal" } } ], "open_access": { "is_oa": true }, "primary_topic": { "display_name": "Biomedical Engineering" } } ] }
Features: Comprehensive work data with flexible filtering for targeted queries
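Usage Pattern (again a sketch in the same call style as the other examples; parameter values, including the author ID, are illustrative):

# Restrict to non-retracted journal articles from a single year
works = await retrieve_author_works(
    author_id="https://openalex.org/A123456789",
    publication_year=2024,
    type="journal-article",
    is_retracted=False,
    order_by="date",
    limit=10
)

for work in works["results"]:
    print(work["publication_year"], work["title"])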
This MCP server provides focused, structured data specifically designed for AI agent consumption:
# Target high-impact journal articles
works = await retrieve_author_works(
    author_id="https://openalex.org/A123456789",
    type="journal-article",     # Focus on journal publications
    open_access_is_oa=True,     # Open access only
    order_by="citations",       # Most cited first
    limit=15
)

# Career transition analysis
authors = await search_authors(
    name="J. Abreu",
    institution="EMBO",         # Current institution
    topic="Machine Learning",   # Research focus
    limit=10
)
from alex_mcp.server import search_authors_core

# Comprehensive author search
results = search_authors_core(
    name="J Abreu Vicente",
    institution="EMBO",
    topic="Machine Learning",
    limit=20
)

print(f"Found {results.total_count} candidates")
for author in results.results:
    print(f"- {author.display_name}")
    if author.affiliations:
        current_inst = author.affiliations[0].institution.display_name
        print(f"  Institution: {current_inst}")
    print(f"  Metrics: {author.cited_by_count} citations, h-index {author.summary_stats.h_index}")
    if author.x_concepts:
        fields = [c.display_name for c in author.x_concepts[:3]]
        print(f"  Research: {', '.join(fields)}")
from alex_mcp.server import retrieve_author_works_core

# Comprehensive work retrieval
works = retrieve_author_works_core(
    author_id="https://openalex.org/A5058921480",
    type="journal-article",   # Academic focus
    order_by="citations",     # Impact-based ordering
    limit=20
)

print(f"Found {works.total_count} publications")
for work in works.results:
    print(f"- {work.title}")
    if work.locations:
        journal = work.locations[0].source.display_name
        print(f"  Published in: {journal} ({work.publication_year})")
    print(f"  Impact: {work.cited_by_count} citations")
    if work.open_access and work.open_access.is_oa:
        print("  ✓ Open Access")
# Analyze career transitions
def analyze_career_path(author_result):
    affiliations = author_result.affiliations
    if len(affiliations) > 1:
        print("Career path:")
        for aff in sorted(affiliations, key=lambda x: min(x.years)):
            years = f"{min(aff.years)}-{max(aff.years)}"
            print(f"  {years}: {aff.institution.display_name}")

    # Research evolution
    if author_result.x_concepts:
        print("Research areas:")
        for concept in author_result.x_concepts[:5]:
            print(f"  {concept.display_name} (score: {concept.score:.2f})")

# Usage
results = search_authors_core("Jorge Abreu Vicente")
if results.results:
    analyze_career_path(results.results[0])
# Required
export OPENALEX_MAILTO=[email protected]

# Optional settings
export OPENALEX_MAX_AUTHORS=100   # Maximum authors per query
export OPENALEX_USER_AGENT=research-agent-v1.0
export ALEX_MCP_VERSION=4.1.0

# Rate limiting (respectful usage)
export OPENALEX_RATE_PER_SEC=10
export OPENALEX_RATE_PER_DAY=100000
# For comprehensive research applications
config = {
    "max_authors_per_query": 25,    # Detailed author analysis
    "max_works_per_author": 50,     # Complete publication history
    "enable_all_filters": True,     # Full filtering capabilities
    "detailed_affiliations": True,  # Complete institutional data
    "research_concepts": True       # Detailed concept analysis
}
alex-mcp/
├── src/alex_mcp/
│ ├── server.py # Main MCP server
│ ├── data_objects.py # Data models and structures
│ └── utils.py # Utility functions
├── examples/
│ ├── basic_usage.py # Simple examples
│ ├── advanced_queries.py # Complex query examples
│ └── integration_demo.py # AI agent integration
├── tests/
│ ├── test_server.py # Server functionality tests
│ └── test_integration.py # Integration tests
└── docs/
└── api_reference.md # Detailed API documentation
# Install test dependencies
pip install -e ".[test]"

# Run functionality tests
pytest tests/test_server.py -v

# Test with real queries
python examples/basic_usage.py

# Test AI agent integration
python examples/integration_demo.py
# Test author disambiguation
python examples/basic_usage.py --query "J. Abreu" --institution "EMBO"

# Test work retrieval
python examples/advanced_queries.py --author-id "A123456789" --type "journal-article"

# Test integration patterns
python examples/integration_demo.py --workflow "career-analysis"
Perfect integration with AI-powered research analysis:
# Enhanced academic research agent
from alex_agent import AcademicResearchAgent

agent = AcademicResearchAgent(
    mcp_servers=[alex_mcp],  # Streamlined data processing
    model="gpt-4.1-2025-04-14"
)

# Complex research queries with structured data
result = await agent.research_author(
    "Find J. Abreu at EMBO with machine learning publications"
)

# Rich, structured output for AI reasoning
print(f"Quality Score: {result.quality_score}/100")
print(f"Author disambiguation: {result.confidence}")
print(f"Research fields: {result.research_domains}")
# Collaborative research analysis
async def research_collaboration_network(seed_author):
    # Find primary author
    authors = await alex_mcp.search_authors(seed_author)
    primary = authors['results'][0]

    # Get their works
    works = await alex_mcp.retrieve_author_works(
        primary['id'],
        type="journal-article"
    )

    # Analyze co-authors and build network
    collaborators = set()
    for work in works['results']:
        for authorship in work.get('authorships', []):
            collaborators.add(authorship['author']['display_name'])

    return {
        'primary_author': primary,
        'publication_count': len(works['results']),
        'collaborator_network': list(collaborators),
        'research_impact': sum(w['cited_by_count'] for w in works['results'])
    }
We welcome contributions to improve functionality and add new features:
git checkout -b feature/enhanced-filtering
This project is licensed under the MIT License. See LICENSE for details.