Unsloth LLM Optimization
An MCP server for Unsloth - a library that makes LLM fine-tuning 2x faster with 80% less memory.
Unsloth is a library that dramatically improves the efficiency of fine-tuning large language models, making training roughly 2x faster while using up to 80% less memory. It achieves these improvements through custom GPU kernels written in OpenAI's Triton language, optimized backpropagation, and dynamic 4-bit quantization.
Install Unsloth:

```bash
pip install unsloth
```

Then build the MCP server:

```bash
cd unsloth-server
npm install
npm run build
```
{ "mcpServers": { "unsloth-server": { "command": "node", "args": ["/path/to/unsloth-server/build/index.js"], "env": { "HUGGINGFACE_TOKEN": "your_token_here" // Optional }, "disabled": false, "autoApprove": [] } } }
Verify that Unsloth is properly installed on your system.
Parameters: None
Example:
```javascript
const result = await use_mcp_tool({
  server_name: "unsloth-server",
  tool_name: "check_installation",
  arguments: {}
});
```
Get a list of all models supported by Unsloth, including Llama, Mistral, Phi, and Gemma variants.
Parameters: None
Example:
```javascript
const result = await use_mcp_tool({
  server_name: "unsloth-server",
  tool_name: "list_supported_models",
  arguments: {}
});
```
Load a pretrained model with Unsloth optimizations for faster inference and fine-tuning.
Parameters:

- `model_name` (required): Name of the model to load (e.g., "unsloth/Llama-3.2-1B")
- `max_seq_length` (optional): Maximum sequence length for the model (default: 2048)
- `load_in_4bit` (optional): Whether to load the model in 4-bit quantization (default: true)
- `use_gradient_checkpointing` (optional): Whether to use gradient checkpointing to save memory (default: true)

Example:
```javascript
const result = await use_mcp_tool({
  server_name: "unsloth-server",
  tool_name: "load_model",
  arguments: {
    model_name: "unsloth/Llama-3.2-1B",
    max_seq_length: 4096,
    load_in_4bit: true
  }
});
```
Fine-tune a model with Unsloth optimizations using LoRA/QLoRA techniques.
Parameters:

- `model_name` (required): Name of the model to fine-tune
- `dataset_name` (required): Name of the dataset to use for fine-tuning
- `output_dir` (required): Directory to save the fine-tuned model
- `max_seq_length` (optional): Maximum sequence length for training (default: 2048)
- `lora_rank` (optional): Rank for LoRA fine-tuning (default: 16)
- `lora_alpha` (optional): Alpha for LoRA fine-tuning (default: 16)
- `batch_size` (optional): Batch size for training (default: 2)
- `gradient_accumulation_steps` (optional): Number of gradient accumulation steps (default: 4)
- `learning_rate` (optional): Learning rate for training (default: 2e-4)
- `max_steps` (optional): Maximum number of training steps (default: 100)
- `dataset_text_field` (optional): Field in the dataset containing the text (default: "text")
- `load_in_4bit` (optional): Whether to use 4-bit quantization (default: true)

Example:
```javascript
const result = await use_mcp_tool({
  server_name: "unsloth-server",
  tool_name: "finetune_model",
  arguments: {
    model_name: "unsloth/Llama-3.2-1B",
    dataset_name: "tatsu-lab/alpaca",
    output_dir: "./fine-tuned-model",
    max_steps: 100,
    batch_size: 2,
    learning_rate: 2e-4
  }
});
```
Generate text using a fine-tuned Unsloth model.
Parameters:

- `model_path` (required): Path to the fine-tuned model
- `prompt` (required): Prompt for text generation
- `max_new_tokens` (optional): Maximum number of tokens to generate (default: 256)
- `temperature` (optional): Temperature for text generation (default: 0.7)
- `top_p` (optional): Top-p for text generation (default: 0.9)

Example:
```javascript
const result = await use_mcp_tool({
  server_name: "unsloth-server",
  tool_name: "generate_text",
  arguments: {
    model_path: "./fine-tuned-model",
    prompt: "Write a short story about a robot learning to paint:",
    max_new_tokens: 512,
    temperature: 0.8
  }
});
```
Export a fine-tuned Unsloth model to various formats for deployment.
Parameters:

- `model_path` (required): Path to the fine-tuned model
- `export_format` (required): Format to export to (gguf, ollama, vllm, huggingface)
- `output_path` (required): Path to save the exported model
- `quantization_bits` (optional): Bits for quantization, for GGUF export (default: 4)

Example:
```javascript
const result = await use_mcp_tool({
  server_name: "unsloth-server",
  tool_name: "export_model",
  arguments: {
    model_path: "./fine-tuned-model",
    export_format: "gguf",
    output_path: "./exported-model.gguf",
    quantization_bits: 4
  }
});
```
You can use custom datasets by hosting them on Hugging Face or providing a local path, as long as each record contains the text field the trainer expects (see the format sketch below):
```javascript
const result = await use_mcp_tool({
  server_name: "unsloth-server",
  tool_name: "finetune_model",
  arguments: {
    model_name: "unsloth/Llama-3.2-1B",
    dataset_name: "json",
    data_files: { "train": "path/to/your/data.json" },
    output_dir: "./fine-tuned-model"
  }
});
```
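For a local JSON file, the trainer reads whichever field `dataset_text_field` names (default: "text"). A hypothetical `data.json` might look like the following; only the field name comes from the documented default, and the record contents are purely illustrative:

```json
[
  { "text": "### Instruction: Summarize the article.\n### Response: ..." },
  { "text": "### Instruction: Translate the sentence to French.\n### Response: ..." }
]
```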
For large models on limited hardware:

- Keep `load_in_4bit: true` so model weights are stored in 4-bit quantized form
- Leave `use_gradient_checkpointing: true` to trade extra compute for lower activation memory
- Lower `batch_size` and raise `gradient_accumulation_steps` to keep the effective batch size while reducing peak memory
- Use a smaller `lora_rank` to shrink the number of trainable parameters

A sketch combining these settings follows the list.
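For instance, a low-memory fine-tuning call might combine those options. This sketch uses only parameters documented above; the specific values are illustrative, not tuned recommendations:

```javascript
// Illustrative low-memory settings: 4-bit weights, small per-device batch,
// more accumulation steps (effective batch size 1 * 8 = 8), low LoRA rank.
const result = await use_mcp_tool({
  server_name: "unsloth-server",
  tool_name: "finetune_model",
  arguments: {
    model_name: "unsloth/Llama-3.2-1B",
    dataset_name: "tatsu-lab/alpaca",
    output_dir: "./fine-tuned-model",
    load_in_4bit: true,
    batch_size: 1,
    gradient_accumulation_steps: 8,
    lora_rank: 8
  }
});
```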
Reported Unsloth benchmarks:

| Model | GPU VRAM | Unsloth Speed | VRAM Reduction | Context Length |
|---|---|---|---|---|
| Llama 3.3 (70B) | 80 GB | 2x faster | >75% | 13x longer |
| Llama 3.1 (8B) | 80 GB | 2x faster | >70% | 12x longer |
| Mistral v0.3 (7B) | 80 GB | 2.2x faster | 75% | - |
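Putting it all together, a typical session chains the tools above: fine-tune, sample from the result, then export for deployment. This sketch uses only the documented tools and parameters; the paths and prompt are illustrative:

```javascript
// Fine-tune with the documented defaults (LoRA rank 16, 4-bit, 100 steps).
await use_mcp_tool({
  server_name: "unsloth-server",
  tool_name: "finetune_model",
  arguments: {
    model_name: "unsloth/Llama-3.2-1B",
    dataset_name: "tatsu-lab/alpaca",
    output_dir: "./fine-tuned-model"
  }
});

// Spot-check the fine-tuned model with a short generation.
const sample = await use_mcp_tool({
  server_name: "unsloth-server",
  tool_name: "generate_text",
  arguments: {
    model_path: "./fine-tuned-model",
    prompt: "Write a haiku about painting robots:",
    max_new_tokens: 64
  }
});

// Export to GGUF for llama.cpp-style runtimes.
await use_mcp_tool({
  server_name: "unsloth-server",
  tool_name: "export_model",
  arguments: {
    model_path: "./fine-tuned-model",
    export_format: "gguf",
    output_path: "./exported-model.gguf"
  }
});
```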
License: Apache-2.0