BitNet

Official Inference Framework for 1-bit LLMs

BitNet is the official inference framework for 1-bit Large Language Models, providing fast, lossless inference with a dramatically reduced memory footprint. Developed by Microsoft, BitNet enables state-of-the-art language model capabilities with unprecedented efficiency.

25.8k GitHub Stars
2.1k Forks
MIT License

Key Features

⚡

High Performance

Optimized inference with custom low-bit kernels for maximum efficiency and speed.

Learn more →
💾

Memory Efficient

1-bit quantization dramatically reduces memory requirements while maintaining model quality.

Learn more →
🔧

Easy Integration

Simple installation and straightforward API for seamless integration into your projects.

Learn more →
🤖

Multiple Models

Support for various BitNet model architectures including BitNet-b1.58 and Falcon3 variants.

View models →
📊

Benchmark Tools

Comprehensive benchmarking utilities to evaluate model performance and throughput.

Run benchmarks →
📚

Extensive Documentation

Complete documentation, tutorials, and examples to help you get started quickly.

Read docs →
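The memory-efficiency claim above is easy to sanity-check with back-of-the-envelope arithmetic. The sketch below is illustrative only: it counts weight storage alone and ignores activations, the KV cache, and per-block scale overhead.

```python
# Back-of-the-envelope weight-memory comparison for a 2B-parameter model.
# Illustrative only: ignores activations, KV cache, and scale overhead.
PARAMS = 2_000_000_000

def weight_bytes(bits_per_weight: float) -> float:
    """Bytes needed to store the weights at a given precision."""
    return PARAMS * bits_per_weight / 8

for name, bits in [("FP16", 16), ("INT8", 8), ("INT4", 4), ("b1.58 ternary", 1.58)]:
    gb = weight_bytes(bits) / 1e9
    print(f"{name:>14}: {gb:5.2f} GB ({16 / bits:4.1f}x smaller than FP16)")
```

At 1.58 bits per weight, a 2B-parameter model's weights fit in roughly 0.4 GB, about a tenth of their FP16 size.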

Quick Start

Installation
# Clone the repository with its submodules
git clone --recursive https://github.com/microsoft/BitNet.git
cd BitNet

# Create conda environment
conda create -n bitnet-cpp python=3.9
conda activate bitnet-cpp

# Install dependencies
pip install -r requirements.txt

# Download model
huggingface-cli download microsoft/BitNet-b1.58-2B-4T-gguf \
  --local-dir models/BitNet-b1.58-2B-4T

# Setup environment
python setup_env.py -md models/BitNet-b1.58-2B-4T -q i2_s

Run Inference
python run_inference.py \
  -m models/BitNet-b1.58-2B-4T/ggml-model-i2_s.gguf \
  -p "You are a helpful assistant" \
  -cnv
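The i2_s file loaded above stores ternary weights. The quantization behind it, per the BitNet b1.58 paper, is absmean rounding: scale each weight matrix by its mean absolute value, then round and clip to {-1, 0, +1}. A minimal NumPy sketch of that scheme follows; the packing into the actual i2_s on-disk layout is omitted.

```python
# Sketch of absmean ternary quantization (BitNet b1.58 paper):
# W_q = RoundClip(W / gamma, -1, 1), where gamma = mean(|W|).
# bitnet.cpp applies this offline and packs the result into low-bit
# formats such as i2_s; the packing step is not shown here.
import numpy as np

def absmean_quantize(w: np.ndarray, eps: float = 1e-6):
    """Return ternary weights in {-1, 0, +1} and the per-tensor scale."""
    gamma = np.mean(np.abs(w)) + eps           # absmean scale
    w_q = np.clip(np.round(w / gamma), -1, 1)  # RoundClip to {-1, 0, +1}
    return w_q.astype(np.int8), gamma

w = np.array([[0.4, -1.3, 0.02], [0.9, -0.1, 2.0]])
w_q, gamma = absmean_quantize(w)
print(w_q)          # ternary matrix
print(w_q * gamma)  # dequantized approximation of w
```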

Learn & Explore

New to 1-bit LLMs? Dive into concepts, use cases, and how BitNet compares to other approaches.

📖

What is a 1-bit LLM?

Understand 1-bit quantization and why it enables up to 16x memory savings and faster inference.

Read more →
🎯

Use Cases

Edge deployment, cost-effective APIs, and on-device AI: see where 1-bit models shine.

Explore use cases →
⚖️

Comparison

BitNet vs FP16, INT8, and other quantization methods. Choose the right approach.

Compare →
🛠️

Tutorials & Best Practices

Step-by-step tutorials and production tips for running BitNet models.

Tutorials · Best practices →
📚

Glossary & Troubleshooting

Definitions of terms and solutions to common issues.

Glossary · Troubleshooting →
👥

Community & Changelog

Join the community and stay updated with the latest releases.

Community · Changelog →
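A ternary weight carries fewer than two bits of information, which is where the memory savings discussed above come from: four ternary values fit in a single byte. The sketch below shows a generic 2-bit pack/unpack round trip; note this is an illustration only, not the exact i2_s layout used by bitnet.cpp.

```python
# Generic 2-bit packing of ternary weights: 4 values per byte.
# Not the actual i2_s layout (an implementation detail of bitnet.cpp);
# this only illustrates where the ~10x memory saving comes from.

def pack_ternary(weights):
    """Pack ternary weights (values -1/0/+1) into bytes, 4 per byte."""
    out = bytearray()
    for i in range(0, len(weights), 4):
        byte = 0
        for j, w in enumerate(weights[i:i + 4]):
            byte |= (w + 1) << (2 * j)  # map {-1, 0, 1} -> {0, 1, 2}
        out.append(byte)
    return bytes(out)

def unpack_ternary(data, n):
    """Inverse of pack_ternary: recover the first n ternary weights."""
    return [((b >> (2 * j)) & 0b11) - 1 for b in data for j in range(4)][:n]

ws = [1, -1, 0, 1, 0, 0, -1, 1]
packed = pack_ternary(ws)
assert unpack_ternary(packed, len(ws)) == ws
print(f"{len(ws)} weights -> {len(packed)} bytes")  # 8 weights -> 2 bytes
```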

Supported Models

BitNet supports a variety of 1-bit model architectures. View all supported models for detailed information.

BitNet-b1.58-2B-4T

2B parameter model trained on 4 trillion tokens

View on HuggingFace →

BitNet-b1.58-3B

3B parameter variant with enhanced capabilities

View on HuggingFace →

Falcon3-1B-Instruct

1B parameter instruction-tuned model

View on HuggingFace →

Llama3-8B-1.58

8B parameter model trained on 100B tokens

View on HuggingFace →

Join the Community

BitNet is open source and actively developed. Join thousands of developers using BitNet for efficient LLM inference.