BitNet

Official Inference Framework for 1-bit LLMs

BitNet is the official inference framework for 1-bit Large Language Models, providing fast, lossless inference with a dramatically reduced memory footprint. Developed by Microsoft, BitNet enables state-of-the-art language model capabilities with unprecedented efficiency.

25.8k GitHub Stars
2.1k Forks
MIT License

Key Features

⚡

High Performance

Optimized inference with custom low-bit kernels for maximum efficiency and speed.

Learn more →
💾

Memory Efficient

1-bit quantization dramatically reduces memory requirements while maintaining model quality.

Learn more →
🔧

Easy Integration

Simple installation and straightforward API for seamless integration into your projects.

Learn more →
🤖

Multiple Models

Support for various BitNet model architectures including BitNet-b1.58 and Falcon3 variants.

View models →
📊

Benchmark Tools

Comprehensive benchmarking utilities to evaluate model performance and throughput.

Run benchmarks →
📚

Extensive Documentation

Complete documentation, tutorials, and examples to help you get started quickly.

Read docs →
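The memory-efficiency claim above is easy to sanity-check with back-of-the-envelope arithmetic. The sketch below is illustrative only: it counts weight storage alone and ignores activations, the KV cache, and per-block scale overhead.

```python
# Back-of-the-envelope weight-memory comparison for a 2B-parameter model.
# Illustrative only: ignores activations, KV cache, and scale overhead.
PARAMS = 2_000_000_000

def weight_bytes(bits_per_weight: float) -> float:
    """Bytes needed to store the weights at a given precision."""
    return PARAMS * bits_per_weight / 8

for name, bits in [("FP16", 16), ("INT8", 8), ("INT4", 4), ("b1.58 ternary", 1.58)]:
    gb = weight_bytes(bits) / 1e9
    print(f"{name:>14}: {gb:5.2f} GB ({16 / bits:4.1f}x smaller than FP16)")
```

At 1.58 bits per weight, a 2B-parameter model's weights fit in roughly 0.4 GB, about a tenth of their FP16 size.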

Quick Start

Installation
# Clone the repository with its submodules
git clone --recursive https://github.com/microsoft/BitNet.git
cd BitNet

# Create conda environment
conda create -n bitnet-cpp python=3.9
conda activate bitnet-cpp

# Install dependencies
pip install -r requirements.txt

# Download model
huggingface-cli download microsoft/BitNet-b1.58-2B-4T-gguf \
  --local-dir models/BitNet-b1.58-2B-4T

# Setup environment
python setup_env.py -md models/BitNet-b1.58-2B-4T -q i2_s

Run Inference
python run_inference.py \
  -m models/BitNet-b1.58-2B-4T/ggml-model-i2_s.gguf \
  -p "You are a helpful assistant" \
  -cnv
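The i2_s file loaded above stores ternary weights. The quantization behind it, per the BitNet b1.58 paper, is absmean rounding: scale each weight matrix by its mean absolute value, then round and clip to {-1, 0, +1}. A minimal NumPy sketch of that scheme follows; the packing into the actual i2_s on-disk layout is omitted.

```python
# Sketch of absmean ternary quantization (BitNet b1.58 paper):
# W_q = RoundClip(W / gamma, -1, 1), where gamma = mean(|W|).
# bitnet.cpp applies this offline and packs the result into low-bit
# formats such as i2_s; the packing step is not shown here.
import numpy as np

def absmean_quantize(w: np.ndarray, eps: float = 1e-6):
    """Return ternary weights in {-1, 0, +1} and the per-tensor scale."""
    gamma = np.mean(np.abs(w)) + eps           # absmean scale
    w_q = np.clip(np.round(w / gamma), -1, 1)  # RoundClip to {-1, 0, +1}
    return w_q.astype(np.int8), gamma

w = np.array([[0.4, -1.3, 0.02], [0.9, -0.1, 2.0]])
w_q, gamma = absmean_quantize(w)
print(w_q)          # ternary matrix
print(w_q * gamma)  # dequantized approximation of w
```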

Learn & Explore

New to 1-bit LLMs? Dive into concepts, use cases, and how BitNet compares to other approaches.

📖

What is a 1-bit LLM?

Understand 1-bit quantization and why it enables up to 16x memory savings and faster inference.

Read more →
🎯

Use Cases

Edge deployment, cost-effective APIs, and on-device AI: see where 1-bit models shine.

Explore use cases →
⚖️

Comparison

BitNet vs FP16, INT8, and other quantization methods. Choose the right approach.

Compare →
🛠️

Tutorials & Best Practices

Step-by-step tutorials and production tips for running BitNet models.

Tutorials · Best practices →
📚

Glossary & Troubleshooting

Definitions of terms and solutions to common issues.

Glossary · Troubleshooting →
👥

Community & Changelog

Join the community and stay updated with the latest releases.

Community · Changelog →
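A ternary weight carries fewer than two bits of information, which is where the memory savings discussed above come from: four ternary values fit in a single byte. The sketch below shows a generic 2-bit pack/unpack round trip; note this is an illustration only, not the exact i2_s layout used by bitnet.cpp.

```python
# Generic 2-bit packing of ternary weights: 4 values per byte.
# Not the actual i2_s layout (an implementation detail of bitnet.cpp);
# this only illustrates where the ~10x memory saving comes from.

def pack_ternary(weights):
    """Pack ternary weights (values -1/0/+1) into bytes, 4 per byte."""
    out = bytearray()
    for i in range(0, len(weights), 4):
        byte = 0
        for j, w in enumerate(weights[i:i + 4]):
            byte |= (w + 1) << (2 * j)  # map {-1, 0, 1} -> {0, 1, 2}
        out.append(byte)
    return bytes(out)

def unpack_ternary(data, n):
    """Inverse of pack_ternary: recover the first n ternary weights."""
    return [((b >> (2 * j)) & 0b11) - 1 for b in data for j in range(4)][:n]

ws = [1, -1, 0, 1, 0, 0, -1, 1]
packed = pack_ternary(ws)
assert unpack_ternary(packed, len(ws)) == ws
print(f"{len(ws)} weights -> {len(packed)} bytes")  # 8 weights -> 2 bytes
```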

Supported Models

BitNet supports a variety of 1-bit model architectures. View all supported models for detailed information.

BitNet-b1.58-2B-4T

2B parameter model trained on 4 trillion tokens

View on HuggingFace →

BitNet-b1.58-3B

3B parameter variant with enhanced capabilities

View on HuggingFace →

Falcon3-1B-Instruct

1B parameter instruction-tuned model

View on HuggingFace →

Llama3-8B-1.58

8B parameter model trained on 100B tokens

View on HuggingFace →

Join the Community

BitNet is open source and actively developed. Join thousands of developers using BitNet for efficient LLM inference.