Models
Available 1-bit quantized models for BitNet
Available Models
BitNet supports a variety of 1-bit quantized models. These models are pre-quantized and ready for inference. For usage instructions, see our Usage Guide. For installation instructions, check our Installation Guide.
BitNet-b1.58 Models
BitNet-b1.58 is a native BitNet architecture using 1.58-bit quantization, in which every weight is ternary, taking a value of -1, 0, or +1.
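The name reflects the information content of a ternary weight: log2(3) ≈ 1.58 bits per parameter.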
BitNet-b1.58-2B-4T
Parameters: 2B
Training: 4 trillion tokens
Model Type: Base
HuggingFace: GGUF Format
BF16: BF16 Format
Falcon3 Models
Falcon3 models quantized to the 1.58-bit BitNet format. These are instruction-tuned models suitable for conversational AI applications.
Falcon3-1B-Instruct-1.58bit
Parameters: 1B
Model Type: Instruction-tuned
Use Case: Chat, Q&A, general assistant
HuggingFace: View on HuggingFace
Falcon3-3B-Instruct-1.58bit
Parameters: 3B
Model Type: Instruction-tuned
Use Case: Chat, Q&A, general assistant
HuggingFace: View on HuggingFace
Falcon3-7B-Instruct-1.58bit
Parameters: 7B
Model Type: Instruction-tuned
Use Case: Chat, Q&A, general assistant
HuggingFace: View on HuggingFace
Falcon3-10B-Instruct-1.58bit
Parameters: 10B
Model Type: Instruction-tuned
Use Case: Chat, Q&A, general assistant
HuggingFace: View on HuggingFace
Llama3 Models
Llama3-based models converted to the BitNet 1.58-bit format and further trained on a large additional token budget.
Llama3-8B-1.58-100B-tokens
Parameters: 8B
Training: 100 billion tokens
Model Type: Base
HuggingFace: View on HuggingFace
Downloading Models
Models can be downloaded directly from HuggingFace using the HuggingFace CLI:
huggingface-cli download microsoft/BitNet-b1.58-2B-4T-gguf \
--local-dir models/BitNet-b1.58-2B-4T
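The same pattern works for the other models listed above, for example (the exact repository id is an assumption here; confirm it on the model's HuggingFace page):
huggingface-cli download tiiuae/Falcon3-7B-Instruct-1.58bit \
--local-dir models/Falcon3-7B-Instruct-1.58bit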
For .safetensors models, you'll need to convert them to GGUF format. See our Usage Guide for conversion instructions.
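As a rough sketch, the conversion typically follows the llama.cpp converter convention; the script path and --outtype flag below are assumptions, so verify the exact command against the Usage Guide:
# Assumed converter script and flag, modeled on the llama.cpp convention
python utils/convert-hf-to-gguf-bitnet.py models/Llama3-8B-1.58-100B-tokens \
--outtype f32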
Model Selection Guide
Choose by Use Case
- General Purpose Text Generation: BitNet-b1.58-2B-4T or BitNet-b1.58-3B
- Conversational AI / Chat: Falcon3-Instruct models (1B, 3B, 7B, or 10B depending on requirements)
- Research & Experimentation: Any model depending on research focus
- Resource-Constrained Environments: Smaller models like Falcon3-1B-Instruct or BitNet-b1.58-2B-4T
- Higher Quality Output: Larger models like Falcon3-7B-Instruct or Falcon3-10B-Instruct
Choose by Resource Constraints
- Low Memory (4-8GB): Falcon3-1B-Instruct or BitNet-b1.58-2B-4T
- Medium Memory (8-16GB): Falcon3-3B-Instruct or BitNet-b1.58-3B
- High Memory (16GB+): Falcon3-7B-Instruct, Falcon3-10B-Instruct, or Llama3-8B-1.58
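As a rough rule of thumb, assuming the i2_s layout stores each ternary weight in 2 bits, the weights alone of an 8B-parameter model take roughly 8e9 × 2 / 8 ≈ 2 GB; actual runtime memory is higher once the KV cache, activations, and runtime overhead are added.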
Model Format
BitNet models are distributed in GGUF format, which is optimized for efficient loading and inference. Some models are also available in .safetensors format, which can be converted to GGUF using the provided conversion utilities.
Quantization types available:
- i2_s: 2-bit signed encoding of ternary weights (recommended)
- tl1: Table-lookup ternary kernel variant
Using Models
After downloading a model, prepare it for inference with the setup script; -md points at the model directory and -q selects the quantization type:
python setup_env.py -md models/BitNet-b1.58-2B-4T -q i2_s
Then run inference as shown in our Usage Guide.
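A typical invocation looks like this; it assumes the repository's run_inference.py wrapper, where -m is the GGUF model path, -p the prompt, and -cnv enables chat mode (see the Usage Guide if your checkout differs):
# Run a chat-style generation against the quantized model
python run_inference.py \
-m models/BitNet-b1.58-2B-4T/ggml-model-i2_s.gguf \
-p "You are a helpful assistant" \
-cnv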
Related Resources
- Usage Guide - How to use models
- Installation Guide - Setup instructions
- Benchmark Guide - Compare model performance
- Documentation - Complete API reference
- Microsoft on HuggingFace - All BitNet models