Models
Available 1-bit quantized models for BitNet
Available Models
BitNet supports a variety of 1-bit quantized models. These models are pre-quantized and ready for inference. For usage instructions, see our Usage Guide. For installation instructions, check our Installation Guide.
BitNet-b1.58 Models
BitNet-b1.58 is a native BitNet architecture using 1.58-bit quantization, in which every weight is ternary, taking a value of -1, 0, or +1.
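The name reflects the information content of a ternary weight: log2(3) ≈ 1.58 bits per parameter.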
BitNet-b1.58-2B-4T
Parameters: 2B
Training: 4 trillion tokens
Model Type: Base
HuggingFace: GGUF Format
BF16: BF16 Format
Falcon3 Models
Falcon3 models quantized to the 1.58-bit BitNet format. These are instruction-tuned models suitable for conversational AI applications.
Falcon3-1B-Instruct-1.58bit
Parameters: 1B
Model Type: Instruction-tuned
Use Case: Chat, Q&A, general assistant
HuggingFace: View on HuggingFace
Falcon3-3B-Instruct-1.58bit
Parameters: 3B
Model Type: Instruction-tuned
Use Case: Chat, Q&A, general assistant
HuggingFace: View on HuggingFace
Falcon3-7B-Instruct-1.58bit
Parameters: 7B
Model Type: Instruction-tuned
Use Case: Chat, Q&A, general assistant
HuggingFace: View on HuggingFace
Falcon3-10B-Instruct-1.58bit
Parameters: 10B
Model Type: Instruction-tuned
Use Case: Chat, Q&A, general assistant
HuggingFace: View on HuggingFace
Llama3 Models
Llama3-based models converted to the BitNet 1.58-bit format and further trained on a large additional token budget.
Llama3-8B-1.58-100B-tokens
Parameters: 8B
Training: 100 billion tokens
Model Type: Base
HuggingFace: View on HuggingFace
Downloading Models
Models can be downloaded directly from HuggingFace using the HuggingFace CLI:
huggingface-cli download microsoft/BitNet-b1.58-2B-4T-gguf \
--local-dir models/BitNet-b1.58-2B-4T
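The same pattern works for the other models listed above, for example (the exact repository id is an assumption here; confirm it on the model's HuggingFace page):
huggingface-cli download tiiuae/Falcon3-7B-Instruct-1.58bit \
--local-dir models/Falcon3-7B-Instruct-1.58bit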
For .safetensors models, you'll need to convert them to GGUF format. See our Usage Guide for conversion instructions.
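As a rough sketch, the conversion typically follows the llama.cpp converter convention; the script path and --outtype flag below are assumptions, so verify the exact command against the Usage Guide:
# Assumed converter script and flag, modeled on the llama.cpp convention
python utils/convert-hf-to-gguf-bitnet.py models/Llama3-8B-1.58-100B-tokens \
--outtype f32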
Model Selection Guide
Choose by Use Case
- General Purpose Text Generation: BitNet-b1.58-2B-4T or BitNet-b1.58-3B
- Conversational AI / Chat: Falcon3-Instruct models (1B, 3B, 7B, or 10B depending on requirements)
- Research & Experimentation: Any model depending on research focus
- Resource-Constrained Environments: Smaller models like Falcon3-1B-Instruct or BitNet-b1.58-2B-4T
- Higher Quality Output: Larger models like Falcon3-7B-Instruct or Falcon3-10B-Instruct
Choose by Resource Constraints
- Low Memory (4-8GB): Falcon3-1B-Instruct or BitNet-b1.58-2B-4T
- Medium Memory (8-16GB): Falcon3-3B-Instruct or BitNet-b1.58-3B
- High Memory (16GB+): Falcon3-7B-Instruct, Falcon3-10B-Instruct, or Llama3-8B-1.58
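As a rough rule of thumb, assuming the i2_s layout stores each ternary weight in 2 bits, the weights alone of an 8B-parameter model take roughly 8e9 × 2 / 8 ≈ 2 GB; actual runtime memory is higher once the KV cache, activations, and runtime overhead are added.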
Model Format
BitNet models are distributed in GGUF format, which is optimized for efficient loading and inference. Some models are also available in .safetensors format, which can be converted to GGUF using the provided conversion utilities.
Quantization types available:
- i2_s: 2-bit signed encoding of ternary weights (recommended)
- tl1: Table-lookup ternary kernel variant
Using Models
After downloading a model, prepare it for inference with the setup script; -md points at the model directory and -q selects the quantization type:
python setup_env.py -md models/BitNet-b1.58-2B-4T -q i2_s
Then run inference as shown in our Usage Guide.
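A typical invocation looks like this; it assumes the repository's run_inference.py wrapper, where -m is the GGUF model path, -p the prompt, and -cnv enables chat mode (see the Usage Guide if your checkout differs):
# Run a chat-style generation against the quantized model
python run_inference.py \
-m models/BitNet-b1.58-2B-4T/ggml-model-i2_s.gguf \
-p "You are a helpful assistant" \
-cnv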
Related Resources
- Usage Guide - How to use models
- Installation Guide - Setup instructions
- Benchmark Guide - Compare model performance
- Documentation - Complete API reference
- Microsoft on HuggingFace - All BitNet models