1. Install BitNet

Follow the Installation guide: clone the repository, create a conda environment, install the dependencies, then run `setup_env.py` for your model. See Getting Started for a quicker path.
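As a sketch, a typical sequence looks like the following. The Python version, model directory, and quantization flag (`-q i2_s` here) are illustrative; the authoritative values are in the Installation guide and may differ for your model and platform.

```bash
# Clone the repository with its submodules
git clone --recursive https://github.com/microsoft/BitNet.git
cd BitNet

# Create and activate a conda environment
conda create -n bitnet-cpp python=3.9
conda activate bitnet-cpp

# Install Python dependencies
pip install -r requirements.txt

# Build and prepare the environment for a downloaded model
# (model path and -q quantization type are illustrative; see Installation)
python setup_env.py -md models/BitNet-b1.58-2B-4T -q i2_s
```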

2. Download a Model

Use the Hugging Face CLI to download a GGUF model, e.g. BitNet-b1.58-2B-4T. Full commands are in Installation and Models.
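For example, downloading the GGUF build of BitNet-b1.58-2B-4T might look like this (the repo id and local directory are illustrative; adjust them for other models):

```bash
# Download the GGUF weights into models/BitNet-b1.58-2B-4T
huggingface-cli download microsoft/BitNet-b1.58-2B-4T-gguf --local-dir models/BitNet-b1.58-2B-4T
```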

3. Run Your First Inference

Use `run_inference.py -m <path-to-gguf> -p "Your prompt"`. Add `-cnv` for chat mode. Details are in Usage.
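A minimal sketch, assuming the model was set up under models/BitNet-b1.58-2B-4T with the i2_s quantization from step 1 (the GGUF filename will differ for other quantization types):

```bash
# One-shot generation
python run_inference.py -m models/BitNet-b1.58-2B-4T/ggml-model-i2_s.gguf -p "Your prompt"

# Interactive chat mode (-cnv), with the prompt acting as the system message
python run_inference.py -m models/BitNet-b1.58-2B-4T/ggml-model-i2_s.gguf -p "You are a helpful assistant" -cnv
```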

4. Run Benchmarks

Use `e2e_benchmark.py` to measure throughput and latency. See Benchmark.
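A minimal run might look like the following; the model path and the values for `-n` (tokens to generate), `-p` (prompt length), and `-t` (threads) are illustrative, so check Benchmark for the full set of options:

```bash
# Generate 200 tokens from a 256-token prompt using 4 threads
python utils/e2e_benchmark.py -m models/BitNet-b1.58-2B-4T/ggml-model-i2_s.gguf -n 200 -p 256 -t 4
```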

More Guides