Best Practices
Tips for getting the most out of BitNet and 1-bit LLMs
Choose the Right Model
Match the model size to your hardware and task. Use smaller models (e.g. 1B–2B parameters) on memory-constrained or CPU-only machines; use larger ones (7B–8B) for better quality when you have enough RAM/VRAM. See Models.
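As a rough sizing aid, packed ternary (1.58-bit) weights occupy far less memory than fp16 weights. The sketch below estimates the packed-weight footprint from the parameter count; the ~1.6 bits/weight figure is an illustrative assumption, not an exact number for any particular BitNet build, and it ignores KV cache and activation memory.

```python
# Rough memory estimate for a ternary (1.58-bit) model's packed weights.
# The bits-per-weight value is an illustrative assumption.

def estimate_weight_memory_gib(n_params_billion: float,
                               bits_per_weight: float = 1.6) -> float:
    """Approximate packed-weight size in GiB for a given parameter count."""
    n_bytes = n_params_billion * 1e9 * bits_per_weight / 8
    return n_bytes / 2**30

for size in (1.0, 2.0, 7.0, 8.0):
    print(f"{size:.0f}B params -> ~{estimate_weight_memory_gib(size):.2f} GiB of weights")
```

Even an 8B model fits in a couple of GiB of weight memory at this precision, which is why larger models become practical on commodity hardware.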
Set Context Size Appropriately
Use the smallest context size (-c) that fits your prompts; a smaller context uses less memory and runs faster. Increase it only when you need long conversations or documents.
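The memory cost of a larger context comes mostly from the KV cache, which grows linearly with the number of cached tokens. The sketch below shows that scaling; the layer count, KV-head count, head dimension, and fp16 cache precision are illustrative assumptions for a small model, not BitNet-specific values.

```python
# KV-cache memory grows linearly with the context size (-c).
# All model dimensions below are illustrative assumptions.

def kv_cache_bytes(ctx: int, n_layers: int = 24, n_kv_heads: int = 8,
                   head_dim: int = 128, bytes_per_elem: int = 2) -> int:
    # Factor of 2 covers keys and values; one entry per token per layer.
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * ctx

for ctx in (512, 2048, 8192):
    print(f"-c {ctx}: ~{kv_cache_bytes(ctx) / 2**20:.0f} MiB KV cache")
```

Quadrupling the context quadruples the KV-cache memory, which is why trimming `-c` is one of the cheapest ways to free RAM.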
Adjust Temperature
Use a lower temperature (e.g. 0.3–0.5) for more deterministic, factual output, and a higher one (0.7–1.0) for creative or varied text. See Usage.
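Mechanically, temperature divides the logits before the softmax, so values below 1 sharpen the next-token distribution and values above 1 flatten it. A minimal sketch with made-up logit values:

```python
import math

# Temperature scaling: logits are divided by T before the softmax,
# so T < 1 concentrates probability on the top token. Logits are made up.

def softmax_with_temperature(logits, temp):
    scaled = [x / temp for x in logits]
    m = max(scaled)                      # subtract max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]
for t in (0.3, 0.7, 1.0):
    probs = softmax_with_temperature(logits, t)
    print(f"T={t}: top-token probability {max(probs):.2f}")
```

At T=0.3 nearly all the probability mass sits on the top token, which is what makes low-temperature output more deterministic.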
Use GPU When Available
BitNet's CUDA kernels are much faster than CPU inference. Use a CUDA-capable GPU for production or heavy workloads. See Installation and Benchmark.
Warm Up Before Benchmarking
Run a few inference calls before measuring throughput, so that caches are populated and the GPU has reached steady-state clocks; otherwise the first calls drag down the measured average. See Benchmark.
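The warm-up pattern looks like this in any benchmarking harness: run the workload a few times, discard those timings, then measure. `inference` below is a stand-in for a real model call, not part of BitNet's API.

```python
import time

# Warm-up before benchmarking: the first few runs are discarded so that
# caches, allocators, and (on GPU) kernel compilation do not skew results.
# `inference` is a hypothetical stand-in for a real model call.

def inference():
    return sum(i * i for i in range(50_000))

WARMUP, RUNS = 3, 10

for _ in range(WARMUP):      # discarded warm-up calls
    inference()

timings = []
for _ in range(RUNS):        # measured calls
    start = time.perf_counter()
    inference()
    timings.append(time.perf_counter() - start)

print(f"mean over {RUNS} runs: {sum(timings) / RUNS * 1e3:.2f} ms")
```

Reporting the mean (or median) over several measured runs, rather than a single timing, also smooths out scheduler noise.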