BitNet ships custom kernels optimized for GPU execution. While CPU inference is possible, most users targeting interactive chat or batch APIs will want a CUDA-capable NVIDIA GPU. This page collects the prerequisites that typically matter before you follow the main installation guide.

Driver and CUDA stack

Install a recent NVIDIA driver that matches your GPU generation. CUDA toolkit compatibility depends on the BitNet release and its bundled llama.cpp build, so always read the upstream README for the pinned versions. Mismatched driver/toolkit pairs are a frequent source of the build failures listed in the troubleshooting page.
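Before building, it helps to confirm that both the driver and the toolkit are visible and to note their versions for comparison against the README's pins. A minimal sketch, assuming `nvidia-smi` and `nvcc` are the standard NVIDIA tools on your `PATH` (adjust for your install):

```shell
# Report the installed driver version, or flag a missing driver.
if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi --query-gpu=driver_version --format=csv,noheader
else
  echo "nvidia-smi not found: install the NVIDIA driver first"
fi

# Report the CUDA toolkit release, or flag a missing/unreachable nvcc.
if command -v nvcc >/dev/null 2>&1; then
  nvcc --version | grep -o 'release [0-9.]*'
else
  echo "nvcc not found: install the CUDA toolkit or add it to PATH"
fi
```

Compare both values against the versions pinned in the upstream README before configuring the build.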

Build considerations

  • Use the same compiler toolchain recommended in the repo (GCC/Clang on Linux; see the Windows section for the MSVC + clang path).
  • Ensure CMake can find CUDA when GPU support is enabled.
  • After a successful build, validate with a short inference run before scaling context length.
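The steps above can be sketched as a single configure/build/validate sequence. This is a hedged sketch: the `-DGGML_CUDA=ON` flag and the `llama-cli` binary name follow common llama.cpp conventions and may differ in the BitNet-bundled build, and `model.gguf` is a placeholder path — check the BitNet README for the exact pinned flags.

```shell
# Configure with CUDA enabled; CMake must be able to locate the toolkit.
cmake -B build -DGGML_CUDA=ON

# Build in Release mode with parallel jobs.
cmake --build build --config Release -j

# Validate with a short inference run before scaling context length
# (binary name and model path are placeholders for your checkout).
./build/bin/llama-cli -m model.gguf -p "Hello" -n 16
```

If CMake fails to find CUDA, point it at the toolkit explicitly (for example via `CUDACXX=/usr/local/cuda/bin/nvcc`) rather than patching the build files.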

Performance expectations

1-bit weights reduce memory traffic, but observed tokens/sec still depends on batch size, context length, and kernel version. Benchmark on your own hardware rather than comparing raw FLOP estimates alone.
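A minimal benchmarking sketch, assuming the build ships llama.cpp's `llama-bench` tool (flag names and the `model.gguf` path are assumptions — adjust for your checkout):

```shell
# Locate the benchmark binary produced by the build (path is an assumption).
BENCH=./build/bin/llama-bench

if [ -x "$BENCH" ]; then
  # Measure prompt-processing (-p) and generation (-n) tokens/sec
  # at sizes close to your real workload.
  "$BENCH" -m model.gguf -p 512 -n 128
else
  echo "llama-bench not found: build the project first"
fi
```

Run the same invocation across kernel versions or batch sizes and compare the reported tokens/sec rather than extrapolating from hardware specs.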

Related