Port of Facebook's LLaMA model in C/C++

The main goal of llama.cpp is to run the LLaMA model using 4-bit integer quantization on a MacBook:

* Plain C/C++ implementation without dependencies
* Apple silicon first-class citizen - optimized via ARM NEON, Accelerate and Metal frameworks
* AVX, AVX2 and AVX512 support for x86 architectures
* Mixed F16 / F32 precision
* 2-bit, 3-bit, 4-bit, 5-bit, 6-bit and 8-bit integer quantization support
* CUDA, Metal and OpenCL GPU backend support

The original implementation of llama.cpp was hacked in an evening. Since then, the project has improved significantly thanks to many contributions. This project is mainly for educational purposes and serves as the main playground for developing new features for the ggml library.
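The integer quantization mentioned above is blockwise: groups of weights share a single scale factor. The C sketch below illustrates the general idea for the 4-bit case; it is illustrative only and does not reproduce the exact ggml Q4_0 block layout, which packs an F16 scale together with the codes.

    #include <math.h>
    #include <stdint.h>
    #include <stdio.h>

    #define BLOCK 32  /* elements quantized together, sharing one scale */

    /* Quantize one block of 32 floats to 4-bit codes (0..15) plus a scale.
     * Simplified illustration of blockwise quantization; the real ggml
     * formats (Q4_0 etc.) differ in layout and rounding details. */
    static void quantize_block_q4(const float *x, uint8_t q[BLOCK / 2], float *scale) {
        float amax = 0.0f;
        for (int i = 0; i < BLOCK; i++) {
            float a = fabsf(x[i]);
            if (a > amax) amax = a;
        }
        /* map [-amax, amax] onto the signed 4-bit range [-8, 7] */
        float d  = amax / 7.0f;
        float id = d != 0.0f ? 1.0f / d : 0.0f;
        *scale = d;
        for (int i = 0; i < BLOCK; i += 2) {
            int lo = (int)roundf(x[i]     * id);
            int hi = (int)roundf(x[i + 1] * id);
            if (lo < -8) lo = -8; if (lo > 7) lo = 7;
            if (hi < -8) hi = -8; if (hi > 7) hi = 7;
            /* two 4-bit codes per byte, offset by 8 so they are unsigned */
            q[i / 2] = (uint8_t)((lo + 8) | ((hi + 8) << 4));
        }
    }

    /* Dequantize back to floats: value = (code - 8) * scale. */
    static void dequantize_block_q4(const uint8_t q[BLOCK / 2], float scale, float *y) {
        for (int i = 0; i < BLOCK; i += 2) {
            y[i]     = ((int)(q[i / 2] & 0x0F) - 8) * scale;
            y[i + 1] = ((int)(q[i / 2] >> 4)  - 8) * scale;
        }
    }

    int main(void) {
        float x[BLOCK], y[BLOCK], scale;
        uint8_t q[BLOCK / 2];
        for (int i = 0; i < BLOCK; i++) x[i] = sinf((float)i);  /* dummy weights */
        quantize_block_q4(x, q, &scale);
        dequantize_block_q4(q, scale, y);
        printf("x[3] = %f, reconstructed = %f (scale %f)\n", x[3], y[3], scale);
        return 0;
    }

Each 32-element block costs 16 bytes of codes plus one scale; with the F16 scale used by ggml's Q4_0 that works out to 4.5 bits per weight, which is what makes 7B-class models fit in laptop memory.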
RPM
llama-cpp-b4094-10.fc42.x86_64.rpm
Summary
Port of Facebook's LLaMA model in C/C++
URL
https://github.com/ggerganov/llama.cpp
Group
Unspecified
License
MIT AND Apache-2.0 AND LicenseRef-Fedora-Public-Domain
Source
llama-cpp-b4094-10.fc42.src.rpm
Checksum (SHA-256)
b5ab621d5da18c11046bf35b126d08c89052c320eb294e734b323c676111b586
Signature
RSA/SHA512, Sun 05 Apr 2026 07:21:05 PM AEST, Key ID d760880122ab8392
Build Date
2025/01/29 23:21:12
Requires
Provides
libggml-base.so
libggml.so
libllama.so
llama-cpp = b4094-10.fc42
llama-cpp(x86-64) = b4094-10.fc42
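The shared libraries provided above (libllama.so, libggml.so) can be linked against directly. Below is a minimal sketch of loading a GGUF model through the llama.h C API; it assumes a matching llama-cpp-devel package supplies the header, and since the C API has shifted between llama.cpp releases, the function names should be checked against the llama.h shipped with build b4094.

    /* build (illustrative): cc demo.c -o demo -lllama */
    #include <stdio.h>
    #include <llama.h>

    int main(int argc, char **argv) {
        if (argc < 2) {
            fprintf(stderr, "usage: %s model.gguf\n", argv[0]);
            return 1;
        }
        llama_backend_init();                        /* initialize ggml backends */
        struct llama_model_params params = llama_model_default_params();
        params.n_gpu_layers = 0;                     /* CPU only for this demo */
        struct llama_model *model = llama_load_model_from_file(argv[1], params);
        if (model == NULL) {
            fprintf(stderr, "failed to load %s\n", argv[1]);
            llama_backend_free();
            return 1;
        }
        printf("model loaded OK\n");
        llama_free_model(model);
        llama_backend_free();
        return 0;
    }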