Minimal LLM implementation.
Built around a quantized Llama 3.2 model with 1B parameters, this demo summarizes random Wikipedia articles. To meet the Git LFS file-size limit, the model was quantized with llama.cpp, reducing it to ~700MB, and then deployed to Hugging Face Spaces through a GitHub Action.
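As a minimal sketch of how a summarization request could be formatted for this model: Llama 3.2 Instruct models expect the Llama 3 chat template with special header/turn tokens. The function name, the instruction wording, and the truncation limit below are assumptions for illustration; check the model card for the exact template.

```python
def build_summary_prompt(article_text: str, max_chars: int = 4000) -> str:
    """Format a summarization request using the Llama 3 instruct chat
    template (assumed here; verify against the model card)."""
    # Truncate the article so the prompt fits a small context window.
    snippet = article_text[:max_chars]
    return (
        "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n"
        "Summarize the following Wikipedia article in a few sentences:\n\n"
        f"{snippet}"
        "<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
    )

prompt = build_summary_prompt("The aardvark is a nocturnal burrowing mammal...")
```

The prompt ends with an open assistant header so the model's completion is the summary itself.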
Despite the model's relatively small size, performance is quite poor, since it runs (for free) on only 2 vCPU cores.
The code for the API and the quantization script are available on GitHub, along with all the machinery for the Hugging Face deployment.
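The "random Wikipedia article" part can be served by Wikimedia's public REST API, whose `page/random/summary` endpoint returns a JSON object with `title` and `extract` fields. The sketch below (helper names and the sample payload are illustrative, not taken from this repo) builds the request and parses the response:

```python
import json
from urllib.request import Request

# Wikimedia REST API endpoint returning a random article summary as JSON.
RANDOM_SUMMARY_URL = "https://en.wikipedia.org/api/rest_v1/page/random/summary"

def random_article_request() -> Request:
    """Build the request; Wikimedia's API etiquette asks clients to send
    a descriptive User-Agent header (value here is a placeholder)."""
    return Request(RANDOM_SUMMARY_URL, headers={"User-Agent": "llm-demo/0.1"})

def parse_article(payload: bytes) -> tuple[str, str]:
    """Extract the title and plain-text extract from the API response."""
    data = json.loads(payload)
    return data["title"], data["extract"]

# Sample payload shaped like the endpoint's documented response schema.
sample = b'{"title": "Aardvark", "extract": "The aardvark is a mammal."}'
title, text = parse_article(sample)
```

The returned `extract` is already plain text, so it can be passed straight into the summarization prompt.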