Introduction
When I saw on LocalLLaMA that a preview version of the Transformers library, 5.0.0rc0, had been released, I didn't care much at first. But as I read the comments, within minutes I jumped up, vibe coded a script, and tried it: you can now work with GGUF in the Transformers library. This post is less a serious blog post than a record of the feelings and experiments of roughly twenty minutes today.
GGUF Format and Fine-Tuning
I have written about the GGUF format before, but to summarize: GGUF is indispensable knowledge for those of us who are not rich in graphics cards. It is a format that lets us easily share large language models in whatever quantization type we want and run them locally (split across computers with RPC, or across RAM and VRAM within a single machine).
Before Transformers 5.0.0rc0, the GGUF format lived largely outside the main library, and we couldn't really work on GGUF files with it. Fine-tuning a GGUF model, for instance, used to be done with its own separate program, and has not really been possible recently. That is why, when I saw comments asking "Can we now fine-tune GGUF models with the Transformers library?", I went through one of the shocks of my life.
So I opened Cursor IDE, pasted in the documentation^[1,2], and had it vibe code. Interestingly, the answer to that question is: YES.
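To make the new GGUF path concrete, here is a minimal sketch of what loading a GGUF file through the library looks like, following the pattern in the GGUF documentation^[2]. The repo id and file name are placeholders for whatever GGUF you have on disk; on load, the quantized tensors are dequantized into a regular PyTorch model.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder names: point these at the GGUF repo/file you actually want.
repo_id = "unsloth/gemma-3-270m-it-GGUF"
gguf_file = "gemma-3-270m-it-F16.gguf"

# gguf_file tells Transformers to read the GGUF container and dequantize
# its tensors into ordinary PyTorch weights.
tokenizer = AutoTokenizer.from_pretrained(repo_id, gguf_file=gguf_file)
model = AutoModelForCausalLM.from_pretrained(repo_id, gguf_file=gguf_file)
```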
Vibe Coding Phase
Most of us know the term vibe coding by now: the name for having large language models write your code. Working from the documentation, I asked Gemini 3 Pro to write a simple, almost "fictional" fine-tuning script, because I didn't expect it to work. And indeed, it didn't work at first. The first problem was the tokenizer:
The `AutoTokenizer.from_pretrained(..., gguf_file=...)` call raised an UnboundLocalError, so I loaded the tokenizer files (the tokenizer.model file) from the safetensors version of the model instead. According to Gemini 3 Pro, this is a 5.0.0rc0 bug rather than expected behavior.
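The workaround, in effect, was to split the two loads: take the tokenizer from the original safetensors repo and only the weights from the GGUF file. A rough sketch; the safetensors repo id below is my assumption, not something from the original script.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

gguf_repo = "unsloth/gemma-3-270m-it-GGUF"   # placeholder GGUF repo
gguf_file = "gemma-3-270m-it-F16.gguf"

# rc0 workaround: read the tokenizer files (tokenizer.model etc.) from the
# original safetensors repo instead of reconstructing them from the GGUF.
tokenizer = AutoTokenizer.from_pretrained("google/gemma-3-270m-it")  # assumed repo id

# The weights still come from the GGUF file and are dequantized on load.
model = AutoModelForCausalLM.from_pretrained(gguf_repo, gguf_file=gguf_file)
```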
Then I got a chat template error: the GGUF file came from unsloth (Ollama format), so its template was written in Go's text/template syntax rather than the Jinja2 that Hugging Face requires. So I created a new template.txt file with a Jinja2 template.
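Attaching the new template is just a matter of reading template.txt into the tokenizer's chat_template attribute; a small sketch, assuming the file already contains a valid Jinja2 template:

```python
# template.txt holds a Jinja2 chat template (replacing the Ollama/Go-style
# template that shipped with the GGUF, which Transformers cannot use).
with open("template.txt", "r", encoding="utf-8") as f:
    tokenizer.chat_template = f.read()

# Sanity check: render a tiny conversation through the new template.
print(tokenizer.apply_chat_template(
    [{"role": "user", "content": "Hello!"}],
    tokenize=False,
    add_generation_prompt=True,
))
```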
After easily solving these two problems, the fine-tuning script I had made for testing worked, interestingly enough:
"""
Loading GGUF model: ./gemma-3-270m-it-F16.gguf
Chat template missing, loading from template.txt...
Converting and de-quantizing GGUF tensors...: 100%|█| 236/236 [00:00<00:00, 1317
Loading weights: 100%|█| 236/236 [00:00<00:00, 12221.04it/s, Materializing param
Map: 100%|██████████████████████████| 566/566 [00:00<00:00, 12962.98 examples/s]
Starting training (GGUF -> Dequantized -> Finetune)...
0%| | 0/142 [00:00, ?it/s]/opt/miniconda3/envs/hf-v5_rc/lib/python3.11/site-packages/torch/utils/data/dataloader.py:692: UserWarning: 'pin_memory' argument is set as true but not supported on MPS now, device pinned memory won't be used.
warnings.warn(warn_msg)
{'loss': '6.409', 'grad_norm': '218', 'learning_rate': '2e-05', 'epoch': '0.007067'}
...
"""
At that moment I felt great happiness. But when I tried the output safetensors file (save_pretrained writes safetensors), I saw a model that hadn't learned well, only to some extent, because I hadn't adjusted the training settings correctly.
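For reference, the save-and-test step is unremarkable: save_pretrained writes an ordinary safetensors checkpoint, which can then be reloaded like any other Transformers model. A sketch with placeholder paths and a made-up prompt:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

out_dir = "gemma-gguf-finetune/final"   # placeholder path

# save_pretrained writes a normal safetensors checkpoint plus config files,
# so the GGUF origin of the weights is invisible from this point on.
model.save_pretrained(out_dir)
tokenizer.save_pretrained(out_dir)

# Reload and generate once to see what the model actually learned.
model = AutoModelForCausalLM.from_pretrained(out_dir)
tokenizer = AutoTokenizer.from_pretrained(out_dir)

prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Tell me a short fact about GGUF."}],
    tokenize=False,
    add_generation_prompt=True,
)
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```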
Conclusion
Being able to work directly with GGUF models made me happy in an interesting, hard-to-describe way. In my opinion, Transformers 5.0.0rc0 is a great development in terms of standardization and ease of use.
References
- 1. HuggingFace. "Transformers v5." HuggingFace Blog, 2025. https://huggingface.co/blog/transformers-v5
- 2. HuggingFace. "GGUF Documentation." HuggingFace Transformers Documentation, 2025. https://huggingface.co/docs/transformers/en/gguf