Introduction
When I saw on LocalLLaMA that a preview version of the Transformers library, 5.0.0rc0, had been released, I didn't care much at first. But as I read the comments, within minutes I jumped up, vibe coded a script, and tried it: you can now work with GGUF in the Transformers library. This post is less a serious blog post than a record of the feelings and experiments of roughly twenty minutes today.
GGUF Format and Fine-Tuning
I have written about the GGUF format before, but to summarize: GGUF is indispensable knowledge for those of us who are not rich in graphics cards. It is a format that lets us easily share large language models in whatever quantization type we want and run them locally (split across computers with RPC, or across RAM and VRAM within a single machine).
Before Transformers 5.0.0rc0, the GGUF format lived largely outside the main library, and we couldn't really work on GGUF files with it. Fine-tuning a GGUF model, for instance, used to be done with its own separate program, and has not really been possible recently. That is why, when I saw comments asking "Can we now fine-tune GGUF models with the Transformers library?", I went through one of the shocks of my life.
So I opened Cursor IDE, pasted in the documentation^[1,2], and had it vibe code. Interestingly, the answer to that question is: YES.
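To make the new GGUF path concrete, here is a minimal sketch of what loading a GGUF file through the library looks like, following the pattern in the GGUF documentation^[2]. The repo id and file name are placeholders for whatever GGUF you have on disk; on load, the quantized tensors are dequantized into a regular PyTorch model.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder names: point these at the GGUF repo/file you actually want.
repo_id = "unsloth/gemma-3-270m-it-GGUF"
gguf_file = "gemma-3-270m-it-F16.gguf"

# gguf_file tells Transformers to read the GGUF container and dequantize
# its tensors into ordinary PyTorch weights.
tokenizer = AutoTokenizer.from_pretrained(repo_id, gguf_file=gguf_file)
model = AutoModelForCausalLM.from_pretrained(repo_id, gguf_file=gguf_file)
```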
Vibe Coding Phase
Most of us know the term vibe coding by now: the name for having large language models write your code. Working from the documentation, I asked Gemini 3 Pro to write a simple, almost "fictional" fine-tuning script, because I didn't expect it to work. And indeed, it didn't work at first. The first problem was the tokenizer:
The `AutoTokenizer.from_pretrained(..., gguf_file=...)` call raised an UnboundLocalError, so I loaded the tokenizer files (the tokenizer.model file) from the safetensors version of the model instead. According to Gemini 3 Pro, this is a 5.0.0rc0 bug rather than expected behavior.
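The workaround, in effect, was to split the two loads: take the tokenizer from the original safetensors repo and only the weights from the GGUF file. A rough sketch; the safetensors repo id below is my assumption, not something from the original script.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

gguf_repo = "unsloth/gemma-3-270m-it-GGUF"   # placeholder GGUF repo
gguf_file = "gemma-3-270m-it-F16.gguf"

# rc0 workaround: read the tokenizer files (tokenizer.model etc.) from the
# original safetensors repo instead of reconstructing them from the GGUF.
tokenizer = AutoTokenizer.from_pretrained("google/gemma-3-270m-it")  # assumed repo id

# The weights still come from the GGUF file and are dequantized on load.
model = AutoModelForCausalLM.from_pretrained(gguf_repo, gguf_file=gguf_file)
```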
Then I got a chat template error: the GGUF file came from unsloth (Ollama format), so its template was written in Go's text/template syntax rather than the Jinja2 that Hugging Face requires. So I created a new template.txt file with a Jinja2 template.
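Attaching the new template is just a matter of reading template.txt into the tokenizer's chat_template attribute; a small sketch, assuming the file already contains a valid Jinja2 template:

```python
# template.txt holds a Jinja2 chat template (replacing the Ollama/Go-style
# template that shipped with the GGUF, which Transformers cannot use).
with open("template.txt", "r", encoding="utf-8") as f:
    tokenizer.chat_template = f.read()

# Sanity check: render a tiny conversation through the new template.
print(tokenizer.apply_chat_template(
    [{"role": "user", "content": "Hello!"}],
    tokenize=False,
    add_generation_prompt=True,
))
```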
After easily solving these two problems, the fine-tuning script I had made for testing worked, interestingly enough:
"""
Loading GGUF model: ./gemma-3-270m-it-F16.gguf
Chat template missing, loading from template.txt...
Converting and de-quantizing GGUF tensors...: 100%|█| 236/236 [00:00<00:00, 1317
Loading weights: 100%|█| 236/236 [00:00<00:00, 12221.04it/s, Materializing param
Map: 100%|██████████████████████████| 566/566 [00:00<00:00, 12962.98 examples/s]
Starting training (GGUF -> Dequantized -> Finetune)...
0%| | 0/142 [00:00, ?it/s]/opt/miniconda3/envs/hf-v5_rc/lib/python3.11/site-packages/torch/utils/data/dataloader.py:692: UserWarning: 'pin_memory' argument is set as true but not supported on MPS now, device pinned memory won't be used.
warnings.warn(warn_msg)
{'loss': '6.409', 'grad_norm': '218', 'learning_rate': '2e-05', 'epoch': '0.007067'}
...
"""
At that moment I felt great happiness. But when I tried the output safetensors file (save_pretrained writes safetensors), I saw a model that hadn't learned well, only to some extent, because I hadn't adjusted the training settings correctly.
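For reference, the save-and-test step is unremarkable: save_pretrained writes an ordinary safetensors checkpoint, which can then be reloaded like any other Transformers model. A sketch with placeholder paths and a made-up prompt:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

out_dir = "gemma-gguf-finetune/final"   # placeholder path

# save_pretrained writes a normal safetensors checkpoint plus config files,
# so the GGUF origin of the weights is invisible from this point on.
model.save_pretrained(out_dir)
tokenizer.save_pretrained(out_dir)

# Reload and generate once to see what the model actually learned.
model = AutoModelForCausalLM.from_pretrained(out_dir)
tokenizer = AutoTokenizer.from_pretrained(out_dir)

prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Tell me a short fact about GGUF."}],
    tokenize=False,
    add_generation_prompt=True,
)
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```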
Conclusion
Being able to work directly with GGUF models made me happy in an interesting, hard-to-describe way. In my opinion, Transformers 5.0.0rc0 is a great development in terms of standardization and ease of use.
References
- 1. HuggingFace. "Transformers v5." HuggingFace Blog, 2025. https://huggingface.co/blog/transformers-v5
- 2. HuggingFace. "GGUF Documentation." HuggingFace Transformers Documentation, 2025. https://huggingface.co/docs/transformers/en/gguf