Introduction
People who run Large Language Models locally are usually following a philosophy rather than just a trend. After all, the common objection goes: when models like ChatGPT, Claude, and Gemini (not Gemma) are accessible online for free or cheap and often work much better than local ones, what logical reason can there be for making a serious investment just to reach a similar level and/or quality?
There are many possible answers to this question: personal privacy, freedom from censorship, the philosophy of having control, personalization, working around limitations, and so on. Although these look different from each other, they converge at one point: the file. All of them require, at their foundation, access to the model file. But there are many file formats, whether safetensors, GGUF, MLX, ONNX, and the list goes on. Which file format should be chosen, and how should one proceed? Downloading the original weights from their source, when that is possible, is the most ideal and optimal path. As of 2025, that original format for Large Language Models is predominantly (we can say 90% of the time) the safetensors file format.
What is Safe (Secure) Tensor?
For this article to be easy to follow, let's first talk about tensors. This may seem like very obvious information to some readers, but for someone like me, who last saw mathematics in high school, where matrices were never even properly covered, these are genuinely unfamiliar terms.
A tensor can be thought of as a data container that holds numbers. The container can have different numbers of dimensions: a scalar is a 0-dimensional tensor, a vector a 1-dimensional one, and a matrix a 2-dimensional one.
To picture a 3-dimensional tensor, we can think of a photograph: a black-and-white photograph is a 2D tensor, while a color picture is a 3D tensor, because it consists of a stack of 2D tensors representing the red, green, and blue pixel values.
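As a minimal sketch of these dimensions (the shapes below are arbitrary examples, not taken from any real model), here is how they look in NumPy:

```python
import numpy as np

scalar = np.array(3.14)             # 0-D tensor: a single number
vector = np.array([1.0, 2.0, 3.0])  # 1-D tensor: a list of numbers
matrix = np.zeros((28, 28))         # 2-D tensor: e.g. a black-and-white image
color = np.zeros((28, 28, 3))       # 3-D tensor: the same image with R, G, B channels

for t in (scalar, vector, matrix, color):
    print(t.ndim, t.shape)  # 0 (), then 1 (3,), then 2 (28, 28), then 3 (28, 28, 3)
```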
Since Large Language Models (and many other artificial intelligence models) consist of numbers representing weights, those weights form tensors and are stored as tensors.
Safetensors files, to summarize briefly, are a file format that holds nothing but tensors, meaning they cannot contain executable code. The format gained popularity and became a standard because of a dangerous problem in Python called "pickle": pickle files can carry small code snippets alongside the data, under the guise of describing "how this data should be put back together". A malicious person can hide harmful code this way, and the code runs automatically when the file is loaded. The older .bin checkpoint files, being pickle-based, share this weakness.
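To make the danger concrete, here is a minimal, self-contained sketch of the mechanism; the `Malicious` class and the echoed message are purely illustrative:

```python
import os
import pickle

class Malicious:
    # pickle calls __reduce__ to learn how an object should be rebuilt;
    # a hostile file can return any callable it likes from this hook.
    def __reduce__(self):
        return (os.system, ("echo this ran the moment the file was loaded",))

payload = pickle.dumps(Malicious())
pickle.loads(payload)  # the command executes during loading; no call needed
```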
The format itself has a quite simple architecture. The first 8 bytes of the file tell us the length of the header section. The JSON header contains a definition card for each tensor: dtype (the data type, e.g. F16, a 16-bit floating-point number), shape (the tensor's dimensions), and data_offsets: [BEGIN, END] (the tensor's start and end positions within the data section). The remaining part of the file contains the actual data, the weights.
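Because the layout is that simple, the header can be read with a few lines of standard-library Python. A minimal sketch, with `model.safetensors` as a placeholder filename:

```python
import json
import struct

with open("model.safetensors", "rb") as f:
    # First 8 bytes: little-endian unsigned 64-bit header length.
    (header_len,) = struct.unpack("<Q", f.read(8))
    # Next header_len bytes: JSON mapping tensor names to their definition cards.
    header = json.loads(f.read(header_len))

for name, card in header.items():
    if name == "__metadata__":  # optional free-form metadata entry
        continue
    print(name, card["dtype"], card["shape"], card["data_offsets"])
```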
Practical Importance of SafeTensor Format
In addition to (and because of) the technical properties described in the previous section, the safetensors file has become an industry standard, appearing as the base format of nearly every model release.
This points us to the first reason to prefer safetensors over other formats: the model is presented as one or more safetensors files at its original source.
Despite this, the most important reason is not this one. After all, Gemma 3 was initially released in the .task file format, yet nobody could use it properly until the safetensors version was also published. That episode can be read as a preview of the next section and of the second reason.
After all, if we want to tinker with a Large Language Model, then, as with anything being tinkered with, we need an idea of it, knowledge about it, a foundation. Safetensors, as a standard, provides us with exactly that, so having the weight files in safetensors format keeps our future options for how to use the model clear and wide open. This leads us to the second reason.
Compatibility Importance of SafeTensor Format
The second and most important feature of the safetensors format is that the machine-learning community has settled on it as a standard, and it therefore enjoys a wide range of compatibility.
A safetensors file can be manipulated easily on any hardware; because it is the standard, the support, methods, and libraries around it are extensive. Formats like GGUF and MLX often lag behind safetensors, and some important machine-learning tasks cannot be performed completely or meaningfully with them (such as pre-training, or fine-tuning beyond a certain point). Let's continue with these two formats.
A GGUF file today is not something obtained through pre-training; it is the result of converting safetensors weights to the GGUF format with llama.cpp. Effective fine-tuning cannot be done in GGUF: that feature was removed from llama.cpp at some point, and attempts to bring it back have only begun recently.
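As an illustration (the paths and the f16 output type are placeholder choices), the conversion is typically done with the convert_hf_to_gguf.py script that ships in the llama.cpp repository, invoked here from Python:

```python
import subprocess

# Sketch: convert a directory of safetensors weights to GGUF.
# "./my-model" and "model-f16.gguf" are placeholder paths.
subprocess.run(
    [
        "python", "convert_hf_to_gguf.py",  # script from the llama.cpp repo
        "./my-model",                       # folder with *.safetensors + config.json
        "--outfile", "model-f16.gguf",
        "--outtype", "f16",                 # keep 16-bit; quantize afterwards
    ],
    check=True,
)
```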
An MLX model is similarly created by converting a safetensors file into an MLX-compatible safetensors layout. Fine-tuning and similar operations can be performed on the converted file, but it offers nothing like the diversity and compatibility of plain safetensors.
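The same idea in a hedged sketch for MLX, using the convert function from the mlx-lm package (the model id and output path are placeholders):

```python
from mlx_lm import convert

# Convert a Hugging Face safetensors checkpoint into an MLX-compatible
# safetensors layout, optionally quantizing it in the process.
convert(
    "mistralai/Mistral-7B-v0.1",  # placeholder model id
    mlx_path="./mlx_model",
    quantize=True,                # mlx-lm defaults to 4-bit when quantizing
)
```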
Hardware and Philosophical Importance of SafeTensor Format
Beyond the 'the community has accepted this' argument of the previous sections, the safetensors format also carries a hardware meaning: unlike GGUF and MLX, a safetensors file allows us to organize the weights according to our needs and, through conversion, gives us access to whatever quantization level our hardware calls for.
Philosophically, this is where we see the direct link to the principles mentioned in the first section: "ownership", "control", "privacy". Owning the original file means we can bring it down to a quantization level we can actually use; when conditions change, a simple re-quantization lets us use the model the way we want, as much as we need; we can apply the techniques mentioned earlier, like personalization (fine-tuning); and we can follow whatever philosophy we choose, freely, without depending on anyone else (such as whoever converts files to GGUF and publishes them).
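A sketch of what this looks like in practice: loading the original safetensors weights with transformers and quantizing them on the fly to 4-bit with bitsandbytes. The model id is a placeholder, and re-quantizing later simply means rerunning this with different settings:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Quantization settings we control ourselves, because we own the raw weights.
bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B",  # placeholder: any safetensors checkpoint works
    quantization_config=bnb,
)
```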
Conclusion
The safetensors format gives us freedom and security together, two principles usually seen as opposites. The catch is that original weights usually come at a high precision like BF16/FP16, whose size can be a problem for most end users.
Therefore, when storage space is not a problem and we are serious about this work, safetensors files should be preferred, both for the flexibility they offer and for being the original raw files. After all, through quantization we can reach a realistic size we can actually run, adapt as conditions change, and live out the philosophy we follow, correctly and properly.