ALIA's Tokenizer: Why Spanish, Catalan, Basque, and Galician Cost Almost Half as Many Tokens as in Llama 3

What you don't see when using an LLM: it doesn't read your text, it reads tokens
When you ask ChatGPT, Claude, or your local LLM something, the model never sees the sentence you wrote. The first thing that happens is that a component called the tokenizer splits your text into pieces — tokens — and the model only receives numbers. All inference (and all cost) operates on those tokens, not on words.
The consequence, which almost nobody discusses publicly: two models can need very different numbers of tokens to process exactly the same text. And models whose tokenizers were built with English in mind are much less efficient in other languages.
A few weeks ago we published ALIA quantized to NVFP4 so anyone could run it on an NVIDIA DGX Spark. While researching that, we discovered something that deserves its own article: ALIA's tokenizer is radically different from Llama 3, Mistral, or GPT, and for Iberian languages it's between 1.7 and 2 times more efficient.
Since a paragraph won't convince anyone, we built a tool so you can see for yourself:
🌐 Try it live: labs.montevive.ai/alia-tokenizer-comparison/ (a live comparison of the ALIA, Llama 3, and Mistral tokenizers). Paste any text and watch in real time how each model chunks it. It runs 100% in your browser: no text leaves your device.
🎥 Video demo (3 min): a visual tour of the demo with examples in Spanish, Catalan, Basque, and Galician.
The numbers, in a table
We took an administrative paragraph in each of the four Iberian languages, plus an equivalent English text, and ran each one through all three tokenizers. The results are striking:
| Language (sample text) | ALIA (tokens) | Llama 3 (tokens) | Mistral (tokens) | ALIA vs Llama 3 (Llama 3 ÷ ALIA) |
|---|---|---|---|---|
| Spanish (legal-administrative text) | 31 | 53 | 67 | 1.71× more efficient |
| Catalan (institutional text) | 34 | 62 | 78 | 1.82× more efficient |
| Basque (public services) | 42 | 81 | 102 | 1.93× more efficient |
| Galician (local administration) | 38 | 65 | 80 | 1.71× more efficient |
| English (equivalent text) | 47 | 41 | 50 | 0.87× (Llama 3 wins) |
The difference is most noticeable in administrative and territory-specific vocabulary — exactly the use case that matters most to a Spanish public administration:
- Generalitat → 1 token in ALIA, 3 tokens in Llama 3 (Gen + eral + itat)
- ayuntamiento → 1 token in ALIA, 3 tokens in Llama 3 (ay + untami + ento)
- Cataluña → 1 token in ALIA, 3 tokens in Llama 3
- Euskadi → 1 token in ALIA, 4 tokens in Llama 3
- Xunta → 1 token in ALIA, 2 tokens in Llama 3
- concejalía → 1 token in ALIA, 4 tokens in Llama 3
ALIA encodes each of these words as a single unit. Llama 3 breaks them into fragments that mean nothing on their own.
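To reproduce counts like these yourself, here is a minimal Python sketch using the Hugging Face `transformers` library. The Hub IDs in `MODEL_IDS` are our assumption about which repositories carry the three tokenizers (the demo itself runs transformers.js, and we don't know exactly which checkpoints it loads). The Llama 3 repository is gated, so you need to accept Meta's license and log in before downloading. Counts exclude special tokens such as BOS, so they may differ slightly from other tools.

```python
from typing import Dict, List

# ASSUMPTION: Hugging Face Hub repositories we believe carry the three tokenizers.
# The Llama 3 repo is gated: accept Meta's license and run `huggingface-cli login` first.
MODEL_IDS = {
    "ALIA": "BSC-LT/ALIA-40b",
    "Llama 3": "meta-llama/Meta-Llama-3-8B",
    "Mistral": "mistralai/Mistral-7B-v0.1",
}


def load_tokenizers(model_ids: Dict[str, str]) -> dict:
    """Download only the tokenizer files (not the model weights) for each repository."""
    from transformers import AutoTokenizer  # imported lazily: only needed here

    return {name: AutoTokenizer.from_pretrained(repo) for name, repo in model_ids.items()}


def count_tokens(tokenizer, text: str) -> int:
    """Token count for `text`, excluding special tokens such as BOS/EOS."""
    return len(tokenizer(text, add_special_tokens=False)["input_ids"])


def pieces(tokenizer, word: str) -> List[str]:
    """The sub-word pieces a tokenizer splits `word` into."""
    return tokenizer.tokenize(word)


def compare(text: str, tokenizers: dict) -> Dict[str, int]:
    """Token count per tokenizer for the same text."""
    return {name: count_tokens(tok, text) for name, tok in tokenizers.items()}
```

For example, `toks = load_tokenizers(MODEL_IDS)` followed by `compare(paragraph, toks)` returns one count per model, and `pieces(toks["Llama 3"], "ayuntamiento")` lists the fragments. Byte-level tokenizers such as Llama 3's mark a leading space with `Ġ` and SentencePiece tokenizers use `▁`, so printed pieces can look slightly different from the list above.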
Why does this matter? Four concrete reasons
1. Cost
The price of any LLM API is measured in tokens, not words. If your RAG pipeline processes 10 million words per month in Spanish, a tokenizer like Llama 3's turns them into about 70% more tokens than ALIA's (and over 90% more for Basque), so at the same price per token you pay that much more. The overhead applies to everything that passes through the model: prompt, retrieved context, and response.
For a RAG system over the BOE (Spanish Official Gazette), municipal files, or healthcare documentation, where a single prompt can carry several thousand words of context, every request pays that 70-90% overhead, and at that volume it adds up to a substantial share of the annual bill.
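The cost argument is plain proportionality. This sketch encodes the token counts from the table and computes how many extra tokens, and therefore how much extra spend, each tokenizer implies relative to ALIA. It assumes a flat price per token, identical for both models, and that the ratio measured on one sample paragraph holds for your whole workload; both are simplifications, and any `price_per_million` you pass in is hypothetical.

```python
# Token counts from the comparison table (one sample paragraph per language).
TOKENS = {
    "Spanish": {"ALIA": 31, "Llama 3": 53, "Mistral": 67},
    "Catalan": {"ALIA": 34, "Llama 3": 62, "Mistral": 78},
    "Basque": {"ALIA": 42, "Llama 3": 81, "Mistral": 102},
    "Galician": {"ALIA": 38, "Llama 3": 65, "Mistral": 80},
    "English": {"ALIA": 47, "Llama 3": 41, "Mistral": 50},
}


def overhead_pct(lang: str, model: str, baseline: str = "ALIA") -> float:
    """Extra tokens, in percent, that `model` needs compared with `baseline`."""
    counts = TOKENS[lang]
    return (counts[model] / counts[baseline] - 1) * 100


def monthly_cost(alia_tokens: float, lang: str, model: str, price_per_million: float) -> float:
    """Cost of a workload that takes `alia_tokens` tokens with ALIA, run through `model`.

    ASSUMPTION: flat per-token pricing, and the sample paragraph's ratio generalizes.
    """
    model_tokens = alia_tokens * TOKENS[lang][model] / TOKENS[lang]["ALIA"]
    return model_tokens / 1_000_000 * price_per_million


for lang in TOKENS:
    print(f"{lang:9s} Llama 3: {overhead_pct(lang, 'Llama 3'):+6.1f}%   "
          f"Mistral: {overhead_pct(lang, 'Mistral'):+6.1f}%")
```

With these numbers, Spanish needs about 71% more tokens with Llama 3 and Basque about 93% more, while English needs about 13% fewer, consistent with the 70-90% range above.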
2. Speed
An LLM's generation speed is measured in tokens per second, not words per second. If your model generates 50 tok/s and its tokenizer needs 1.7× more tokens for the same paragraph, that paragraph takes 1.7× longer to appear: your user perceives 1.7× less speed. For a conversational assistant in Spanish, that factor is the difference between "instant" and "noticeable latency."
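The same proportionality applies to latency. Under the paragraph's own simplification that both models decode at the same 50 tokens per second (in practice, per-token speed also depends on model size and hardware), the time to stream the table's Spanish sample paragraph follows directly from its token counts:

```python
DECODE_RATE = 50.0  # tok/s: the hypothetical rate from the text, assumed equal for both models

# Token counts for the Spanish sample paragraph, from the table.
SPANISH_PARAGRAPH = {"ALIA": 31, "Llama 3": 53}


def generation_seconds(num_tokens: int, tokens_per_second: float = DECODE_RATE) -> float:
    """Time to stream `num_tokens` at a constant decoding rate (ignores prompt processing)."""
    return num_tokens / tokens_per_second


for model, n in SPANISH_PARAGRAPH.items():
    print(f"{model:8s} {n:3d} tokens -> {generation_seconds(n):.2f} s")
```

ALIA streams the paragraph in 0.62 s versus 1.06 s for Llama 3: the same 1.71× factor as in the table.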
3. Context window
All LLMs have a context limit measured in tokens. With an efficient tokenizer, the same window fits roughly 1.7 times as much Spanish text:
- Llama 3 with 8,192 tokens of context ≈ 6,000 words of Spanish
- ALIA with 8,192 tokens of context ≈ 10,000 words of Spanish
For RAG over long documents (court rulings, administrative files, medical records), that can be the difference between "I have to split the document into pieces" and "it fits whole."
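A rough way to check whether a document fits is to convert words to tokens and divide by the window. This sketch derives tokens-per-word figures from the estimates above (8,192 tokens ≈ 6,000 Spanish words for Llama 3 and ≈ 10,000 for ALIA). Those are approximations, real ratios depend on the text, and the sketch ignores the tokens you must reserve for instructions and for the answer.

```python
import math

# (tokens, words) pairs taken from the approximate estimates in the text.
SPANISH_ESTIMATES = {"Llama 3": (8192, 6000), "ALIA": (8192, 10000)}


def estimated_tokens(words: int, model: str) -> float:
    """Approximate token count for `words` Spanish words with the given tokenizer."""
    tokens, per_words = SPANISH_ESTIMATES[model]
    return words * tokens / per_words


def chunks_needed(words: int, model: str, context_tokens: int = 8192) -> int:
    """Minimum number of window-sized pieces needed to hold a document of `words` words."""
    return math.ceil(estimated_tokens(words, model) / context_tokens)


# A hypothetical 9,000-word court ruling:
for model in SPANISH_ESTIMATES:
    print(model, round(estimated_tokens(9000, model)), "tokens ->",
          chunks_needed(9000, model), "chunk(s)")
```

Under these estimates, the hypothetical 9,000-word ruling fits in one 8,192-token window with ALIA but needs two with Llama 3.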
4. Quality
This is the most subtle but perhaps most important consequence. When a model chunks ayuntamiento into ay + untami + ento, its attention mechanism has to reconstruct the meaning from fragments that mean nothing on their own: the rest of the sentence attends to pieces rather than to the word, and the model spends capacity reassembling words before it can even begin reasoning about them.
When ALIA sees ayuntamiento as a single token, that token carries the meaning of the whole word from the start. The quality of Spanish responses improves, not because the model is better in the abstract, but because its input is cleaner.
Why ALIA is like this: a custom-built vocabulary from scratch
Many current open-source models inherit their tokenizer from Llama (128,000 tokens, tiktoken-style BPE) or from Mistral (32,000 tokens, SentencePiece). Those vocabularies were trained on English-dominated corpora, where words in Spanish, Catalan, Basque, or Galician weren't frequent enough to earn tokens of their own, so they end up fragmented.
ALIA does something different: the Barcelona Supercomputing Center team trained a SentencePiece tokenizer from scratch on a multilingual Iberian corpus, with a vocabulary of 256,000 tokens — double that of Llama 3 and eight times that of Mistral. That extra size is spent on pieces useful for Iberian languages: institution names, legal-administrative vocabulary, Catalan verbal morphology, Basque suffixes, Galician roots.
The result is what the demo shows: each "natural" word from Iberian administration is encoded as a single piece, not as a puzzle of fragments.
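To see why the training corpus decides which words become single pieces, here is a toy byte-pair-encoding (BPE) learner in pure Python. It is not SentencePiece and not BSC's training pipeline: it splits on whitespace, starts from characters, and the two one-line corpora are invented for illustration. It does show the core mechanism, though: merges are learned from the most frequent adjacent pairs in the training text, so a word that is frequent in training ends up as one piece, while a word the training text never contained stays fragmented.

```python
from collections import Counter
from typing import List, Tuple

Pair = Tuple[str, str]


def merge_pair(symbols: Tuple[str, ...], pair: Pair) -> Tuple[str, ...]:
    """Replace every left-to-right occurrence of `pair` with its concatenation."""
    out: List[str] = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def train_bpe(corpus: List[str], num_merges: int) -> List[Pair]:
    """Learn up to `num_merges` merges, always merging the most frequent adjacent pair."""
    # Word frequencies, each word initially split into single characters.
    words = Counter(tuple(w) for line in corpus for w in line.split())
    merges: List[Pair] = []
    for _ in range(num_merges):
        pair_counts = Counter()
        for symbols, freq in words.items():
            for pair in zip(symbols, symbols[1:]):
                pair_counts[pair] += freq
        if not pair_counts:  # every word is already a single piece
            break
        best = max(pair_counts, key=pair_counts.get)
        merges.append(best)
        merged = Counter()
        for symbols, freq in words.items():
            merged[merge_pair(symbols, best)] += freq
        words = merged
    return merges


def encode(word: str, merges: List[Pair]) -> List[str]:
    """Segment a word by replaying the learned merges in order."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


if __name__ == "__main__":
    iberian = train_bpe(["el ayuntamiento aprobó el pleno municipal"] * 50, num_merges=200)
    english = train_bpe(["the city council approved the municipal plan"] * 50, num_merges=200)
    print(encode("ayuntamiento", iberian))  # ['ayuntamiento']
    print(encode("ayuntamiento", english))  # more than one piece
```

Real tokenizers work the same way at a much larger scale: a 256,000-entry vocabulary learned on text rich in Iberian languages has room to keep words like ayuntamiento whole.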
What this means for your project
If you work with text in Spanish, Catalan, Basque, or Galician, and especially with Spanish administrative, legal, or sector-specific text, using an LLM with an English-centric tokenizer means a silent efficiency loss that shows up in your bill, your latency, and your response quality.
ALIA in NVFP4 gives you both things at once:
- The Iberian tokenizer: ~1.7× fewer tokens for the same text in Iberian languages
- A 40B-parameter model trained on data representative of Iberian culture, now runnable and adaptable on a €4,000 NVIDIA DGX Spark thanks to NVFP4 quantization
If you want to understand how to deploy and adapt ALIA to your organization's domain, we cover it in this other article.
Try it right now
The demo is at labs.montevive.ai/alia-tokenizer-comparison/. Paste a paragraph from your local official gazette, a Constitutional Court ruling, or a company circular. You'll see all three tokenizers working in parallel, with token counts, efficiency ratios, and colored chips for each piece.
And it all runs 100% in your browser with transformers.js: no text you paste leaves your device, not even for tokenization. That's consistent with how we build things at Montevive.
More demos in the lab
This is the second demo at labs.montevive.ai. The first, also local-first and private, detects personally identifiable information (PII) in the text you paste, without servers and without sending data anywhere:
- Privacy Filter local — PII detector in the browser, based on a multilingual NER model running with WebGPU
- ALIA Tokenizer Comparison — the one from this article
Same principle in both cases: if something can be done locally without losing quality, it should be done locally.
About ALIA: collaboration between BSC and Spanish research centers
ALIA is coordinated by the Barcelona Supercomputing Center (BSC-CNS), under the leadership of the Secretary of State for Digitalization and Artificial Intelligence (SEDIA) and driven by the Government of Spain.
The project builds on ILENIA (Impulse of Languages in Artificial Intelligence), a consortium that brings together research centers specialized in language technologies for each co-official language:
Participating centers
- BSC-CNS (Barcelona Supercomputing Center): general coordinator and responsible for the AINA project for Catalan
- HiTZ (Basque Center for Language Technology) - University of the Basque Country: GAITU project for Basque
- CiTIUS (Research Center for Intelligent Technologies) and ILG (Galician Language Institute) - University of Santiago de Compostela: NÓS project for Galician
- CeAtic (Center for Advanced Studies in ICT) - University of Jaén: ALIA technology transfer in Andalusia
- CENID (Digital Intelligence Center) - University of Alicante: VIVES project for Valencian
Funding: Recovery, Transformation and Resilience Plan (NextGeneration EU), EuroHPC Joint Undertaking (European supercomputing consortium) and regional governments.
ALIA represents the convergence of these earlier efforts into a single, multilingual, and sovereign infrastructure, trained on MareNostrum 5 (BSC, Barcelona).
Want to adapt ALIA to your organization?
At Montevive we help public administrations, regulated companies, and cooperatives deploy generative AI within their own infrastructure, keeping data in-house:
- Domain fine-tuning on ALIA, adapted to your terminology (legal, healthcare, sector-specific)
- Deployment on NVIDIA DGX Spark, Blackwell servers, or private cloud
- Integration with your existing systems: APIs, RAG, specialized agents
📧 Contact: info@montevive.ai
🌐 More information: montevive.ai
ALIA belongs to everyone. And its tokenizer, what's more, was made for us.

