A few months ago, at Montevive.AI we faced a problem you probably know about: one of our clients needed to analyze millions of records to comply with GDPR, but the available tools took hours to process their data. While evaluating options, we came across names-dataset, an excellent Python library with an impressive database of real names, but it was too slow and resource-intensive for business processes. That’s when we decided to port it to Go to make it radically more efficient.
Today we are releasing Go Name Detector, our optimized port that detects personal information in your data 10 times faster than the original Python version. And yes, it is completely open source, because we believe that data privacy is a right that everyone should be able to protect efficiently.
The real problem that nobody wants to admit
Most companies are sitting on a ticking time bomb in terms of data privacy. It’s not that they don’t want to comply with regulations, it’s that the current tools make the process incredibly painful.
Imagine having to review logs of millions of transactions to make sure no customer names are exposed. Or worse, discovering after an audit that your analytics system has been processing personal information without anonymization for months. Under GDPR, fines can reach €20 million or 4% of your global annual turnover, whichever is higher. That’s not a mistake you can afford.
The technical problem is fascinating: detecting names is not as simple as looking up words in a list. “Pink” can be a name or a color. “Santiago” can be a person or a city. And when you add cultural complexity (Spanish names with two surnames, Asian compound names, Arabic transliterations), it gets exponentially more complicated.
The solution we built (and why it is different)
Go Name Detector was born out of an ambitious migration. We took the popular Python library names-dataset, which is excellent but slow and heavy, and rebuilt it from scratch in Go. But we didn’t stop there. We completely reimagined it.
📊 Performance metrics that speak for themselves:
Metric | Original Python | Go Name Detector | Improvement |
---|---|---|---|
Detection time | 50-100 ms | 3-9 ms | 10-20x faster |
Memory usage | 3.2 GB | 500 MB | 6x less |
Load time | 30-60 seconds | 4.3 seconds | 7-14x faster |
Batch processing | ~100 names/sec | 10,000+ names/sec | 100x faster |
Names in database | 727,556 | 727,556 | Same coverage |
Surnames in database | 983,826 | 983,826 | Same coverage |
Countries supported | 105 | 105 | Global coverage |
The result is a tool that processes more than 10,000 names per second. To put that in perspective, you can parse a million records in less than two minutes. At roughly 100 names per second, the original Python version would need close to three hours to do the same thing.
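The arithmetic behind that comparison is easy to check. The short sketch below takes the batch throughput figures from the table above and estimates how long one million records would take, assuming one name per record and a sustained rate. The estimate helper is our own name, used only for illustration, and since the Python figure is approximate (~100 names/sec), the second result should be read as a ballpark.

```go
package main

import (
	"fmt"
	"time"
)

// estimate returns how long it takes to process `records` items
// at a sustained rate of `perSecond` items per second.
func estimate(records, perSecond int) time.Duration {
	seconds := float64(records) / float64(perSecond)
	return time.Duration(seconds * float64(time.Second))
}

func main() {
	const records = 1_000_000
	fmt.Println("Go Name Detector:", estimate(records, 10_000)) // 1m40s
	fmt.Println("Original Python: ", estimate(records, 100))    // 2h46m40s
}
```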
But speed is only part of the story. What really excites us is the universal algorithm we developed. Instead of using rigid rules like “first word = first name, second word = last name” (which fail spectacularly with names like “María del Carmen García López”), our system intelligently tests all possible combinations and calculates the probability based on real data from 533 million people from 105 countries.
How it works in the real world
Suppose your system records this entry: “User Jose Manuel Robles Hermoso made a purchase”. A traditional system might identify only “Jose” as a name, or worse, it might not detect anything if it is waiting for a specific format.
Go Name Detector analyzes all possibilities:
- Is “Jose Manuel” the first name and “Robles Hermoso” the last name?
- Or is “Jose” the first name and “Manuel Robles Hermoso” the last names?
The algorithm evaluates each combination against our database of 727,556 first names and 983,826 last names, considering the popularity of each name in different countries and the cultural consistency of the set.
In this case, it would correctly detect that “Jose Manuel” is the first name (having two given names is very common in Spain) and “Robles Hermoso” the surnames (the Spanish pattern of paternal and maternal surname), with a confidence of 92.1%. All this in less than 9 milliseconds.
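To make the idea concrete, here is a minimal sketch of the “try every split and score it” approach. It is not the library’s actual implementation: the tiny popularity tables, the Split type, and the lookup and bestSplit helpers are all hypothetical, the scores are invented, and the geometric mean is just one simple way to combine per-token evidence.

```go
package main

import (
	"fmt"
	"math"
	"strings"
)

// Toy popularity tables (values between 0 and 1). These numbers are
// invented for illustration; the real library ships its own data.
var firstNameScore = map[string]float64{"jose": 0.95, "manuel": 0.90, "robles": 0.01}
var surnameScore = map[string]float64{"manuel": 0.20, "robles": 0.85, "hermoso": 0.80}

// Split is one candidate reading of a token sequence (hypothetical type).
type Split struct {
	First, Last []string
	Score       float64
}

// lookup returns the table score for a token, or a small floor for unknowns.
func lookup(table map[string]float64, tok string) float64 {
	if s, ok := table[strings.ToLower(tok)]; ok {
		return s
	}
	return 0.001
}

// bestSplit tries every cut point between tokens and keeps the reading with
// the highest geometric-mean score. With fewer than two tokens there is no
// cut to try, so it returns the zero Split.
func bestSplit(tokens []string) Split {
	var best Split
	for cut := 1; cut < len(tokens); cut++ {
		logSum := 0.0
		for _, t := range tokens[:cut] {
			logSum += math.Log(lookup(firstNameScore, t))
		}
		for _, t := range tokens[cut:] {
			logSum += math.Log(lookup(surnameScore, t))
		}
		score := math.Exp(logSum / float64(len(tokens)))
		if score > best.Score {
			best = Split{First: tokens[:cut], Last: tokens[cut:], Score: score}
		}
	}
	return best
}

func main() {
	s := bestSplit([]string{"Jose", "Manuel", "Robles", "Hermoso"})
	fmt.Printf("first=%v last=%v score=%.2f\n", s.First, s.Last, s.Score)
}
```

With these toy numbers, the cut after “Manuel” wins because both halves are plausible in their roles. The sketch leaves out the country-level popularity and cultural consistency that the real scoring takes into account.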
Why we decided to make it open source
At Montevive.AI we have a clear philosophy about security and privacy. We have migrated and optimized the original Python library to Go to gain exceptional performance, and we decided to share it with the community because we believe that fundamental tools for data protection should be accessible to everyone.
🎯 Benefits of our open source approach:
- Full transparency: Companies know exactly how we protect their data
- Auditable: Anyone can review and validate our algorithms
- Adaptable: Customizable for industry-specific needs
- Community: Continuous improvements from developers around the world
- No vendor lock-in: Your company maintains total control
- Trust: Showing the code generates more credibility than any certification
Plus, there’s something powerful about showcasing your work. When a potential client can see exactly how we solve complex problems, how we optimize for performance, and how we think about privacy, it generates a level of trust that no marketing brochure could ever achieve.
What this means for different teams
For Developers:
If you are a developer, Go Name Detector integrates into your pipeline with a single line of code. You don’t need to configure anything: there are no external files to download and no complicated dependencies. Just run go get and you’re ready to go. The library embeds all of its data, optimized in Protocol Buffers format.
Install it directly from Go Name Detector on GitHub:
// Instant installation (run in your shell)
go get github.com/montevive/go-name-detector@latest
// Immediate use
d, err := detector.NewDefault()
if err != nil {
	panic(err) // handle the initialization error
}
result := d.DetectPII([]string{"Juan", "Pérez"})
For Data Scientists:
If you work in data science or analytics, imagine being able to clean your datasets in seconds instead of hours. Before training that machine learning model, you can guarantee that there is no personal information hidden in your data. Before sharing that dataset with a partner, you can automatically verify that it complies with privacy regulations.
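As a rough illustration of what a batch cleaning pass could look like, the sketch below flags records concurrently with a small worker pool and returns the results in input order. Everything here is an assumption made for the example: containsName is a toy placeholder for a real detector call, the knownNames set is invented, and flagRecords is our own helper, not part of the library.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// knownNames is an invented, read-only set used by the toy check below.
var knownNames = map[string]bool{"juan": true, "pérez": true}

// containsName is a toy placeholder for a real detector call: it reports
// whether any whitespace-separated word of the record is in knownNames.
func containsName(record string) bool {
	for _, w := range strings.Fields(record) {
		if knownNames[strings.ToLower(w)] {
			return true
		}
	}
	return false
}

// flagRecords checks records concurrently with `workers` goroutines and
// returns one boolean per record, in input order.
func flagRecords(records []string, workers int) []bool {
	if workers < 1 {
		workers = 1
	}
	flags := make([]bool, len(records))
	jobs := make(chan int)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range jobs {
				flags[i] = containsName(records[i]) // each index is written once
			}
		}()
	}
	for i := range records {
		jobs <- i
	}
	close(jobs)
	wg.Wait()
	return flags
}

func main() {
	records := []string{"order 42 shipped", "refund for Juan Pérez", "ok"}
	fmt.Println(flagRecords(records, 2))
}
```

Flagged rows can then be dropped, masked, or sent for manual review before the dataset is used for training or shared.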
For Security and Compliance Teams:
If you are responsible for compliance or security, this tool gives you superpowers. You can audit entire databases in minutes, set up real-time monitors that detect PII in logs, or validate that your anonymization processes really work. And because it uses only 500MB of RAM (compared to the 3.2GB of the Python version), you can run it anywhere, from your laptop to a cloud container.
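A log monitor of this kind can be sketched as a sliding window over the tokens of each line. The example below is only an outline under stated assumptions: NameChecker is a hypothetical interface standing in for whatever detector you plug in, toyChecker is a deliberately naive implementation with an invented word set, and findNames is our own helper name.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
	"unicode"
)

// NameChecker is a hypothetical stand-in for a real detector.
type NameChecker interface {
	LooksLikeName(tokens []string) bool
}

// toyChecker flags a window only when every token is in a tiny known set.
type toyChecker map[string]bool

func (c toyChecker) LooksLikeName(tokens []string) bool {
	for _, t := range tokens {
		if !c[strings.ToLower(t)] {
			return false
		}
	}
	return len(tokens) > 0
}

// findNames splits the line into runs of letters, then at each position
// tries windows from maxLen tokens down to 2, longest first. Matched
// windows are returned as strings and never overlap.
func findNames(c NameChecker, line string, maxLen int) []string {
	tokens := strings.FieldsFunc(line, func(r rune) bool { return !unicode.IsLetter(r) })
	var hits []string
	for i := 0; i < len(tokens); {
		matched := false
		for n := maxLen; n >= 2; n-- {
			if i+n > len(tokens) {
				continue
			}
			if c.LooksLikeName(tokens[i : i+n]) {
				hits = append(hits, strings.Join(tokens[i:i+n], " "))
				i += n
				matched = true
				break
			}
		}
		if !matched {
			i++
		}
	}
	return hits
}

func main() {
	checker := toyChecker{"jose": true, "manuel": true, "robles": true, "hermoso": true}
	logs := "User Jose Manuel Robles Hermoso made a purchase\norder 1234 shipped\n"
	sc := bufio.NewScanner(strings.NewReader(logs))
	for n := 1; sc.Scan(); n++ {
		if hits := findNames(checker, sc.Text(), 4); len(hits) > 0 {
			fmt.Printf("line %d: possible PII %q\n", n, hits)
		}
	}
}
```

In a real monitor, the scanner would read from a log stream instead of a string, and the checker would wrap a proper detector with a confidence threshold rather than a word set.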
Features that make the difference
Feature | Description | Real-world Impact |
---|---|---|
Universal Algorithm | No hardcoded rules, pure ML | Works with ALL cultural patterns |
Massive Dataset | 533M real people | Accurate and reliable detection |
Protocol Buffers | Optimized binary format | 6x less memory, ultra-fast loading |
Intelligent Scoring | Multiple confidence factors | Reduces false positives to a minimum |
CLI Included | Ready to use tool | Batch analysis without programming |
Embedded Data | Everything included in the library | Zero configuration, zero dependencies |
Cultural Support | 105 countries, all patterns | Spanish, Asian, Arabic names, and more |
The future we are building
Go Name Detector is just the first step in our vision of a complete ecosystem of AI tools for privacy and security. We are working on:
- Expanded detectors: Addresses, telephone numbers, identification numbers, e-mails
- Intelligent anonymization: Preserves usability while protecting privacy
- Real-time dashboards: Visualize your privacy posture instantly
- Predictive analytics: AI that anticipates risks before they occur
- Native integration: Plugins for all major platforms
But beyond specific tools, we are betting on a cultural shift in how the industry thinks about privacy. Privacy should not be a costly compliance hurdle, but a competitive advantage. Companies that can process and analyze data at speed and scale while respecting privacy will be the ones that define the future.
Join us
Go Name Detector is available right now on GitHub. You can download it, test it, improve it, or just study it to understand how we approach these challenges. If you find a bug, report it. If you have an idea to improve it, send a pull request. If it helps you in your work, give it a star so others can discover it.
🚀 Get started in 3 simple steps:
- Install:
go get github.com/montevive/go-name-detector@latest
- Import:
import "github.com/montevive/go-name-detector/pkg/detector"
- Use:
d.DetectPII(palabras)
It’s that simple!
And if your company needs to go further, if you need customized solutions or enterprise support, that’s where Montevive.AI comes in as a partner. We’ve helped companies of all sizes transform their data privacy practices, from startups preparing for their first GDPR audit to corporations processing billions of records daily.
Data privacy is not only a legal obligation, it is an ethical responsibility and a business opportunity. With the right tools, protecting your users’ personal information doesn’t have to be time-consuming, expensive or complicated.
Visit github.com/montevive/go-name-detector to get started today, or contact us at montevive.ai to find out how we can help you take your privacy strategy to the next level.
Because at Montevive.AI we don’t just develop AI. We develop AI you can trust.