GLiNER – GDPR-Compliant AI through Local PII Detection
Open-source NER model for local PII detection. Enables two privacy strategies: permanent anonymization or temporary pseudonymization with re-personalization after AI processing.
GLiNER (Generalist and Lightweight Named Entity Recognition) is an open-source model for detecting named entities that runs entirely locally. It forms the foundation for GDPR-compliant AI applications where personally identifiable information (PII) must be protected before being processed by external LLMs.
The Problem
When documents containing personal data are sent to LLMs such as GPT-4 or Claude, that data leaves your own server. This is problematic from a data protection perspective and may violate GDPR. GLiNER addresses this by detecting PII locally, so that it can be removed or replaced before the API call.
Detected Entities (PII Types)
- Person Names: First and last names, titles
- Addresses: Street, house number, zip code, city, country
- Contact Details: Phone, email, fax
- Financial Data: IBAN, BIC, credit card numbers
- Identifiers: ID card, passport, social security number
- Health Data: Insurance numbers, diagnoses
- Company Data: Company names, commercial register numbers
- Digital IDs: Usernames, IP addresses
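The categories above could be expressed as a label configuration that is handed to the detection step. The following is a minimal sketch, assuming your local GLiNER integration accepts a flat list of free-text labels. The label wording and the names `PII_LABELS`, `allLabels` and `categoryOf` are illustrative assumptions, not a fixed GLiNER schema.

```javascript
// Hypothetical label configuration: category → labels handed to the NER step.
// The label wording is an assumption; tune it against your own documents.
const PII_LABELS = {
  person: ["person name", "title"],
  address: ["street address", "zip code", "city", "country"],
  contact: ["phone number", "email address", "fax number"],
  financial: ["IBAN", "BIC", "credit card number"],
  identifier: ["ID card number", "passport number", "social security number"],
  health: ["health insurance number", "medical diagnosis"],
  company: ["company name", "commercial register number"],
  digital: ["username", "IP address"],
};

// Flatten the configuration into the label list for one detection call.
function allLabels(config = PII_LABELS) {
  return Object.values(config).flat();
}

// Reverse lookup: which category does a detected label belong to?
function categoryOf(label, config = PII_LABELS) {
  const hit = Object.entries(config).find(([, labels]) => labels.includes(label));
  return hit ? hit[0] : undefined;
}
```

Keeping the labels in one place makes it easier to review which PII types a deployment actually covers.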
Two Strategies: Anonymization vs. Pseudonymization
Depending on the use case, we deploy GLiNER in two different architectures:
Strategy A: Permanent Anonymization
PII is removed and NOT restored
Input → GLiNER → Remove PII → LLM → Anonymous Output
Suitable for:
- Chatbots and customer service
- General questions and research
- Document analysis without personalized response
- Scenarios where the output does not need to contain names
Example:
Input: "Mr. Smith from Munich has a question about his invoice #12345"
To LLM: "A customer has a question about their invoice"
LLM response: "For invoice inquiries, I recommend the following steps..."
Advantages
- Maximum security – PII no longer exists
- No mapping table required
- Simple architecture
- Detected PII cannot leak via the LLM provider, because it is never transmitted
Limitations
- Output cannot be personalized
- Not suitable for letters/emails
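Strategy A can be sketched as follows. This is a minimal example, assuming the detection step returns entities as `{ text, type, start, end }` with character offsets (end exclusive). It replaces each span with a generic type label rather than rephrasing the sentence as in the example above. The function name `anonymize` and the `INVOICE_NUMBER` type are illustrative assumptions.

```javascript
// Strategy A sketch: replace every detected span with a generic type label.
// No mapping is created, so the original values cannot be restored.
// The entity shape { text, type, start, end } is an assumption about the
// local GLiNER integration (end offset exclusive).
function anonymize(text, entities) {
  // Work from the last span to the first so earlier offsets stay valid.
  const ordered = [...entities].sort((a, b) => b.start - a.start);
  let result = text;
  for (const { type, start, end } of ordered) {
    result = result.slice(0, start) + `[${type}]` + result.slice(end);
  }
  return result;
}

const input = "Mr. Smith from Munich has a question about his invoice #12345";
const entities = [
  { text: "Mr. Smith", type: "PERSON", start: 0, end: 9 },
  { text: "Munich", type: "ADDRESS", start: 15, end: 21 },
  { text: "#12345", type: "INVOICE_NUMBER", start: 55, end: 61 },
];

console.log(anonymize(input, entities));
// [PERSON] from [ADDRESS] has a question about his invoice [INVOICE_NUMBER]
```

The sketch assumes non-overlapping spans. Overlapping detections would need to be merged before replacement.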
Strategy B: Pseudonymization with Re-Personalization
PII is replaced, processed and restored
Input → GLiNER → Pseudonymize → LLM → Re-Personalize → Personalized Output
Suitable for:
- Automated email generation
- Personalized letters and documents
- Contract templates with customer data
- Support replies with direct salutation
Example:
Input: "Write a reminder email to John Doe, 123 Main St, 10001 New York. Outstanding amount: $1,250.00"
To LLM: "Write a reminder email to [PERSON_1], [ADDRESS_1], [ADDRESS_2]. Outstanding amount: [MONEY_1]"
LLM generates: "Dear [PERSON_1], we would like to kindly remind you of the outstanding invoice for [MONEY_1]..."
Re-personalized: "Dear John Doe, we would like to kindly remind you of the outstanding invoice for $1,250.00..."
Advantages
- Personalized outputs possible
- PII never leaves your own server
- LLM only sees placeholders
- Full automation possible
Things to Consider
- Mapping table must be protected: keep it in memory only, never on disk
- Slightly more complex architecture
- Keep the mapping only for the duration of the request
Technical Implementation: Re-Personalization
The pseudonymization workflow consists of three phases:
Phase 1: PII Detection and Pseudonymization
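// GLiNER detects all PII in the text
// (`gliner` is a placeholder for your local GLiNER integration)
const detectedEntities = gliner.analyze(inputText);
// Result (start/end are character offsets into inputText, end exclusive):
[
  { text: "John Doe", type: "PERSON", start: 26, end: 34 },
  { text: "123 Main St", type: "ADDRESS", start: 36, end: 47 },
  { text: "10001 New York", type: "ADDRESS", start: 49, end: 63 },
  { text: "$1,250.00", type: "MONEY", start: 85, end: 94 }
]
// Pseudonymization: replace PII with placeholders
const mapping = new Map();            // placeholder → original value
const placeholderByValue = new Map(); // "TYPE:value" → placeholder (repeated values share one)
const counters = new Map();           // numbering restarts at 1 for each entity type
const replacements = detectedEntities.map((entity) => {
  const key = `${entity.type}:${entity.text}`;
  if (!placeholderByValue.has(key)) {
    const n = (counters.get(entity.type) ?? 0) + 1;
    counters.set(entity.type, n);
    const placeholder = `[${entity.type}_${n}]`;
    placeholderByValue.set(key, placeholder);
    mapping.set(placeholder, entity.text);
  }
  return { ...entity, placeholder: placeholderByValue.get(key) };
});
// Replace by offset from the end of the text backwards, so earlier offsets stay valid
let pseudonymizedText = inputText;
replacements
  .sort((a, b) => b.start - a.start)
  .forEach(({ start, end, placeholder }) => {
    pseudonymizedText =
      pseudonymizedText.slice(0, start) + placeholder + pseudonymizedText.slice(end);
  });
// Mapping (in RAM only, never persisted!):
// [PERSON_1] → "John Doe"
// [ADDRESS_1] → "123 Main St"
// [ADDRESS_2] → "10001 New York"
// [MONEY_1] → "$1,250.00"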
Phase 2: LLM Processing
// The pseudonymized text is sent to the LLM
// (official openai Node.js SDK; the client reads OPENAI_API_KEY from the environment)
import OpenAI from "openai";
const openai = new OpenAI();
const llmResponse = await openai.chat.completions.create({
  model: "gpt-4",
  messages: [{
    role: "system",
    content: "You write professional business letters. Placeholders like [PERSON_1] will be replaced later. Use them exactly as-is in the text."
  }, {
    role: "user",
    content: pseudonymizedText
  }]
});
// LLM sees ONLY:
// "Write a reminder email to [PERSON_1], [ADDRESS_1], [ADDRESS_2].
// Outstanding amount: [MONEY_1]"
// LLM responds with placeholders:
// "Dear [PERSON_1], we would like to kindly remind you..."
Phase 3: Re-Personalization
// Replace all placeholders with the original PII
let finalOutput = llmResponse.choices[0].message.content;
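mapping.forEach((originalValue, placeholder) => {
  // A replacer function inserts the value literally, even if it contains
  // special replacement patterns such as "$&" or "$$"
  finalOutput = finalOutput.replaceAll(placeholder, () => originalValue);
});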
// Delete mapping immediately – never persist!
mapping.clear();
// Result: Fully personalized text
// "Dear John Doe, we would like to kindly remind you..."
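Re-personalization only works if the LLM returns the placeholders unchanged. A possible safeguard, sketched here under the assumption that placeholders follow the `[TYPE_N]` format used above, is to compare the placeholders in the LLM output with the mapping before substituting. The names `PLACEHOLDER_PATTERN` and `checkPlaceholders` are hypothetical.

```javascript
// Sketch: check the LLM output against the mapping before re-personalization.
// Matches placeholders of the form [TYPE_N], e.g. [PERSON_1] or [MONEY_12].
const PLACEHOLDER_PATTERN = /\[[A-Z_]+_\d+\]/g;

function checkPlaceholders(llmText, mapping) {
  const found = new Set(llmText.match(PLACEHOLDER_PATTERN) ?? []);
  return {
    // Placeholders the LLM produced that have no entry in the mapping
    unknown: [...found].filter((p) => !mapping.has(p)),
    // Placeholders from the mapping that the LLM did not use
    unused: [...mapping.keys()].filter((p) => !found.has(p)),
  };
}

const exampleMapping = new Map([
  ["[PERSON_1]", "John Doe"],
  ["[MONEY_1]", "$1,250.00"],
]);
const report = checkPlaceholders(
  "Dear [PERSON_1], please contact [PERSON_2].",
  exampleMapping
);
// report.unknown → ["[PERSON_2]"], report.unused → ["[MONEY_1]"]
```

A non-empty `unknown` list suggests the model invented or altered a placeholder, in which case the request could be retried or rejected instead of returning text with unresolved placeholders.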
Security Architecture
Critical Security Rules for Re-Personalization
- Mapping in RAM only: The placeholder → PII mapping is NEVER stored in a database or file
- Session-bound: The mapping exists only for the duration of the request and is immediately deleted afterwards
- No logging: Neither the input, the mapping, nor the final output is logged
- Encrypted transmission: All communication via HTTPS/TLS
- Isolated processing: Each request has its own mapping – no cross-contamination
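The "session-bound" and "isolated processing" rules can be sketched with a helper that creates one mapping per request and clears it in a `finally` block. This is a minimal sketch, not a complete security measure. `withRequestMapping` and `demo` are hypothetical names, and the draft string stands in for a real LLM response.

```javascript
// Sketch: one mapping per request, cleared even if processing throws.
// `handler` is a hypothetical callback that receives the fresh mapping.
async function withRequestMapping(handler) {
  const mapping = new Map(); // lives in memory only, never serialized
  try {
    return await handler(mapping);
  } finally {
    mapping.clear(); // runs on success and on error
  }
}

// Usage sketch: the draft stands in for an LLM response (Phase 2).
async function demo() {
  return withRequestMapping(async (mapping) => {
    mapping.set("[PERSON_1]", "John Doe");
    const draft = "Dear [PERSON_1], ...";
    return draft.replaceAll("[PERSON_1]", () => mapping.get("[PERSON_1]"));
  });
}
```

Because the mapping is created inside the helper, two concurrent requests never share entries. Note that `clear()` only drops the references; JavaScript offers no way to overwrite string contents in memory.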
Data Flow Diagram
┌─────────────────────────────────────────────────────────────────────────────┐
│                          YOUR SERVER (Secure Zone)                          │
│  ┌──────────┐    ┌─────────┐    ┌─────────────┐    ┌──────────────────────┐ │
│  │   User   │───▶│ GLiNER  │───▶│   Mapping   │    │  Re-Personalization  │ │
│  │  Input   │    │  (NER)  │    │ (RAM only!) │    │                      │ │
│  └──────────┘    └────┬────┘    └──────┬──────┘    └──────────▲───────────┘ │
│                       │                │                      │             │
│                       ▼                │                      │             │
│               ┌────────────────┐       │                      │             │
│               │ Pseudonymized  │       │                      │             │
│               │ [PERSON_1]...  │───────┼──────────────────────┘             │
│               └───────┬────────┘       │                                    │
└───────────────────────┼────────────────┼────────────────────────────────────┘
                        │                │
                        ▼                │ (Mapping stays internal!)
           ┌──────────────────────────┐  │
           │       EXTERNAL API       │  │
           │  ┌────────────────────┐  │  │
           │  │    LLM (GPT-4)     │  │  │
           │  │                    │  │  │
           │  │ Sees ONLY:         │  │  │
           │  │ "[PERSON_1] has    │  │  │
           │  │  a question..."    │  │  │
           │  └────────────────────┘  │  │
           └──────────────────────────┘  │
                        │                │
                        │ Response with  │
                        │ placeholders   │
                        ▼                │
                ┌────────────────┐       │
                │ "Dear          │───────┘
                │ [PERSON_1]"    │         Mapping is applied
                └────────────────┘
When to Use Which Strategy?
| Use Case | Recommended Strategy | Reason |
|---|---|---|
| Chatbot / Customer Service | A: Anonymization | Responses do not need to be personalized |
| Document Analysis | A: Anonymization | Summaries do not require names |
| Email Generation | B: Re-Personalization | Emails must address the recipient directly |
| Letter/Contract Templates | B: Re-Personalization | Documents require real names and addresses |
| Internal Knowledge Search | A: Anonymization | Factual knowledge, no personalization needed |
| Personalized Reports | B: Re-Personalization | Reports for specific persons/companies |
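The table above can also be kept as a lookup in code, so that each entry point selects its strategy explicitly. A minimal sketch; the use-case keys, the strategy names and the `strategyFor` helper are illustrative assumptions.

```javascript
// The decision table as a lookup; keys and strategy names are illustrative.
const STRATEGY_BY_USE_CASE = new Map([
  ["chatbot", "anonymize"],
  ["document-analysis", "anonymize"],
  ["email-generation", "pseudonymize"],
  ["letter-contract-templates", "pseudonymize"],
  ["internal-knowledge-search", "anonymize"],
  ["personalized-reports", "pseudonymize"],
]);

// Unknown use cases fall back to anonymization, the more restrictive option.
function strategyFor(useCase) {
  return STRATEGY_BY_USE_CASE.get(useCase) ?? "anonymize";
}
```

Defaulting to anonymization means a new, unclassified use case never creates a mapping by accident.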
Why GLiNER?
- 100% local: No data leaves the server during NER analysis
- Open Source: Full transparency and auditability (the GLiNER library is published under the Apache 2.0 license; licenses of individual model checkpoints vary)
- Lightweight: Runs on CPU, no expensive GPU cluster needed
- Multilingual: Supports German, English and many other languages
- Zero-Shot NER: Detects new entity types without retraining
- Fast: Processes documents in milliseconds
GDPR-Compliant AI for Your Business
We implement GLiNER-based privacy layers for your AI applications. Whether chat, document processing or automated correspondence – your data remains protected.