GLiNER – GDPR-Compliant AI through Local PII Detection

Open-source NER model for local PII detection. Enables two privacy strategies: permanent anonymization or temporary pseudonymization with re-personalization after AI processing.

Category: Privacy & Security Tools

GLiNER (Generalist and Lightweight Named Entity Recognition) is an open-source model for detecting named entities that runs entirely locally. It forms the foundation for GDPR-compliant AI applications where personally identifiable information (PII) must be protected before being processed by external LLMs.

The Problem

When documents containing personal data are sent to LLMs such as GPT-4 or Claude, that data leaves your own server. This is problematic from a data protection perspective and may violate GDPR. GLiNER solves this problem through local PII detection before the API call.

Detected Entities (PII Types)

Person Names: First and last names, titles
Addresses: Street, house number, zip code, city, country
Contact Details: Phone, email, fax
Financial Data: IBAN, BIC, credit card numbers
Identifiers: ID card, passport, social security number
Health Data: Insurance numbers, diagnoses
Company Data: Company names, commercial register numbers
Digital IDs: Usernames, IP addresses
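Because GLiNER is a zero-shot model, categories like the ones above are supplied as free-text labels at inference time rather than being fixed in the model. The following is a minimal sketch of how such a label set could be organized; the names `PII_LABELS`, `allLabels` and `typeForLabel` and the exact label wording are assumptions for illustration and would need to be evaluated against your own documents.

```javascript
// Hypothetical grouping of zero-shot labels by PII category.
// The label strings are illustrative, not a list shipped with GLiNER.
const PII_LABELS = {
  PERSON: ["person name", "title"],
  ADDRESS: ["street address", "zip code", "city", "country"],
  CONTACT: ["phone number", "email address", "fax number"],
  FINANCIAL: ["iban", "bic", "credit card number"],
  IDENTIFIER: ["id card number", "passport number", "social security number"],
  HEALTH: ["health insurance number", "medical diagnosis"],
  COMPANY: ["company name", "commercial register number"],
  DIGITAL_ID: ["username", "ip address"],
};

// Flat label list to hand to the model at inference time.
function allLabels(groups = PII_LABELS) {
  return Object.values(groups).flat();
}

// Map a label returned by the model back to its category,
// which can then serve as the placeholder type (e.g. PERSON).
function typeForLabel(label, groups = PII_LABELS) {
  const wanted = label.trim().toLowerCase();
  for (const [type, labels] of Object.entries(groups)) {
    if (labels.includes(wanted)) return type;
  }
  return "OTHER";
}
```

Keeping the category as the placeholder type means the LLM still sees what kind of value was removed without seeing the value itself.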

Two Strategies: Anonymization vs. Pseudonymization

Depending on the use case, we deploy GLiNER in two different architectures:

Strategy A: Permanent Anonymization

PII is removed and NOT restored

Input → GLiNER → Remove PII → LLM → Anonymous Output

Suitable for:

Chatbots and customer service
General questions and research
Document analysis without personalized response
Scenarios where the output does not need to contain names

Example:

Input: "Mr. Smith from Munich has a question about his invoice #12345"

To LLM: "A customer has a question about their invoice"

LLM response: "For invoice inquiries, I recommend the following steps..."
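A minimal sketch of the removal step, assuming the detection step returns entities as `{ text, type, start, end }` with an exclusive `end`. The function names `span` and `anonymize` are hypothetical. Unlike the rephrased sentence in the example above, this sketch substitutes a generic token for each detected span and keeps no mapping.

```javascript
// Hypothetical helper: build an entity in the { text, type, start, end } shape
// by locating `needle` in `text` (`end` is exclusive).
function span(text, needle, type) {
  const start = text.indexOf(needle);
  return { text: needle, type, start, end: start + needle.length };
}

// Replace every detected span with a generic, non-reversible token.
function anonymize(text, entities) {
  // Work from the last span to the first so earlier offsets stay valid.
  const ordered = [...entities].sort((a, b) => b.start - a.start);
  let result = text;
  for (const { type, start, end } of ordered) {
    result = result.slice(0, start) + `[${type}]` + result.slice(end);
  }
  return result; // no mapping is kept, so nothing can be restored
}

const sample = "Mr. Smith from Munich has a question about his invoice #12345";
const redacted = anonymize(sample, [
  span(sample, "Mr. Smith", "PERSON"),
  span(sample, "Munich", "LOCATION"),
  span(sample, "#12345", "INVOICE_NUMBER"),
]);
// redacted: "[PERSON] from [LOCATION] has a question about his invoice [INVOICE_NUMBER]"
```

Span replacement leaves indirect hints such as the pronoun "his" in place. If those matter for your use case, an additional rewriting step would be needed.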

Advantages

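Strong protection: detected PII is removed from everything that is sent to the LLM
No mapping table required
Simple architecture
No mapping that could leak; the remaining risk is PII that the detection step misses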

Limitations

Output cannot be personalized
Not suitable for letters/emails

Strategy B: Pseudonymization with Re-Personalization

PII is replaced with placeholders, the text is processed by the LLM, and the PII is restored afterwards

Input → GLiNER → Pseudonymize → LLM → Re-Personalize → Personalized Output

Suitable for:

Automated email generation
Personalized letters and documents
Contract templates with customer data
Support replies with direct salutation

Example:

Input: "Write a reminder email to John Doe, 123 Main St, 10001 New York. Outstanding amount: $1,250.00"

To LLM: "Write a reminder email to [PERSON_1], [ADDRESS_1], [ADDRESS_2]. Outstanding amount: [MONEY_1]"

LLM generates: "Dear [PERSON_1], we would like to kindly remind you of the outstanding invoice for [MONEY_1]..."

Re-personalized: "Dear John Doe, we would like to kindly remind you of the outstanding invoice for $1,250.00..."

Advantages

Personalized outputs possible
PII never leaves your own server
LLM only sees placeholders
Full automation possible

Things to Consider

Mapping table must be protected: hold it in memory, never in a database or file
Slightly more complex architecture
Keep the mapping only for the duration of the request

Technical Implementation: Re-Personalization

The pseudonymization workflow consists of three phases:

Phase 1: PII Detection and Pseudonymization

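// GLiNER detects all PII in the text.
// Assumption: `gliner` is your own wrapper around the locally running model
// that returns entities in the shape shown below.
// inputText is the example request from Strategy B.
const detectedEntities = gliner.analyze(inputText);

// Result (start inclusive, end exclusive):
// [
//   { text: "John Doe", type: "PERSON", start: 26, end: 34 },
//   { text: "123 Main St", type: "ADDRESS", start: 36, end: 47 },
//   { text: "10001 New York", type: "ADDRESS", start: 49, end: 63 },
//   { text: "$1,250.00", type: "MONEY", start: 85, end: 94 }
// ]

// Pseudonymization: number placeholders per entity type, in document order
const mapping = new Map();
const counters = {};
const replacements = [...detectedEntities]
  .sort((a, b) => a.start - b.start)
  .map((entity) => {
    counters[entity.type] = (counters[entity.type] ?? 0) + 1;
    const placeholder = `[${entity.type}_${counters[entity.type]}]`;
    mapping.set(placeholder, entity.text);
    return { ...entity, placeholder };
  });

// Replace by offset, last entity first, so earlier offsets stay valid
let pseudonymizedText = inputText;
for (const { start, end, placeholder } of replacements.reverse()) {
  pseudonymizedText =
    pseudonymizedText.slice(0, start) + placeholder + pseudonymizedText.slice(end);
}

// Mapping (in RAM only, never persisted!):
// [PERSON_1] → "John Doe"
// [ADDRESS_1] → "123 Main St"
// [ADDRESS_2] → "10001 New York"
// [MONEY_1] → "$1,250.00"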

Phase 2: LLM Processing

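import OpenAI from "openai";

// The client reads OPENAI_API_KEY from the environment
const openai = new OpenAI();

// The pseudonymized text is sent to the LLM
const llmResponse = await openai.chat.completions.create({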
  model: "gpt-4",
  messages: [{
    role: "system",
    content: "You write professional business letters. Placeholders like [PERSON_1] will be replaced later – use them exactly as-is in the text."
  }, {
    role: "user",
    content: pseudonymizedText
  }]
});

// LLM sees ONLY:
// "Write a reminder email to [PERSON_1], [ADDRESS_1], [ADDRESS_2].
//  Outstanding amount: [MONEY_1]"

// LLM responds with placeholders:
// "Dear [PERSON_1], we would like to kindly remind you..."
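The guarantee that the LLM sees only placeholders holds only if every occurrence of a detected value was actually replaced. A guard before the API call can check this. The sketch below is an assumption about how such a check might look (`findLeakedValues` and `assertNoLeak` are hypothetical names). It catches only values that are already in the mapping, not PII the detection step missed entirely.

```javascript
// Return the original values from the mapping that still occur in the outgoing text.
// `mapping` is a Map of placeholder → original value, as built in Phase 1.
function findLeakedValues(outgoingText, mapping) {
  const haystack = outgoingText.toLowerCase();
  const leaked = [];
  for (const original of mapping.values()) {
    // Case-insensitive so "john doe" is also caught when "John Doe" was mapped.
    if (haystack.includes(original.toLowerCase())) leaked.push(original);
  }
  return leaked;
}

// Throw before the API call if any mapped value is still present.
function assertNoLeak(outgoingText, mapping) {
  const leaked = findLeakedValues(outgoingText, mapping);
  if (leaked.length > 0) {
    // Report only the count: the values themselves must not reach logs.
    throw new Error(`Refusing to send: ${leaked.length} PII value(s) still in text`);
  }
}
```

Calling `assertNoLeak(pseudonymizedText, mapping)` directly before the request turns a silent leak, for example a second mention of a name that was not replaced, into a hard failure.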

Phase 3: Re-Personalization

// Replace all placeholders with the original PII
let finalOutput = llmResponse.choices[0].message.content;

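mapping.forEach((originalValue, placeholder) => {
  // A function replacer inserts the value literally, so "$" sequences
  // such as "$&" or "$$" inside the PII are not treated as replacement patterns
  finalOutput = finalOutput.replaceAll(placeholder, () => originalValue);
});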

// Delete mapping immediately – never persist!
mapping.clear();

// Result: Fully personalized text
// "Dear John Doe, we would like to kindly remind you..."
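An LLM may occasionally alter a placeholder or invent one that was never in the mapping, for example `[PERSON_2]` when only `[PERSON_1]` exists. Such tokens would survive re-personalization and reach the recipient. A minimal sketch of a post-check, assuming placeholders follow the `[TYPE_N]` shape used above; the name `findUnresolvedPlaceholders` is hypothetical.

```javascript
// Matches tokens shaped like [PERSON_1] or [DIGITAL_ID_12].
const PLACEHOLDER_PATTERN = /\[[A-Z][A-Z_]*_\d+\]/g;

// Return the distinct placeholder-shaped tokens still present in the text.
function findUnresolvedPlaceholders(text) {
  const matches = text.match(PLACEHOLDER_PATTERN) ?? [];
  return [...new Set(matches)];
}

const checked = findUnresolvedPlaceholders(
  "Dear John Doe, regarding [MONEY_2] and [MONEY_2], please contact [PERSON_3]."
);
// checked: ["[MONEY_2]", "[PERSON_3]"]
```

Running this on the final output and rejecting or retrying the request when the result is non-empty keeps broken placeholders out of letters and emails.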

Security Architecture

Critical Security Rules for Re-Personalization

Mapping in RAM only: The placeholder → PII mapping is NEVER stored in a database or file
Request-bound: The mapping exists only for the duration of the request and is deleted immediately afterwards
No logging: Neither the input, the mapping nor the final output is logged
Encrypted transmission: All communication via HTTPS/TLS
Isolated processing: Each request has its own mapping – no cross-contamination
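The rules above can be sketched as a single request-scoped function. This is an illustration under stated assumptions, not a definitive implementation: `processWithPseudonyms` is a hypothetical name, `detect` and `callLlm` are injected stand-ins for the local NER step and the external API, and entity spans are assumed not to overlap.

```javascript
// Run one request with its own mapping; the mapping never outlives the call.
async function processWithPseudonyms(inputText, detect, callLlm) {
  const mapping = new Map(); // placeholder → original value, local to this request
  try {
    // Sort by position so placeholders are numbered in document order.
    const entities = [...(await detect(inputText))].sort((a, b) => a.start - b.start);
    const counters = {};
    const parts = [];
    let cursor = 0;
    for (const { text, type, start, end } of entities) {
      counters[type] = (counters[type] ?? 0) + 1;
      const placeholder = `[${type}_${counters[type]}]`;
      mapping.set(placeholder, text);
      // Copy the text before the entity, then the placeholder instead of the entity.
      parts.push(inputText.slice(cursor, start), placeholder);
      cursor = end;
    }
    parts.push(inputText.slice(cursor));

    // Only the pseudonymized text crosses the server boundary.
    const response = await callLlm(parts.join(""));

    let output = response;
    for (const [placeholder, original] of mapping) {
      // Function replacer: the original value is inserted literally.
      output = output.replaceAll(placeholder, () => original);
    }
    return output;
  } finally {
    mapping.clear(); // runs on success and on error; nothing is logged or persisted
  }
}
```

Because the mapping is a local variable, two concurrent requests cannot see each other's entries, and the `finally` block clears it even when the API call fails.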

Data Flow Diagram

┌─────────────────────────────────────────────────────────────────────────────┐
│                        YOUR SERVER (Secure Zone)                             │
│  ┌──────────┐    ┌─────────┐    ┌─────────────┐    ┌──────────────────────┐ │
│  │  User    │───▶│ GLiNER  │───▶│   Mapping   │    │   Re-Personalization │ │
│  │  Input   │    │  (NER)  │    │ (RAM only!) │    │                      │ │
│  └──────────┘    └────┬────┘    └──────┬──────┘    └──────────▲───────────┘ │
│                       │                │                      │             │
│                       ▼                │                      │             │
│              ┌────────────────┐        │                      │             │
│              │ Pseudonymized  │        │                      │             │
│              │ [PERSON_1]...  │────────┼──────────────────────┘             │
│              └───────┬────────┘        │                                    │
└──────────────────────┼─────────────────┼────────────────────────────────────┘
                       │                 │
                       ▼                 │ (Mapping stays internal!)
        ┌──────────────────────────┐     │
        │      EXTERNAL API        │     │
        │  ┌────────────────────┐  │     │
        │  │   LLM (GPT-4)      │  │     │
        │  │                    │  │     │
        │  │ Sees ONLY:         │  │     │
        │  │ "[PERSON_1] has    │  │     │
        │  │  a question..."    │  │     │
        │  └────────────────────┘  │     │
        └──────────────────────────┘     │
                       │                 │
                       │ Response with   │
                       │ placeholders    │
                       ▼                 │
              ┌────────────────┐         │
              │ "Dear          │─────────┘
              │  [PERSON_1]"   │  Mapping is applied
              └────────────────┘

When to Use Which Strategy?

Use Case	Recommended Strategy	Reason
Chatbot / Customer Service	A: Anonymization	Responses do not need to be personalized
Document Analysis	A: Anonymization	Summaries do not require names
Email Generation	B: Re-Personalization	Emails must address the recipient directly
Letter/Contract Templates	B: Re-Personalization	Documents require real names and addresses
Internal Knowledge Search	A: Anonymization	Factual knowledge, no personalization needed
Personalized Reports	B: Re-Personalization	Reports for specific persons/companies

Why GLiNER?

100% local: No data leaves the server during NER analysis
Open Source: Full transparency and auditability (permissive license; verify the license of the specific model checkpoint you deploy)
Lightweight: Runs on CPU, no expensive GPU cluster needed
Multilingual: Supports German, English and many other languages
Zero-Shot NER: Detects new entity types without retraining
Fast: Processes documents in milliseconds

GDPR-Compliant AI for Your Business

We implement GLiNER-based privacy layers for your AI applications. Whether chat, document processing or automated correspondence – your data remains protected.
