GLiNER – GDPR-Compliant AI through Local PII Detection

Open-source NER model for local PII detection. Enables two privacy strategies: permanent anonymization or temporary pseudonymization with re-personalization after AI processing.

Category: Privacy & Security Tools

GLiNER (Generalist and Lightweight Model for Named Entity Recognition) is an open-source model for detecting named entities that runs entirely locally. It forms the foundation for GDPR-compliant AI applications where personally identifiable information (PII) must be protected before being processed by external LLMs.

The Problem

When documents containing personal data are sent to LLMs such as GPT-4 or Claude, that data leaves your own server. This is problematic from a data protection perspective and may violate GDPR. GLiNER addresses this by detecting PII locally, so that it can be removed or replaced before the API call.

Detected Entities (PII Types)

  • Person Names: First and last names, titles
  • Addresses: Street, house number, zip code, city, country
  • Contact Details: Phone, email, fax
  • Financial Data: IBAN, BIC, credit card numbers
  • Identifiers: ID card, passport, social security number
  • Health Data: Insurance numbers, diagnoses
  • Company Data: Company names, commercial register numbers
  • Digital IDs: Usernames, IP addresses
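
Because GLiNER works zero-shot, the entity types are supplied as plain-text labels at inference time rather than being fixed in the model. The following is a minimal sketch of how the categories above could be organized as a label configuration. The category keys and label strings are illustrative assumptions, not a prescribed taxonomy, and would need tuning against the model actually used.

```javascript
// Hypothetical label taxonomy: the categories follow the list above,
// the label strings are assumptions and should be tuned per model.
const PII_CATEGORIES = {
  PERSON: ["person", "title"],
  ADDRESS: ["street address", "zip code", "city", "country"],
  CONTACT: ["phone number", "email address", "fax number"],
  FINANCIAL: ["iban", "bic", "credit card number"],
  IDENTIFIER: ["id card number", "passport number", "social security number"],
  HEALTH: ["health insurance number", "medical diagnosis"],
  COMPANY: ["company name", "commercial register number"],
  DIGITAL_ID: ["username", "ip address"],
};

// Flat list of labels to hand to the detector.
const labels = Object.values(PII_CATEGORIES).flat();

// Reverse lookup: which category does a detected label belong to?
const categoryOf = new Map(
  Object.entries(PII_CATEGORIES).flatMap(([category, names]) =>
    names.map((name) => [name, category])
  )
);

console.log(labels.length);          // 21
console.log(categoryOf.get("iban")); // FINANCIAL
```

Keeping the category as the placeholder prefix (for example [FINANCIAL_1]) and the fine-grained label only internally is one possible design; it reveals less about the data to the external LLM.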

Two Strategies: Anonymization vs. Pseudonymization

Depending on the use case, we deploy GLiNER in two different architectures:

Strategy A: Permanent Anonymization

PII is removed and NOT restored

Input → GLiNER → Remove PII → LLM → Anonymous Output

Suitable for:

  • Chatbots and customer service
  • General questions and research
  • Document analysis without personalized response
  • Scenarios where the output does not need to contain names

Example:

Input: "Mr. Smith from Munich has a question about his invoice #12345"

To LLM: "A customer has a question about their invoice"

LLM response: "For invoice inquiries, I recommend the following steps..."

Advantages
  • Maximum security: detected PII is removed and cannot be restored
  • No mapping table required
  • Simple architecture
  • No mapping that could leak; the remaining risk is PII that the detector misses
Limitations
  • Output cannot be personalized
  • Not suitable for letters/emails
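
Strategy A can be sketched as a one-way replacement: every detected span is overwritten with a generic tag and nothing is kept for restoration. This is a minimal sketch under assumptions: the entity objects (type, start, end with an exclusive end offset) and the tag names are hypothetical stand-ins for the detector output, and the result uses tags rather than the fluent rewording shown in the example above.

```javascript
// Replace each detected span with a generic, non-reversible tag.
// No mapping is created, so the original values cannot be restored.
function anonymize(text, entities) {
  // Work from the end of the string so earlier offsets stay valid.
  const sorted = [...entities].sort((a, b) => b.start - a.start);
  let result = text;
  for (const { type, start, end } of sorted) {
    result = result.slice(0, start) + `[${type}]` + result.slice(end);
  }
  return result;
}

// Stand-in for detector output; offsets are computed here for readability.
const input = "Mr. Smith from Munich has a question about his invoice #12345";
const span = (type, value) => {
  const start = input.indexOf(value);
  return { type, start, end: start + value.length };
};
const entities = [
  span("PERSON", "Mr. Smith"),
  span("LOCATION", "Munich"),
  span("INVOICE_NUMBER", "#12345"),
];

const anonymized = anonymize(input, entities);
console.log(anonymized);
// [PERSON] from [LOCATION] has a question about his invoice [INVOICE_NUMBER]
```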

Strategy B: Pseudonymization with Re-Personalization

PII is replaced, processed and restored

Input → GLiNER → Pseudonymize → LLM → Re-Personalize → Personalized Output

Suitable for:

  • Automated email generation
  • Personalized letters and documents
  • Contract templates with customer data
  • Support replies with direct salutation

Example:

Input: "Write a reminder email to John Doe, 123 Main St, 10001 New York. Outstanding amount: $1,250.00"

To LLM: "Write a reminder email to [PERSON_1], [ADDRESS_1], [ADDRESS_2]. Outstanding amount: [MONEY_1]"

LLM generates: "Dear [PERSON_1], we would like to kindly remind you of the outstanding invoice for [MONEY_1]..."

Re-personalized: "Dear John Doe, we would like to kindly remind you of the outstanding invoice for $1,250.00..."

Advantages
  • Personalized outputs possible
  • Detected PII never leaves your own server
  • LLM only sees placeholders
  • Full automation possible
Things to Consider
  • Mapping table must be protected for as long as it exists (memory only)
  • Slightly more complex architecture
  • Keep the mapping only for the duration of the request

Technical Implementation: Re-Personalization

The pseudonymization workflow consists of three phases:

Phase 1: PII Detection and Pseudonymization

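// GLiNER detects all PII in the text.
// `gliner.analyze` stands for your own local GLiNER wrapper, not a specific library call.
const detectedEntities = gliner.analyze(inputText);

// Result (start inclusive, end exclusive):
// [
//   { text: "John Doe", type: "PERSON", start: 26, end: 34 },
//   { text: "123 Main St", type: "ADDRESS", start: 36, end: 47 },
//   { text: "10001 New York", type: "ADDRESS", start: 49, end: 63 },
//   { text: "$1,250.00", type: "MONEY", start: 85, end: 94 }
// ]

// Pseudonymization: replace PII with placeholders
const mapping = new Map();   // placeholder → original value
const counters = {};         // one counter per entity type
let pseudonymizedText = inputText;

detectedEntities.forEach((entity) => {
  // A value that was detected twice keeps its first placeholder
  if ([...mapping.values()].includes(entity.text)) return;
  counters[entity.type] = (counters[entity.type] || 0) + 1;
  const placeholder = `[${entity.type}_${counters[entity.type]}]`;
  mapping.set(placeholder, entity.text);
  // replaceAll: every occurrence of the value must be replaced, not just the first
  pseudonymizedText = pseudonymizedText.replaceAll(entity.text, placeholder);
});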

// Mapping (in RAM only, never persisted!):
// [PERSON_1] → "John Doe"
// [ADDRESS_1] → "123 Main St"
// [ADDRESS_2] → "10001 New York"
// [MONEY_1] → "$1,250.00"
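
Replacing by text works for the example, but it can also hit a string in places where it was not detected as PII. A variant that uses the detector's start/end offsets replaces exactly the detected spans. The following is a sketch under the assumption that the detector returns non-overlapping spans with an exclusive end offset; the function name pseudonymize and the span helper are hypothetical.

```javascript
// Offset-based pseudonymization: placeholders are numbered per type in
// reading order, identical values share one placeholder.
function pseudonymize(text, entities) {
  const mapping = new Map();  // placeholder -> original value
  const byValue = new Map();  // "TYPE\u0000value" -> placeholder
  const counters = {};
  const ordered = [...entities].sort((a, b) => a.start - b.start);

  // Pass 1: assign placeholders in reading order.
  const spans = ordered.map((e) => {
    const key = `${e.type}\u0000${e.text}`;
    if (!byValue.has(key)) {
      counters[e.type] = (counters[e.type] || 0) + 1;
      const placeholder = `[${e.type}_${counters[e.type]}]`;
      byValue.set(key, placeholder);
      mapping.set(placeholder, e.text);
    }
    return { ...e, placeholder: byValue.get(key) };
  });

  // Pass 2: splice from right to left so earlier offsets remain valid.
  let out = text;
  for (const s of spans.reverse()) {
    out = out.slice(0, s.start) + s.placeholder + out.slice(s.end);
  }
  return { text: out, mapping };
}

const inputText =
  "Write a reminder email to John Doe, 123 Main St, 10001 New York. Outstanding amount: $1,250.00";
// Stand-in for detector output; offsets are derived from the text itself.
const span = (type, value) => {
  const start = inputText.indexOf(value);
  return { type, text: value, start, end: start + value.length };
};
const result = pseudonymize(inputText, [
  span("PERSON", "John Doe"),
  span("ADDRESS", "123 Main St"),
  span("ADDRESS", "10001 New York"),
  span("MONEY", "$1,250.00"),
]);

console.log(result.text);
// Write a reminder email to [PERSON_1], [ADDRESS_1], [ADDRESS_2]. Outstanding amount: [MONEY_1]
```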
      

Phase 2: LLM Processing

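// The pseudonymized text is sent to the LLM.
// Assumes the official `openai` npm package (v4 or later) in an ES module;
// the client reads OPENAI_API_KEY from the environment.
import OpenAI from "openai";
const openai = new OpenAI();

const llmResponse = await openai.chat.completions.create({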
  model: "gpt-4",
  messages: [{
    role: "system",
    content: "You write professional business letters. Placeholders like [PERSON_1] will be replaced later – use them exactly as-is in the text."
  }, {
    role: "user",
    content: pseudonymizedText
  }]
});

// LLM sees ONLY:
// "Write a reminder email to [PERSON_1], [ADDRESS_1], [ADDRESS_2].
//  Outstanding amount: [MONEY_1]"

// LLM responds with placeholders:
// "Dear [PERSON_1], we would like to kindly remind you..."
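
Before the request in Phase 2 is sent, it can be worth checking that none of the original values survived in the outgoing text. The sketch below shows one way to do that; the helper name findLeaks is hypothetical. It reports placeholder names rather than values so that the check itself does not write PII anywhere, and it can only catch values the detector found in the first place.

```javascript
// Returns the placeholders whose original value still appears in the text.
function findLeaks(outgoingText, mapping) {
  const leaks = [];
  for (const [placeholder, original] of mapping) {
    if (outgoingText.includes(original)) leaks.push(placeholder);
  }
  return leaks;
}

const mapping = new Map([
  ["[PERSON_1]", "John Doe"],
  ["[MONEY_1]", "$1,250.00"],
]);

const safe = "Write a reminder email to [PERSON_1]. Outstanding amount: [MONEY_1]";
const unsafe = "Write a reminder email to [PERSON_1]. John Doe owes [MONEY_1]";

console.log(findLeaks(safe, mapping));   // []
console.log(findLeaks(unsafe, mapping)); // [ '[PERSON_1]' ]
```

A non-empty result would be a reason to abort the request instead of calling the external API.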
      

Phase 3: Re-Personalization

// Replace all placeholders with the original PII
let finalOutput = llmResponse.choices[0].message.content;

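mapping.forEach((originalValue, placeholder) => {
  // Replacer function: inserts the value literally, so "$" sequences
  // such as "$&" or "$$" inside the PII are not interpreted by replaceAll
  finalOutput = finalOutput.replaceAll(placeholder, () => originalValue);
});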

// Delete mapping immediately – never persist!
mapping.clear();

// Result: Fully personalized text
// "Dear John Doe, we would like to kindly remind you..."
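
LLMs do not always return placeholders exactly as instructed. A slightly more defensive version of Phase 3 could replace all placeholders in a single pass and report tokens it cannot resolve. This is a sketch under assumptions: the function name rePersonalize and the placeholder pattern (upper-case type, underscore, number, in square brackets) are illustrative and must match whatever format Phase 1 produces.

```javascript
// Matches tokens such as [PERSON_1] or [DIGITAL_ID_2].
const PLACEHOLDER = /\[[A-Z_]+_\d+\]/g;

function rePersonalize(llmText, mapping) {
  const unknown = new Set();
  // Single pass with a replacer function: restored values are inserted
  // literally and are never scanned again for placeholders.
  const text = llmText.replace(PLACEHOLDER, (token) => {
    if (mapping.has(token)) return mapping.get(token);
    unknown.add(token); // the LLM invented or altered a placeholder
    return token;       // left visible for review
  });
  // Placeholders that were sent but do not appear in the response.
  const unused = [...mapping.keys()].filter((p) => !llmText.includes(p));
  return { text, unknown: [...unknown], unused };
}

const mapping = new Map([
  ["[PERSON_1]", "John Doe"],
  ["[ADDRESS_1]", "123 Main St"],
  ["[MONEY_1]", "$1,250.00"],
]);

const outcome = rePersonalize(
  "Dear [PERSON_1], please pay [MONEY_1]. Regards, [PERSON_2]",
  mapping
);
console.log(outcome.text);    // Dear John Doe, please pay $1,250.00. Regards, [PERSON_2]
console.log(outcome.unknown); // [ '[PERSON_2]' ]
console.log(outcome.unused);  // [ '[ADDRESS_1]' ]
```

Whether an unknown placeholder should block delivery or only trigger a review is a policy decision; the sketch merely makes the situation visible.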
      

Security Architecture

Critical Security Rules for Re-Personalization

  1. Mapping in RAM only: The placeholder → PII mapping is NEVER stored in a database or file
  2. Session-bound: The mapping exists only for the duration of the request and is immediately deleted afterwards
  3. No logging: Neither the input, the mapping nor the final output is logged
  4. Encrypted transmission: All communication via HTTPS/TLS
  5. Isolated processing: Each request has its own mapping – no cross-contamination
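
Rules 1, 2 and 5 can be enforced structurally by creating the mapping inside the request handler and clearing it in a finally block, so that it is removed even when the LLM call fails. The following is a minimal sketch; the helper name withRequestMapping and the demo function are hypothetical, and the LLM response is a hard-coded stand-in.

```javascript
// Runs `work` with a fresh, request-local mapping and always clears it.
async function withRequestMapping(work) {
  const mapping = new Map(); // never shared between requests, never persisted
  try {
    return await work(mapping);
  } finally {
    mapping.clear(); // runs on success and on error
  }
}

async function demo() {
  let seen; // kept only to show that the map is empty afterwards
  const result = await withRequestMapping(async (mapping) => {
    seen = mapping;
    mapping.set("[PERSON_1]", "John Doe");
    const reply = "Dear [PERSON_1]"; // stand-in for the LLM response
    return reply.replaceAll("[PERSON_1]", () => mapping.get("[PERSON_1]"));
  });
  return { result, sizeAfter: seen.size };
}

demo().then(({ result, sizeAfter }) => console.log(result, sizeAfter));
// Dear John Doe 0
```

Note that clearing a Map only drops the references; when the memory is actually reclaimed is up to the JavaScript runtime.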

Data Flow Diagram

┌─────────────────────────────────────────────────────────────────────────────┐
│                        YOUR SERVER (Secure Zone)                             │
│  ┌──────────┐    ┌─────────┐    ┌─────────────┐    ┌──────────────────────┐ │
│  │  User    │───▶│ GLiNER  │───▶│   Mapping   │    │   Re-Personalization │ │
│  │  Input   │    │  (NER)  │    │ (RAM only!) │    │                      │ │
│  └──────────┘    └────┬────┘    └──────┬──────┘    └──────────▲───────────┘ │
│                       │                │                      │             │
│                       ▼                │                      │             │
│              ┌────────────────┐        │                      │             │
│              │ Pseudonymized  │        │                      │             │
│              │ [PERSON_1]...  │────────┼──────────────────────┘             │
│              └───────┬────────┘        │                                    │
└──────────────────────┼─────────────────┼────────────────────────────────────┘
                       │                 │
                       ▼                 │ (Mapping stays internal!)
        ┌──────────────────────────┐     │
        │      EXTERNAL API        │     │
        │  ┌────────────────────┐  │     │
        │  │   LLM (GPT-4)      │  │     │
        │  │                    │  │     │
        │  │ Sees ONLY:         │  │     │
        │  │ "[PERSON_1] has    │  │     │
        │  │  a question..."    │  │     │
        │  └────────────────────┘  │     │
        └──────────────────────────┘     │
                       │                 │
                       │ Response with   │
                       │ placeholders    │
                       ▼                 │
              ┌────────────────┐         │
              │ "Dear          │─────────┘
              │  [PERSON_1]"   │  Mapping is applied
              └────────────────┘
      

When to Use Which Strategy?

Use Case                     Recommended Strategy     Reason
---------------------------  -----------------------  ------------------------------------------
Chatbot / Customer Service   A: Anonymization         Responses do not need to be personalized
Document Analysis            A: Anonymization         Summaries do not require names
Email Generation             B: Re-Personalization    Emails must address the recipient directly
Letter/Contract Templates    B: Re-Personalization    Documents require real names and addresses
Internal Knowledge Search    A: Anonymization         Factual knowledge, no personalization needed
Personalized Reports         B: Re-Personalization    Reports for specific persons/companies

Why GLiNER?

  • 100% local: No data leaves the server during NER analysis
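  • Open Source: Full transparency and auditability (the GLiNER library is published under the Apache 2.0 license; individual model checkpoints may carry their own license terms)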
  • Lightweight: Runs on CPU, no expensive GPU cluster needed
  • Multilingual: Supports German, English and many other languages
  • Zero-Shot NER: Detects new entity types without retraining
  • Fast: Low latency on short texts; actual speed depends on model size, hardware and document length

GDPR-Compliant AI for Your Business

We implement GLiNER-based privacy layers for your AI applications. Whether chat, document processing or automated correspondence – your data remains protected.
