This guide applies to self-hosted (on-premises) Gurubase deployments. The PII filter is delivered as an optional sidecar service that you deploy alongside the backend.

Overview

The PII Filter Service is a FastAPI sidecar that wraps the openai/privacy-filter model. When enabled, the Gurubase backend POSTs each user’s raw question to the sidecar before any downstream processing. The redacted text is then used for summary generation, vector search, memory retrieval, and persistence, so the raw question never reaches embeddings, prompts, or the memory store. This is complementary to regex-based PII Masking:
  • PII Masking applies operator-defined regex patterns to questions, file attachments, and indexed data sources.
  • PII Filter Service runs a machine-learning model that detects PII spans (names, emails, dates, and similar) in user questions only, without per-pattern configuration.
The filter is fail-closed: when the toggle is on but the sidecar is unreachable, misconfigured, or returns an error, the request is rejected with HTTP 503 rather than processed with unfiltered text.

Build and run the sidecar

The sidecar ships in the Gurubase backend repository under src/pii-filter-service/. It is built from a single Dockerfile in two flavors, CPU and GPU. The make target selects the flavor at build time:
# CPU build (~1 GB, python:3.11-slim base)
make build-cpu

# GPU build (~4 GB, nvidia/cuda:12.1 base, requires nvidia-container-toolkit)
make build-gpu
Run the container:
# CPU
make run-cpu PII_FILTER_API_KEY=replace-me

# GPU
make run-gpu PII_FILTER_API_KEY=replace-me

# With a fine-tuned checkpoint mounted from the host
make run-cpu PII_FILTER_API_KEY=replace-me PII_FILTER_MODEL_PATH=/abs/path/to/checkpoint
The first boot loads the model into memory; allow roughly 10 to 30 seconds before GET /health reports ok: true.
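Scripts that start the sidecar and then send traffic need to wait for the model to load. The bash function below polls GET /health until the body reports ok: true or a deadline passes. It is a hypothetical helper, not part of the repository. It assumes curl is installed, that the sidecar listens on port 8003 as in this guide's examples, and that the health JSON contains a literal "ok": true once the sidecar is ready.

```shell
# Hypothetical helper (not shipped with the sidecar): wait until GET /health
# reports "ok": true, or give up after max_wait seconds.
wait_for_health() {
  local base_url=${1:-http://localhost:8003}   # port assumed from this guide's examples
  local max_wait=${2:-60}                      # first boot usually needs 10 to 30 s
  local start=$SECONDS body
  while (( SECONDS - start < max_wait )); do
    # -f fails on HTTP errors; --max-time caps each probe at 2 s.
    if body=$(curl -fsS --max-time 2 "${base_url%/}/health" 2>/dev/null) &&
       grep -Eq '"ok"[[:space:]]*:[[:space:]]*true' <<<"$body"; then
      echo "sidecar ready after $((SECONDS - start))s"
      return 0
    fi
    sleep 1
  done
  echo "sidecar not ready after ${max_wait}s" >&2
  return 1
}
```

For example, after starting the container, run `wait_for_health http://localhost:8003 60` from another shell before sending questions. A non-zero exit status means the model did not finish loading in time.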

Sidecar configuration

Env var | Default | Notes
PII_FILTER_API_KEY | (empty) | Shared secret enforced via the x-api-key header. Set this in production.
PII_FILTER_MODEL_PATH | (empty) | Path to a host directory containing a fine-tuned checkpoint. The directory is mounted into the container and loaded instead of the default model.
PII_FILTER_DEVICE | cpu | cpu, cuda, or gpu (alias for cuda).
PII_FILTER_DECODE_MODE | viterbi | viterbi or argmax.
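A typo in these variables surfaces only when the container starts. The bash function below is a hypothetical pre-flight check that catches such mistakes earlier. It validates the documented values in the current shell environment. It assumes the same values are what you pass to the make run targets. It starts nothing and does not inspect the checkpoint contents.

```shell
# Hypothetical pre-flight check for the sidecar variables documented above.
check_sidecar_env() {
  local status=0
  local device=${PII_FILTER_DEVICE:-cpu}          # documented default
  local decode=${PII_FILTER_DECODE_MODE:-viterbi} # documented default

  if [[ -z ${PII_FILTER_API_KEY:-} ]]; then
    # Not fatal, but POST /filter then accepts requests without x-api-key.
    echo "warning: PII_FILTER_API_KEY is empty" >&2
  fi
  case $device in
    cpu|cuda|gpu) ;;                                # gpu is an alias for cuda
    *) echo "error: PII_FILTER_DEVICE must be cpu, cuda, or gpu (got '$device')" >&2; status=1 ;;
  esac
  case $decode in
    viterbi|argmax) ;;
    *) echo "error: PII_FILTER_DECODE_MODE must be viterbi or argmax (got '$decode')" >&2; status=1 ;;
  esac
  if [[ -n ${PII_FILTER_MODEL_PATH:-} && ! -d $PII_FILTER_MODEL_PATH ]]; then
    echo "error: PII_FILTER_MODEL_PATH '$PII_FILTER_MODEL_PATH' is not a directory" >&2
    status=1
  fi
  return "$status"
}
```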

Endpoints

Method | Path | Purpose
GET | /health | Returns ok, the current device, the decode mode, and the model path.
POST | /filter | Returns the redacted text. Requires the x-api-key header when PII_FILTER_API_KEY is set.
Sample request:
curl -fsS -X POST http://localhost:8003/filter \
  -H "x-api-key: replace-me" \
  -H "content-type: application/json" \
  -d '{"text":"Alice was born on 1990-01-02, email alice@example.com"}'
Sample response:
{"redacted_text": "[PERSON] was born on [DATE], email [EMAIL]", "detected": true, "span_count": 3}
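The sample request pastes the question straight into the JSON body. That breaks as soon as a question contains a double quote, a backslash, or a newline. The bash sketch below escapes those characters before building the body. It sends x-api-key only when a key is given. Both function names are hypothetical, and the escaping covers only the characters listed in the comments, not every JSON control character.

```shell
# Hypothetical helper: escape a string for use inside a JSON string literal.
json_escape() {
  local s=$1 bs='\' q='"'
  s=${s//"$bs"/"$bs$bs"}   # \  -> \\  (first, so later escapes are not doubled)
  s=${s//"$q"/"$bs$q"}     # "  -> \"
  s=${s//$'\n'/"${bs}n"}   # newline -> \n
  s=${s//$'\r'/"${bs}r"}   # carriage return -> \r
  s=${s//$'\t'/"${bs}t"}   # tab -> \t
  printf '%s' "$s"         # other control characters are not handled here
}

# Hypothetical wrapper around POST /filter; prints the sidecar's JSON response.
pii_filter_request() {
  local base_url=$1 api_key=$2 text=$3
  local -a headers=(-H "content-type: application/json")
  if [[ -n $api_key ]]; then
    headers+=(-H "x-api-key: $api_key")   # only needed when the sidecar has a key
  fi
  curl -fsS -X POST "${base_url%/}/filter" "${headers[@]}" \
    -d "{\"text\":\"$(json_escape "$text")\"}"
}
```

For example: `pii_filter_request http://localhost:8003 replace-me 'Alice said "call me"'`.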

Custom fine-tuned models

The model directory must follow the Hugging Face from_pretrained layout (config.json, model.safetensors or pytorch_model.bin, tokenizer.json, tokenizer_config.json, and special_tokens_map.json). The model must also be fine-tuned from openai/privacy-filter so that its architecture matches. Copy the checkpoint to the host, then start the container with PII_FILTER_MODEL_PATH set to the absolute path of that directory. Once the model loads, GET /health reports model_path: /models/custom instead of <default>.
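A checkpoint with a missing file fails only after the container has started. The bash function below is a hypothetical check to run on the host beforehand. It confirms the directory contains the files listed above and warns about relative paths. It does not check that the weights actually derive from openai/privacy-filter. It also assumes an unsharded checkpoint, so sharded weight files would be reported as missing.

```shell
# Hypothetical pre-flight check for a fine-tuned checkpoint directory.
check_checkpoint() {
  local dir=$1 missing=0 f
  if [[ ! -d $dir ]]; then
    echo "error: $dir is not a directory" >&2
    return 1
  fi
  # Files the from_pretrained layout above requires.
  for f in config.json tokenizer.json tokenizer_config.json special_tokens_map.json; do
    if [[ ! -f $dir/$f ]]; then
      echo "missing: $f" >&2
      missing=1
    fi
  done
  # Weights may be stored in either format.
  if [[ ! -f $dir/model.safetensors && ! -f $dir/pytorch_model.bin ]]; then
    echo "missing: model.safetensors or pytorch_model.bin" >&2
    missing=1
  fi
  # The make example passes an absolute path.
  if [[ $dir != /* ]]; then
    echo "warning: prefer an absolute path such as $(cd "$dir" && pwd)" >&2
  fi
  return "$missing"
}
```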

Wire the backend to the sidecar

Set the following environment variables on the Gurubase backend so the backend can reach the sidecar:
Env var | Default | Notes
PII_FILTER_BASE_URL | (empty) | Base URL of the sidecar, for example http://pii-filter:8003. When empty, the backend never calls the sidecar. If the database toggle is on, questions are then rejected with HTTP 503 rather than processed unfiltered.
PII_FILTER_API_KEY | (empty) | Must match the sidecar’s PII_FILTER_API_KEY. Sent in the x-api-key header.
PII_FILTER_TIMEOUT | 5 | Request timeout in seconds. Timeouts are treated as filter failures and trigger the fail-closed response.
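Suppose the backend runs in Docker and reaches the sidecar through an SSH tunnel opened on the Docker host. In that case, set PII_FILTER_BASE_URL to http://host.docker.internal:8003. Inside the container, localhost refers to the container itself, so the container needs host.docker.internal to reach the port forwarded on the host.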
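To confirm the wiring before enabling the toggle, you can run a smoke test where the backend runs, for example in a shell inside the backend container. The bash function below is a hypothetical sketch that reads the same three variables as the backend. It sends a throwaway question to POST /filter, using PII_FILTER_TIMEOUT as curl's --max-time. The exit code distinguishes a missing base URL from a failed request. It assumes the sidecar rejects a wrong x-api-key with an HTTP error status, so a key mismatch also counts as a failure.

```shell
# Hypothetical smoke test using the backend's environment.
# Exit codes: 0 = sidecar answered, 1 = request failed or timed out, 2 = base URL not set.
pii_backend_smoke_test() {
  local base_url=${PII_FILTER_BASE_URL:-}
  local timeout=${PII_FILTER_TIMEOUT:-5}        # documented default
  if [[ -z $base_url ]]; then
    echo "PII_FILTER_BASE_URL is empty: with the toggle on, questions get HTTP 503" >&2
    return 2
  fi
  local -a auth=()
  if [[ -n ${PII_FILTER_API_KEY:-} ]]; then
    auth=(-H "x-api-key: $PII_FILTER_API_KEY")
  fi
  # -f turns HTTP errors (such as a rejected key) into a non-zero exit status.
  if ! curl -fsS --max-time "$timeout" -X POST "${base_url%/}/filter" "${auth[@]}" \
       -H "content-type: application/json" -d '{"text":"smoke test"}' >/dev/null; then
    echo "sidecar at $base_url failed or exceeded ${timeout}s" >&2
    return 1
  fi
  echo "sidecar at $base_url answered"
}
```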

Enable filtering

Two toggles control whether the filter actually runs. Both are set in the Django admin of the self-hosted deployment.
  • Global default: Settings.pii_filter_enabled (default false).
  • Per-guru override: GuruType.pii_filter_enabled. When null (the default), the guru falls back to the global value. Setting it to true or false overrides the global default for that guru.
When the resolved toggle is on and PII_FILTER_BASE_URL is set, every user question is passed through POST /filter before the backend runs summary generation, vector search, memory retrieval, or persistence. If the resolved toggle is on but PII_FILTER_BASE_URL is empty, or the sidecar returns an error, times out, or is unreachable, the backend returns HTTP 503 with the message PII filtering is currently unavailable. Please try again shortly.
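The precedence and fail-closed rules above reduce to a small decision table. The bash sketch below encodes that table. It is a hypothetical illustration, not the backend's Django code. The function names and the true, false, and null string encoding are assumptions made for the example.

```shell
# Hypothetical sketch of the toggle precedence described above.
# $1 = GuruType.pii_filter_enabled (true, false, or null), $2 = Settings.pii_filter_enabled.
resolve_pii_toggle() {
  if [[ $1 == null ]]; then
    echo "$2"      # null falls back to the global default
  else
    echo "$1"      # true/false overrides the global default for this guru
  fi
}

# $1, $2 as above, $3 = PII_FILTER_BASE_URL. Prints skip, filter, or reject-503.
pii_filter_action() {
  local enabled
  enabled=$(resolve_pii_toggle "$1" "$2")
  if [[ $enabled != true ]]; then
    echo skip          # filter off: the sidecar is not called
  elif [[ -z $3 ]]; then
    echo reject-503    # toggle on but no base URL: fail closed
  else
    echo filter        # POST /filter; an error or timeout also ends in HTTP 503
  fi
}
```

For example, `pii_filter_action null true ''` prints reject-503. The guru inherits the enabled global toggle, but no sidecar URL is configured.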

Operational notes

  • The filter is applied once per question at the earliest entry point, so the /api/v1 path and the web summary view both incur one sidecar roundtrip rather than two.
  • Empty or whitespace-only inputs are returned unchanged without calling the sidecar.
  • By default the sidecar binds only to localhost on the machine where it runs. Expose it only on a private network shared with the backend, or, during development, reach it through an SSH tunnel:
    ssh -N -L 8003:localhost:8003 user@gpu-host
  • For local development, a compose file is provided at src/pii-filter-service/.dev/docker-compose.yml.
  • macOS hosts cannot use the GPU image, which requires an NVIDIA GPU; Apple Silicon GPUs are not NVIDIA GPUs. Use the CPU image locally. Use the GPU image only on Linux hosts with the NVIDIA drivers and nvidia-container-toolkit installed.
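The rule that empty or whitespace-only inputs skip the sidecar comes down to a single check. The bash predicate below sketches it. It is hypothetical, and its name is illustrative. It treats any character outside the POSIX [[:space:]] class as content worth filtering.

```shell
# Hypothetical predicate: succeeds only if the question contains at least one
# non-whitespace character. Empty or whitespace-only input would be returned
# unchanged without a sidecar round trip.
needs_pii_filter() {
  [[ $1 == *[![:space:]]* ]]
}
```

For example, `needs_pii_filter $' \t\n'` fails, so such a question would be passed through as-is. `needs_pii_filter 'Alice'` succeeds.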