This guide applies to self-hosted (on-premise) Gurubase deployments. The PII
filter is delivered as an optional sidecar service that you deploy alongside
the backend.
Overview
The PII Filter Service is a FastAPI sidecar that wraps the
openai/privacy-filter model. When enabled, the Gurubase backend POSTs each
user’s raw question to the sidecar before any downstream processing. The
redacted text is then used for summary generation, vector search, memory
retrieval, and persistence, so the raw question never reaches embeddings,
prompts, or the memory store.
This is complementary to regex-based PII Masking:
- PII Masking applies operator-defined regex patterns to questions, file
attachments, and indexed data sources.
- PII Filter Service runs a machine-learning model that detects PII spans
(names, emails, dates, and similar) in user questions only, without
per-pattern configuration.
The filter is fail-closed: when filtering is enabled but the sidecar is
unreachable, misconfigured, or returns an error, the request is rejected with
HTTP 503 rather than processed with unfiltered text.
Build and run the sidecar
The sidecar ships in the Gurubase backend repository under
src/pii-filter-service/. It is a single Dockerfile with two flavors selected
at build time.
# CPU build (~1 GB, python:3.11-slim base)
make build-cpu
# GPU build (~4 GB, nvidia/cuda:12.1 base, requires nvidia-container-toolkit)
make build-gpu
Run the container:
# CPU
make run-cpu PII_FILTER_API_KEY=replace-me
# GPU
make run-gpu PII_FILTER_API_KEY=replace-me
# With a fine-tuned checkpoint mounted from the host
make run-cpu PII_FILTER_API_KEY=replace-me PII_FILTER_MODEL_PATH=/abs/path/to/checkpoint
The first boot loads the model into memory; allow roughly 10 to 30 seconds
before GET /health reports ok: true.
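A deployment script can wait out the model load by polling GET /health. The following is a minimal sketch, assuming the health payload is JSON with a boolean ok field as described above. The names fetch_health and wait_until_healthy, the injectable fetch and sleep callables, and the 60-second budget are illustrative choices and not part of the sidecar.

```python
import json
import time
import urllib.error
import urllib.request
from typing import Callable, Optional


def fetch_health(base_url: str, timeout: float = 2.0) -> Optional[dict]:
    """Return the parsed /health payload, or None if the sidecar is not answering yet."""
    url = f"{base_url.rstrip('/')}/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return json.loads(resp.read().decode("utf-8"))
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, HTTP error, or a non-JSON body:
        # all are treated as "not ready".
        return None


def wait_until_healthy(
    base_url: str,
    max_wait: float = 60.0,
    interval: float = 2.0,
    fetch: Callable[[str], Optional[dict]] = fetch_health,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll /health until it reports ok: true or roughly max_wait seconds have passed."""
    waited = 0.0
    while True:
        payload = fetch(base_url)
        if isinstance(payload, dict) and payload.get("ok") is True:
            return True
        if waited >= max_wait:
            return False
        sleep(interval)
        waited += interval
```

Calling wait_until_healthy("http://localhost:8003") before routing traffic gives the container time to finish loading; the port is the one used in this guide's sample request.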
Sidecar configuration
| Env var | Default | Notes |
|---|---|---|
| PII_FILTER_API_KEY | (empty) | Shared secret enforced via the x-api-key header. Set this in production. |
| PII_FILTER_MODEL_PATH | (empty) | Mount a host directory and point here to load a fine-tuned checkpoint. |
| PII_FILTER_DEVICE | cpu | cpu, cuda, or gpu (alias for cuda). |
| PII_FILTER_DECODE_MODE | viterbi | viterbi or argmax. |
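The defaults and allowed values above can be summarized in code. This is a hedged sketch of how such settings might be read and validated, not the sidecar's actual parsing logic. SidecarSettings and load_sidecar_settings are hypothetical names, and the trimming and lowercasing of values is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

VALID_DEVICES = ("cpu", "cuda")
VALID_DECODE_MODES = ("viterbi", "argmax")


@dataclass(frozen=True)
class SidecarSettings:
    api_key: str               # empty string: no x-api-key is enforced
    model_path: Optional[str]  # None: use the default model
    device: str                # "cpu" or "cuda"
    decode_mode: str           # "viterbi" or "argmax"


def load_sidecar_settings(env: Mapping[str, str] = os.environ) -> SidecarSettings:
    """Read the four documented variables, applying the documented defaults."""
    # Assumption: values are trimmed and compared case-insensitively.
    device = env.get("PII_FILTER_DEVICE", "").strip().lower() or "cpu"
    if device == "gpu":
        device = "cuda"  # documented alias
    if device not in VALID_DEVICES:
        raise ValueError(f"PII_FILTER_DEVICE must be cpu, cuda, or gpu, got {device!r}")

    decode_mode = env.get("PII_FILTER_DECODE_MODE", "").strip().lower() or "viterbi"
    if decode_mode not in VALID_DECODE_MODES:
        raise ValueError(f"PII_FILTER_DECODE_MODE must be viterbi or argmax, got {decode_mode!r}")

    return SidecarSettings(
        api_key=env.get("PII_FILTER_API_KEY", ""),
        model_path=env.get("PII_FILTER_MODEL_PATH") or None,
        device=device,
        decode_mode=decode_mode,
    )
```

Passing a plain dict instead of os.environ makes the function easy to exercise in tests, for example load_sidecar_settings({"PII_FILTER_DEVICE": "gpu"}).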
Endpoints
| Method | Path | Purpose |
|---|---|---|
| GET | /health | Returns ok, current device, decode mode, and model path. |
| POST | /filter | Returns the redacted text. Requires the x-api-key header when one is set. |
Sample request:
curl -fsS -X POST http://localhost:8003/filter \
  -H "x-api-key: replace-me" \
  -H "content-type: application/json" \
  -d '{"text":"Alice was born on 1990-01-02, email alice@example.com"}'
Sample response:
{"redacted_text": "[PERSON] was born on [DATE], email [EMAIL]", "detected": true, "span_count": 3}
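The same call can be made from Python with only the standard library. The sketch below assumes the request and response shapes shown in the sample above (a text field in, a redacted_text field out). The helper names build_filter_request, parse_filter_response, and filter_text are hypothetical and are not part of Gurubase.

```python
import json
import urllib.request
from typing import Optional


def build_filter_request(
    base_url: str, text: str, api_key: Optional[str] = None
) -> urllib.request.Request:
    """Build the POST /filter request without sending it."""
    headers = {"content-type": "application/json"}
    if api_key:
        # The sidecar only requires this header when a key is configured.
        headers["x-api-key"] = api_key
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/filter", data=body, headers=headers, method="POST"
    )


def parse_filter_response(raw: bytes) -> str:
    """Extract redacted_text from the JSON body, rejecting unexpected shapes."""
    payload = json.loads(raw.decode("utf-8"))
    redacted = payload.get("redacted_text") if isinstance(payload, dict) else None
    if not isinstance(redacted, str):
        raise ValueError("response is missing a string 'redacted_text' field")
    return redacted


def filter_text(
    base_url: str, text: str, api_key: Optional[str] = None, timeout: float = 5.0
) -> str:
    """Send the question to the sidecar and return the redacted text.

    urlopen raises on connection errors, timeouts, and non-2xx responses,
    so any failure surfaces as an exception to the caller.
    """
    request = build_filter_request(base_url, text, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return parse_filter_response(resp.read())
```

Keeping request construction and response parsing separate from the network call lets both be checked without a running sidecar.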
Custom fine-tuned models
The model directory must follow the HuggingFace from_pretrained layout
(config.json, model.safetensors or pytorch_model.bin, tokenizer.json,
tokenizer_config.json, special_tokens_map.json) and must derive from
openai/privacy-filter so the architecture matches. Copy the checkpoint to the
host, then start the container with PII_FILTER_MODEL_PATH set to that path.
GET /health then reports model_path: /models/custom instead of <default>.
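Before mounting a checkpoint, it can help to confirm that the directory contains the files listed above. The following sketch checks file presence only; it cannot verify that the checkpoint actually derives from openai/privacy-filter. The function names are hypothetical.

```python
from pathlib import Path
from typing import Iterable, List

# Files named in this guide as required in the checkpoint directory.
REQUIRED_FILES = (
    "config.json",
    "tokenizer.json",
    "tokenizer_config.json",
    "special_tokens_map.json",
)
# At least one of these weight files must be present.
WEIGHT_FILES = ("model.safetensors", "pytorch_model.bin")


def missing_from_listing(names: Iterable[str]) -> List[str]:
    """Return the required entries that are absent from a list of file names."""
    present = set(names)
    missing = [name for name in REQUIRED_FILES if name not in present]
    if not any(name in present for name in WEIGHT_FILES):
        missing.append(" or ".join(WEIGHT_FILES))
    return missing


def missing_checkpoint_files(model_dir: str) -> List[str]:
    """Inspect a directory on disk and report what the layout is missing."""
    files = (p.name for p in Path(model_dir).iterdir() if p.is_file())
    return missing_from_listing(files)
```

An empty list from missing_checkpoint_files("/abs/path/to/checkpoint") means the layout looks complete; anything else names the files to add before starting the container.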
Wire the backend to the sidecar
Set the following environment variables on the Gurubase backend so it can
reach the sidecar:
| Env var | Default | Notes |
|---|---|---|
| PII_FILTER_BASE_URL | (empty) | Base URL of the sidecar, for example http://pii-filter:8003. When empty, the sidecar is never called, even if the database toggle is on; with the toggle on, requests are rejected with HTTP 503 (fail-closed). |
| PII_FILTER_API_KEY | (empty) | Must match the sidecar’s PII_FILTER_API_KEY. Sent in the x-api-key header. |
| PII_FILTER_TIMEOUT | 5 | Request timeout in seconds. Timeouts are treated as filter failures and trigger the fail-closed response. |
When the backend runs in Docker and the sidecar is reached through an SSH
tunnel that terminates on the Docker host, set PII_FILTER_BASE_URL to
http://host.docker.internal:8003 so the backend container can reach the
forwarded port on the host.
Enable filtering
Two toggles control whether the filter actually runs. Both are reached via the
Django admin on the self-hosted deployment.
- Global default: Settings.pii_filter_enabled (default false).
- Per-guru override: GuruType.pii_filter_enabled. When null (the default),
the guru falls back to the global value. Setting it to true or false
overrides the global default for that guru.
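The fallback rule for the two toggles can be written as a single function. This is an illustrative sketch; resolve_pii_filter_enabled is a hypothetical name, and the actual lookup of Settings and GuruType in the backend is not shown here.

```python
from typing import Optional


def resolve_pii_filter_enabled(global_default: bool, guru_override: Optional[bool]) -> bool:
    """Return the effective toggle for one guru.

    guru_override mirrors GuruType.pii_filter_enabled: None (null in the
    database) inherits the global value, while True or False wins over it.
    """
    if guru_override is None:
        return global_default
    return guru_override
```

For example, a guru with an explicit false stays unfiltered even when the global default is true, and a guru left at null follows whatever the global default is.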
When the resolved toggle is on and PII_FILTER_BASE_URL is set, every user
question is passed through POST /filter before the backend runs summary
generation, vector search, memory retrieval, or persistence.
If the resolved toggle is on but PII_FILTER_BASE_URL is empty, or the
sidecar returns an error, times out, or is unreachable, the backend returns
HTTP 503 with the message "PII filtering is currently unavailable. Please try
again shortly."
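The fail-closed rule can be sketched as a small wrapper around the sidecar call. This is an assumption-laden illustration and not the backend's actual code: PIIFilterUnavailable and apply_pii_filter are hypothetical names, and call_sidecar stands for any callable that POSTs to /filter and raises on errors or timeouts.

```python
from typing import Callable, Optional


class PIIFilterUnavailable(Exception):
    """Raised when the request must be rejected instead of processed unfiltered."""

    status_code = 503
    message = "PII filtering is currently unavailable. Please try again shortly."


def apply_pii_filter(
    question: str,
    enabled: bool,
    base_url: Optional[str],
    call_sidecar: Callable[[str, str], str],
) -> str:
    """Return the text that downstream steps are allowed to see."""
    if not enabled:
        # Resolved toggle is off: the question passes through unchanged.
        return question
    if not base_url:
        # Toggle on but PII_FILTER_BASE_URL empty: misconfigured, so reject.
        raise PIIFilterUnavailable("PII_FILTER_BASE_URL is empty")
    try:
        return call_sidecar(base_url, question)
    except Exception as exc:
        # Errors, timeouts, and unreachable hosts all fail closed.
        raise PIIFilterUnavailable(str(exc)) from exc
```

A view would catch PIIFilterUnavailable and respond with its status_code and message. The key design point is that no failure path returns the original question while the toggle is on.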
Operational notes
- The filter is applied once per question at the earliest entry point, so
the /api/v1 path and the web summary view each incur a single sidecar
round trip rather than two.
- Empty or whitespace-only inputs are returned unchanged without calling the
sidecar.
- The sidecar binds to its own localhost by default. Expose it only on a
private network shared with the backend, or use an SSH tunnel during
development:
ssh -N -L 8003:localhost:8003 user@gpu-host
- For local development, a compose file is provided at
src/pii-filter-service/.dev/docker-compose.yml.
- macOS hosts cannot use the GPU image because it requires an NVIDIA GPU,
which Apple Silicon does not provide. Use the CPU image locally and the GPU
image only on Linux with the NVIDIA drivers and nvidia-container-toolkit
installed.