# PII Filter Service

> Redact PII from user questions with an ML-based sidecar before they reach summary generation, retrieval, or memory.

<Warning>
  This guide applies to self-hosted (on-premise) Gurubase deployments. The PII
  filter is delivered as an optional sidecar service that you deploy alongside
  the backend.
</Warning>

## Overview

The PII Filter Service is a FastAPI sidecar that wraps the
[openai/privacy-filter](https://github.com/openai/privacy-filter) model. When
enabled, the Gurubase backend POSTs each user's raw question to the sidecar
before any downstream processing. The redacted text is then used for summary
generation, vector search, memory retrieval, and persistence, so the raw
question never reaches embeddings, prompts, or the memory store.

This is complementary to regex-based [PII Masking](/guides/pii-masking):

* **PII Masking** applies operator-defined regex patterns to questions, file
  attachments, and indexed data sources.
* **PII Filter Service** runs a machine-learning model that detects PII spans
  (names, emails, dates, and similar) in user questions only, without
  per-pattern configuration.

The filter is **fail-closed**: when the toggle is on but the sidecar is
unreachable, misconfigured, or returns an error, the request is rejected with
HTTP `503` rather than processed with unfiltered text.

## Build and run the sidecar

The sidecar ships in the Gurubase backend repository under
`src/pii-filter-service/`. It builds from a single Dockerfile that supports two
flavors, CPU and GPU, selected at build time.

```bash theme={null}
# CPU build (~1 GB, python:3.11-slim base)
make build-cpu

# GPU build (~4 GB, nvidia/cuda:12.1 base, requires nvidia-container-toolkit)
make build-gpu
```

Run the container:

```bash theme={null}
# CPU
make run-cpu PII_FILTER_API_KEY=replace-me

# GPU
make run-gpu PII_FILTER_API_KEY=replace-me

# With a fine-tuned checkpoint mounted from the host
make run-cpu PII_FILTER_API_KEY=replace-me PII_FILTER_MODEL_PATH=/abs/path/to/checkpoint
```

The first boot loads the model into memory; allow roughly 10 to 30 seconds
before `GET /health` reports `ok: true`.
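
The wait can be scripted. The sketch below is a hypothetical helper, `wait_for_pii_filter`, that polls `GET /health` until the response contains `"ok": true`. It assumes the sidecar is reachable at `http://localhost:8003` (the address used in the sample request below) and that `/health` returns JSON with a boolean `ok` field. The default budget of 30 polls, 2 seconds apart, is an illustrative choice that covers the expected load time.

```bash theme={null}
# Hypothetical deploy helper: poll /health until the sidecar reports ok: true.
# Assumes /health returns JSON containing an "ok": true field once ready.
wait_for_pii_filter() {
  local base="${1:-http://localhost:8003}"  # sidecar base URL (assumed)
  local attempts="${2:-30}"                 # polls before giving up
  local delay="${3:-2}"                     # seconds between polls
  local i=1 body
  while [ "$i" -le "$attempts" ]; do
    # -f turns HTTP errors into failures; --max-time bounds each poll.
    if body=$(curl -fsS --max-time 5 "${base%/}/health" 2>/dev/null) &&
      printf '%s' "$body" | grep -Eq '"ok"[[:space:]]*:[[:space:]]*true'; then
      echo "PII filter ready after $i attempt(s)"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "PII filter not ready at ${base%/}/health" >&2
  return 1
}

# Usage: wait_for_pii_filter http://localhost:8003 30 2
```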

### Sidecar configuration

| Env var                  | Default   | Notes                                                                      |
| ------------------------ | --------- | -------------------------------------------------------------------------- |
| `PII_FILTER_API_KEY`     | (empty)   | Shared secret enforced via the `x-api-key` header. Set this in production. |
| `PII_FILTER_MODEL_PATH`  | (empty)   | Path to a fine-tuned checkpoint directory mounted from the host. Empty loads the default model. |
| `PII_FILTER_DEVICE`      | `cpu`     | `cpu`, `cuda`, or `gpu` (alias for `cuda`).                                |
| `PII_FILTER_DECODE_MODE` | `viterbi` | `viterbi` or `argmax`.                                                     |
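
A pre-flight check can catch typos in these variables before the container starts. The following is a hypothetical sketch (`check_pii_filter_env` is not part of the sidecar): it normalizes the documented `gpu` alias to `cuda`, rejects values outside the table, and warns when no API key is set.

```bash theme={null}
# Hypothetical pre-flight check for the sidecar variables in the table above.
# Prints the effective device and decode mode, or fails on an invalid value.
check_pii_filter_env() {
  local device="${PII_FILTER_DEVICE:-cpu}"
  local mode="${PII_FILTER_DECODE_MODE:-viterbi}"
  case "$device" in
    cpu | cuda) ;;
    gpu) device=cuda ;;   # documented alias for cuda
    *) echo "invalid PII_FILTER_DEVICE: $device" >&2; return 1 ;;
  esac
  case "$mode" in
    viterbi | argmax) ;;
    *) echo "invalid PII_FILTER_DECODE_MODE: $mode" >&2; return 1 ;;
  esac
  # An empty key means /filter does not require the x-api-key header.
  if [ -z "${PII_FILTER_API_KEY:-}" ]; then
    echo "warning: PII_FILTER_API_KEY is empty" >&2
  fi
  echo "device=$device decode_mode=$mode"
}

# Usage: PII_FILTER_DEVICE=gpu check_pii_filter_env
#   stdout: device=cuda decode_mode=viterbi
```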

### Endpoints

| Method | Path      | Purpose                                                                     |
| ------ | --------- | --------------------------------------------------------------------------- |
| `GET`  | `/health` | Returns `ok`, current device, decode mode, and model path.                  |
| `POST` | `/filter` | Returns the redacted text. Requires the `x-api-key` header when one is set. |

Sample request:

```bash theme={null}
curl -fsS -X POST http://localhost:8003/filter \
  -H "x-api-key: replace-me" \
  -H "content-type: application/json" \
  -d '{"text":"Alice was born on 1990-01-02, email alice@example.com"}'
```

Sample response:

```json theme={null}
{"redacted_text": "[PERSON] was born on [DATE], email [EMAIL]", "detected": true, "span_count": 3}
```
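
To confirm that `PII_FILTER_API_KEY` is actually enforced, send a request without the header and check that it is refused. This is a sketch with a hypothetical function name. The exact status code the sidecar returns for a missing key is not documented here, so any `4xx` counts as a rejection and `000` (no HTTP response) as inconclusive.

```bash theme={null}
# Hypothetical check: POST /filter WITHOUT x-api-key and inspect the status.
check_api_key_enforced() {
  local base="${1:-http://localhost:8003}" code
  # -w prints only the status code; curl reports 000 when no HTTP response
  # arrived (for example, connection refused).
  code=$(curl -sS -o /dev/null -w '%{http_code}' --max-time 5 \
    -X POST "${base%/}/filter" -H "content-type: application/json" \
    -d '{"text":"probe"}' 2>/dev/null)
  case "$code" in
    2??) echo "WARNING: /filter accepted a request without x-api-key" >&2; return 1 ;;
    4??) echo "ok: rejected without x-api-key (HTTP $code)" ;;
    *) echo "could not verify: HTTP ${code:-000}" >&2; return 2 ;;
  esac
}

# Usage: check_api_key_enforced http://localhost:8003
```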

### Custom fine-tuned models

The model directory must follow the HuggingFace `from_pretrained` layout
(`config.json`, `model.safetensors` or `pytorch_model.bin`, `tokenizer.json`,
`tokenizer_config.json`, `special_tokens_map.json`) and must derive from
`openai/privacy-filter` so the architecture matches. Copy the checkpoint to the
host, then start the container with `PII_FILTER_MODEL_PATH` set to that absolute
path, as in the `make run-cpu` example above. Once the checkpoint loads,
`GET /health` reports `model_path: /models/custom` instead of `<default>`.
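
A quick presence check on the checkpoint directory can save a failed boot. This hypothetical helper, `check_checkpoint_dir`, only verifies the file layout listed above; it cannot tell whether the weights actually derive from `openai/privacy-filter`.

```bash theme={null}
# Hypothetical helper: check that a checkpoint directory has the
# from_pretrained files listed above before pointing PII_FILTER_MODEL_PATH at it.
check_checkpoint_dir() {
  local dir="$1" f missing=0
  if [ ! -d "$dir" ]; then
    echo "not a directory: $dir" >&2
    return 1
  fi
  for f in config.json tokenizer.json tokenizer_config.json special_tokens_map.json; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $f" >&2
      missing=1
    fi
  done
  # Weights may be stored in either format.
  if [ ! -f "$dir/model.safetensors" ] && [ ! -f "$dir/pytorch_model.bin" ]; then
    echo "missing: model.safetensors or pytorch_model.bin" >&2
    missing=1
  fi
  return "$missing"
}

# Usage:
#   check_checkpoint_dir /abs/path/to/checkpoint &&
#     make run-cpu PII_FILTER_API_KEY=replace-me PII_FILTER_MODEL_PATH=/abs/path/to/checkpoint
```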

## Wire the backend to the sidecar

Set the following environment variables on the Gurubase backend so the
backend can reach the sidecar:

| Env var               | Default | Notes                                                                                                                       |
| --------------------- | ------- | --------------------------------------------------------------------------------------------------------------------------- |
| `PII_FILTER_BASE_URL` | (empty) | Base URL of the sidecar, for example `http://pii-filter:8003`. When empty, the backend never calls the sidecar; if the toggle is on, questions are rejected with HTTP `503` (fail-closed). |
| `PII_FILTER_API_KEY`  | (empty) | Must match the sidecar's `PII_FILTER_API_KEY`. Sent in the `x-api-key` header.                                              |
| `PII_FILTER_TIMEOUT`  | `5`     | Request timeout in seconds. Timeouts are treated as filter failures and trigger the fail-closed response.                   |

If the backend runs in Docker and reaches the sidecar through an SSH tunnel
opened on the same host, set `PII_FILTER_BASE_URL` to
`http://host.docker.internal:8003` so the container can reach the port
forwarded on the host.
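
To verify the backend-side settings end to end, a smoke test can reuse the same `PII_FILTER_*` variables where the backend runs. The function below is a hypothetical sketch: it sends one `POST /filter` with the configured key and timeout, so a wrong key, an unreachable URL, or a slow sidecar surfaces as a `curl` failure.

```bash theme={null}
# Hypothetical smoke test: run where the backend runs, with the backend's
# PII_FILTER_* variables exported. Prints the sidecar's JSON on success.
pii_filter_smoke_test() {
  local base="${PII_FILTER_BASE_URL:-}" timeout="${PII_FILTER_TIMEOUT:-5}"
  if [ -z "$base" ]; then
    echo "PII_FILTER_BASE_URL is empty: with the toggle on, questions get HTTP 503" >&2
    return 1
  fi
  # Same timeout as the backend; -f turns an HTTP error (such as a rejected
  # key) into a non-zero exit.
  set -- -fsS --max-time "$timeout" -X POST "${base%/}/filter" \
    -H "content-type: application/json" -d '{"text":"Contact alice@example.com"}'
  # Add x-api-key only when a key is configured.
  if [ -n "${PII_FILTER_API_KEY:-}" ]; then
    set -- "$@" -H "x-api-key: $PII_FILTER_API_KEY"
  fi
  curl "$@"
}

# Usage (inside the backend container):
#   PII_FILTER_BASE_URL=http://pii-filter:8003 PII_FILTER_API_KEY=replace-me pii_filter_smoke_test
```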

## Enable filtering

Two toggles control whether the filter runs. Both are set in the Django admin
of the self-hosted deployment.

* **Global default**: `Settings.pii_filter_enabled` (default `false`).
* **Per-guru override**: `GuruType.pii_filter_enabled`. When `null` (the
  default), the guru falls back to the global value. Setting it to `true` or
  `false` overrides the global default for that guru.
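
The precedence amounts to a small lookup. This shell sketch is illustrative only: the real resolution happens in the Django backend, and the function name and the `null` string encoding are assumptions made for the example.

```bash theme={null}
# Illustrative only: models GuruType.pii_filter_enabled overriding
# Settings.pii_filter_enabled. Hypothetical function; null is encoded as the
# string "null" for this example.
resolve_pii_filter_enabled() {
  local guru_value="$1"    # per-guru override: true, false, or null
  local global_value="$2"  # global default: true or false
  case "$guru_value" in
    true | false) echo "$guru_value" ;;   # explicit override wins
    null) echo "$global_value" ;;         # no override: inherit the global value
    *) echo "unexpected per-guru value: $guru_value" >&2; return 1 ;;
  esac
}

# Usage:
#   resolve_pii_filter_enabled null true    # prints true
#   resolve_pii_filter_enabled false true   # prints false
```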

When the resolved toggle is on and `PII_FILTER_BASE_URL` is set, every user
question is passed through `POST /filter` before the backend runs summary
generation, vector search, memory retrieval, or persistence.

If the resolved toggle is on but `PII_FILTER_BASE_URL` is empty, or the
sidecar returns an error, times out, or is unreachable, the backend returns
HTTP `503` with the message `PII filtering is currently unavailable. Please
try again shortly.`
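
The decision flow above, together with the whitespace short-circuit listed under Operational notes, can be modeled as follows. This is an illustrative shell model, not the Gurubase implementation: `filter_question`, `fail_closed`, and the use of `python3` for JSON encoding and decoding are assumptions for the sketch, and it expects the response shape from the sample response (a `redacted_text` field).

```bash theme={null}
# Illustrative model of the fail-closed flow, NOT the Gurubase code.
# Prints the text downstream steps would receive, or prints the 503 message
# to stderr and returns 1. python3 is used only for safe JSON handling.
UNAVAILABLE_MSG='PII filtering is currently unavailable. Please try again shortly.'

fail_closed() {
  echo "HTTP 503: $UNAVAILABLE_MSG" >&2
  return 1
}

filter_question() {
  local question="$1" enabled="$2"   # enabled = resolved toggle: true or false
  local base="${PII_FILTER_BASE_URL:-}" timeout="${PII_FILTER_TIMEOUT:-5}"
  local payload response
  # Empty or whitespace-only input passes through without calling the sidecar.
  case "$question" in
    *[![:space:]]*) ;;
    *) printf '%s\n' "$question"; return 0 ;;
  esac
  # Toggle off: downstream steps receive the original question.
  [ "$enabled" = true ] || { printf '%s\n' "$question"; return 0; }
  # Toggle on but no sidecar configured: reject rather than continue unfiltered.
  [ -n "$base" ] || { fail_closed; return 1; }
  payload=$(python3 -c 'import json, sys; print(json.dumps({"text": sys.argv[1]}))' "$question")
  # Build curl arguments; send x-api-key only when a key is configured.
  set -- -fsS --max-time "$timeout" -X POST "${base%/}/filter" \
    -H "content-type: application/json" -d "$payload"
  if [ -n "${PII_FILTER_API_KEY:-}" ]; then
    set -- "$@" -H "x-api-key: $PII_FILTER_API_KEY"
  fi
  # Connection errors, timeouts, and HTTP errors all fail closed.
  response=$(curl "$@") || { fail_closed; return 1; }
  # A response without redacted_text is also treated as a failure.
  printf '%s' "$response" |
    python3 -c 'import json, sys; print(json.load(sys.stdin)["redacted_text"])' 2>/dev/null ||
    { fail_closed; return 1; }
}

# Usage:
#   PII_FILTER_BASE_URL=http://pii-filter:8003 filter_question "Alice was born on 1990-01-02" true
```

Checking the toggle and the base URL before any network call keeps the model fail-closed: every path that cannot produce redacted text ends in the 503 branch instead of returning the raw question.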

## Operational notes

* The filter is applied **once** per question, at the earliest entry point, so
  a question arriving through either the `/api/v1` path or the web `summary`
  view costs one sidecar round trip rather than two.

* Empty or whitespace-only inputs are returned unchanged without calling the
  sidecar.

* By default, the sidecar binds only to localhost on the machine where it runs.
  Expose it only on a private network shared with the backend, or use an SSH
  tunnel during development:

  ```bash theme={null}
  ssh -N -L 8003:localhost:8003 user@gpu-host
  ```

* For local development, a compose file is provided at
  `src/pii-filter-service/.dev/docker-compose.yml`.

* macOS hosts cannot use the GPU image, which requires an NVIDIA GPU (Apple
  Silicon GPUs are not NVIDIA). Use the CPU image locally and the GPU image
  only on Linux with the NVIDIA drivers and `nvidia-container-toolkit`
  installed.