Guides

Building a Privacy-First Architecture with Open-Source Tools

This guide outlines the transition from a standard data-heavy architecture to a privacy-first infrastructure. We focus on three pillars: data minimization, self-hosted analytics on European hardware, and encryption of sensitive fields at the application level, all of which support GDPR compliance and build user trust.

5–7 hours · 5 steps
Step 1: Deploy Self-Hosted Umami on European Infrastructure

Replace third-party trackers with a self-hosted Umami instance so that user IP addresses and behavior data never leave infrastructure you control. Deploying on an EU provider such as Hetzner (Germany/Finland) keeps the data sovereign within the EU.

docker-compose.yml
version: '3'
services:
  umami:
    image: ghcr.io/umami-software/umami:postgresql-latest
    env_file: .env
    ports:
      - "3000:3000"
    depends_on:
      - db
    restart: always
  db:
    image: postgres:15-alpine
    environment:
      POSTGRES_DB: umami
      POSTGRES_USER: user
      POSTGRES_PASSWORD: password  # change this; must match DATABASE_URL in .env
    volumes:
      - umami-db:/var/lib/postgresql/data  # persist analytics data across restarts
    restart: always

volumes:
  umami-db:

⚠ Common Pitfalls

  • Ensure the DATABASE_URL in .env matches the internal Docker network name of the db service.
  • Failing to set a custom APP_SECRET will result in insecure session management.
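A minimal `.env` sketch to pair with the compose file (the credentials mirror the placeholder values above and must be changed; `db` is the service name on the internal Docker network, which is what `DATABASE_URL` has to point at):

```ini
DATABASE_URL=postgresql://user:password@db:5432/umami
APP_SECRET=replace-with-a-long-random-string
```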
Step 2: Implement Data Minimization in the Database Schema

Audit your database schema to remove unnecessary PII (Personally Identifiable Information). For required identifiers like email addresses, use one-way cryptographic hashes (SHA-256) if the raw data is only needed for unique identification rather than communication.

schema_migration.sql
-- digest() requires the pgcrypto extension
CREATE EXTENSION IF NOT EXISTS pgcrypto;

-- Replace raw email with a hashed version for unique constraints
ALTER TABLE users ADD COLUMN email_hash TEXT;
UPDATE users SET email_hash = encode(digest(email, 'sha256'), 'hex');
ALTER TABLE users DROP COLUMN email;
ALTER TABLE users ADD CONSTRAINT unique_email_hash UNIQUE (email_hash);

⚠ Common Pitfalls

  • Using MD5 instead of SHA-256 or Argon2 for hashing sensitive identifiers.
  • Assuming an unsalted hash is irreversible: emails are low-entropy, so a plain SHA-256 can be reversed by a dictionary attack. A keyed hash (HMAC with a secret) is more resistant.
  • Forgetting to update the application logic to hash the input before querying the database.
Step 3: Configure a Server-Side Analytics Proxy

To bypass ad-blockers and further anonymize data, proxy your analytics requests through your own backend. This lets you strip identifying data such as the User-Agent header and the client IP address before anything reaches the analytics engine.

proxy.js
async function handleAnalytics(req, res) {
  const payload = req.body;

  // Forward only what the analytics engine needs (an allowlist),
  // rather than copying the client's headers and deleting a few:
  // this guarantees the client IP and User-Agent never pass through.
  const headers = { 'Content-Type': 'application/json' };

  try {
    await fetch('https://your-umami-instance.com/api/send', {
      method: 'POST',
      headers,
      body: JSON.stringify(payload)
    });
  } catch (err) {
    // Swallow upstream failures; never log `req` here, since the
    // request object still carries the client IP.
  }
  res.status(200).send('OK');
}

⚠ Common Pitfalls

  • Accidentally logging the request object in the proxy, which would store the IP in server logs.
  • Increased latency if the proxy server is geographically distant from the user.
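On the client side, events are then sent to the proxy rather than to the Umami host directly. A minimal sketch, assuming the proxy above is mounted at `/api/analytics` (the endpoint path and website id are placeholders to adjust; the payload shape follows Umami's send API):

```javascript
// Build a minimal Umami-style event payload to POST at the
// first-party proxy instead of the analytics host directly.
function buildEvent(websiteId, url, name) {
  return {
    type: 'event',
    payload: {
      website: websiteId, // Umami website id (placeholder)
      url,
      name,
    },
  };
}

// In the browser:
// fetch('/api/analytics', {
//   method: 'POST',
//   headers: { 'Content-Type': 'application/json' },
//   body: JSON.stringify(
//     buildEvent('your-website-id', location.pathname, 'pageview')
//   ),
// });
```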
Step 4: Encrypt Sensitive Columns at the Application Layer

For data that must be stored (like physical addresses), use Application-Level Encryption (ALE). This ensures that even if the database is compromised, the data remains unreadable without the application's master key.

encryption.js
const crypto = require('crypto');
const algorithm = 'aes-256-cbc';
// ENCRYPTION_KEY must be 64 hex characters (32 bytes) for AES-256
const key = Buffer.from(process.env.ENCRYPTION_KEY, 'hex');

function encrypt(text) {
  const iv = crypto.randomBytes(16);
  const cipher = crypto.createCipheriv(algorithm, key, iv);
  let encrypted = cipher.update(text);
  encrypted = Buffer.concat([encrypted, cipher.final()]);
  return iv.toString('hex') + ':' + encrypted.toString('hex');
}

⚠ Common Pitfalls

  • Storing the encryption key in the same environment or repository as the database.
  • Using a static Initialization Vector (IV), which makes the encryption vulnerable to pattern analysis.
Step 5: Localize AI Features with On-Premise Inference

Instead of sending user-generated content to third-party APIs (OpenAI/Anthropic), run a local inference engine using Ollama. This keeps sensitive user prompts within your infrastructure boundary.

setup_ollama.sh
# Run Ollama as a local container
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama

# Pull a privacy-respecting model (e.g., Mistral) without
# starting an interactive chat session
docker exec -it ollama ollama pull mistral

⚠ Common Pitfalls

  • Underestimating hardware requirements; local LLMs require significant RAM/GPU resources to match API performance.
  • Failing to implement rate limiting on the local AI endpoint, leading to resource exhaustion.
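Once the container is up, your backend calls the local REST API instead of a third-party endpoint. A small helper that builds a request for Ollama's `/api/generate` endpoint (`stream: false` asks for a single JSON response rather than a token stream; the model name assumes the `mistral` pull above):

```javascript
// Build the URL and JSON body for a local Ollama generate call.
function buildOllamaRequest(model, prompt) {
  return {
    url: 'http://localhost:11434/api/generate',
    body: JSON.stringify({ model, prompt, stream: false }),
  };
}

// Usage (server-side):
// const { url, body } = buildOllamaRequest('mistral', userPrompt);
// const res = await fetch(url, { method: 'POST', body });
// const { response } = await res.json();
```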

What you built

By following these steps, you have moved from a centralized, high-exposure architecture to a privacy-first model. You now host your own analytics, minimize the data you store, encrypt sensitive fields at the application layer, and process AI tasks locally. This setup significantly reduces your compliance burden and builds long-term user trust.