
Aegis architecture

flowchart LR
  subgraph Client Sites
    C1[Participant 1]
    C2[Participant 2]
    Cn[Participant n]
  end

  API[FastAPI Service\nRBAC + Audit + DP config] -->|Prometheus Exporter| PM[Prometheus]
  FC[Flower Coordinator\nKrum / Trimmed Mean] --> API
  PM --> GF[Grafana]

  C1 -- mTLS --> FC
  C2 -- mTLS --> FC
  Cn -- mTLS --> FC

  API -. OpenAPI .- U[Operators]
  GF -. Dashboards .- U

Components

- API: FastAPI service providing dataset/participant registration, training lifecycle, DP config, strategy selection, and compliance reports. Enforces RBAC and emits JSON audit logs.
- Federated Coordinator: Flower-based server with robust aggregation (Krum, Trimmed Mean), participant auth, and straggler/retry policy.
- Privacy Engine: Opacus-based DP-SGD with RDP accounting, epsilon targeting, and step-wise accounting.
- Observability: Prometheus scrapes API metrics; Grafana presents dashboards (traffic, latency, epsilon consumption, health).
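To make the robust aggregation mentioned for the coordinator more concrete, here is a minimal sketch of a coordinate-wise trimmed mean in plain Python. It is an illustration only, not the Flower strategy Aegis configures; the function name `trimmed_mean` and the `trim_ratio` parameter are assumptions made for this example.

```python
from typing import List, Sequence


def trimmed_mean(updates: Sequence[Sequence[float]], trim_ratio: float = 0.2) -> List[float]:
    """Coordinate-wise trimmed mean over participant updates (illustrative).

    For each coordinate, the k smallest and k largest values are dropped,
    where k = floor(trim_ratio * n), and the remaining values are averaged.
    """
    if not 0.0 <= trim_ratio < 0.5:
        raise ValueError("trim_ratio must be in [0, 0.5)")
    n = len(updates)
    if n == 0:
        raise ValueError("need at least one update")
    dim = len(updates[0])
    if any(len(u) != dim for u in updates):
        raise ValueError("all updates must have the same length")

    k = int(trim_ratio * n)  # number of values trimmed from each tail
    aggregated = []
    for j in range(dim):
        column = sorted(u[j] for u in updates)
        kept = column[k : n - k]  # non-empty because trim_ratio < 0.5
        aggregated.append(sum(kept) / len(kept))
    return aggregated


# Four honest participants and one outlier (hypothetical numbers).
updates = [[1.0, 2.0], [1.2, 1.8], [0.9, 2.1], [1.1, 2.2], [100.0, -100.0]]
robust = trimmed_mean(updates, trim_ratio=0.2)
```

With five updates and `trim_ratio=0.2`, one value is trimmed from each tail per coordinate, so the outlier `[100.0, -100.0]` does not contribute to the aggregate. Krum works differently (it selects the update closest to its neighbours) and is not shown here.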

Security boundaries

- Transport: TLS everywhere; mTLS between coordinator and participants (clients present certs). In K8s, store keys in Secrets and mount as files.
- Access control: RBAC roles (admin/operator/viewer) enforced at API; sensitive endpoints gated.
- Audit: JSON-structured, tamper-evident chainable logs (timestamp, actor, action, params hash, outcome).
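One way a tamper-evident, chainable audit log could work is a hash chain, where each entry stores the hash of the previous one. The sketch below uses only the standard library. The fields `timestamp`, `actor`, `action`, `params_hash`, and `outcome` follow the list above; `prev_hash`, `entry_hash`, and the helper names are assumptions made for this example and may not match the actual Aegis log format.

```python
import hashlib
import json
from datetime import datetime, timezone
from typing import Any, Dict, List

GENESIS_HASH = "0" * 64  # assumed placeholder for the first entry's predecessor


def _digest(payload: Dict[str, Any]) -> str:
    """SHA-256 over a canonical (sorted-key) JSON encoding."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def append_entry(log: List[Dict[str, Any]], actor: str, action: str,
                 params: Dict[str, Any], outcome: str) -> Dict[str, Any]:
    """Append an audit entry linked to the previous entry's hash."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "actor": actor,
        "action": action,
        "params_hash": _digest(params),  # store a hash, not the raw params
        "outcome": outcome,
        "prev_hash": log[-1]["entry_hash"] if log else GENESIS_HASH,
    }
    entry["entry_hash"] = _digest(entry)  # computed before the field is added
    log.append(entry)
    return entry


def verify_chain(log: List[Dict[str, Any]]) -> bool:
    """Return True if no entry was modified, removed, or reordered."""
    prev_hash = GENESIS_HASH
    for entry in log:
        body = {k: v for k, v in entry.items() if k != "entry_hash"}
        if entry["prev_hash"] != prev_hash or _digest(body) != entry["entry_hash"]:
            return False
        prev_hash = entry["entry_hash"]
    return True


audit_log: List[Dict[str, Any]] = []
append_entry(audit_log, "alice", "set_dp_config", {"target_epsilon": 3.0}, "success")
append_entry(audit_log, "bob", "start_training", {"rounds": 10}, "success")
```

Editing any field of an earlier entry changes its hash and breaks the link stored in the next entry, so `verify_chain` returns `False`. A chain like this detects tampering but does not prevent truncation of the tail; anchoring the latest hash in an external system is a common complement.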

Data flow

1. Operators configure DP/strategy via API.
2. Clients train locally and send updates to coordinator (mTLS).
3. Coordinator aggregates updates and triggers next round.
4. API exposes progress/metrics; Prometheus scrapes; Grafana displays.
5. Compliance report generated on demand with DP config, audit summary, and regulatory mappings.
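Steps 2 and 3 can be sketched as a toy simulation in plain Python. This is not Flower code: the model is a single float, aggregation is a plain mean rather than Krum or Trimmed Mean, and `local_update` and `run_rounds` are hypothetical names. The point is only that each site sends a weight delta while its raw data stays local.

```python
from typing import List


def local_update(global_weight: float, local_data: List[float], lr: float = 0.5) -> float:
    """Step 2: one gradient step on the local loss 0.5 * mean((w - x)^2).

    Only the resulting weight delta is returned; local_data never leaves the site.
    """
    grad = sum(global_weight - x for x in local_data) / len(local_data)
    return -lr * grad


def run_rounds(site_data: List[List[float]], rounds: int = 20) -> float:
    """Step 3: the coordinator averages the deltas and starts the next round."""
    weight = 0.0
    for _ in range(rounds):
        deltas = [local_update(weight, data) for data in site_data]
        weight += sum(deltas) / len(deltas)
    return weight


# Three sites with private data (hypothetical numbers).
sites = [[1.0, 2.0, 3.0], [4.0, 5.0], [10.0]]
final_weight = run_rounds(sites, rounds=20)
```

In this toy setting the global weight converges toward the average of the per-site means (5.5 for the data above), with the error halving each round at `lr=0.5`.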

Trust model and assumptions

- Clients hold their own raw data; only model updates leave the site.
- DP-SGD noise/clipping limit individual contribution; robust aggregation mitigates some Byzantine behavior.
- API authentication/authorization is enforced by RBAC and your upstream IdP/gateway.
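To show how clipping and noise limit an individual contribution, the following is a simplified sketch of the core DP-SGD step in plain Python. It is not the Opacus implementation and does no privacy accounting; `clip`, `dp_average`, `max_norm`, and `noise_multiplier` are names chosen for this example.

```python
import math
import random
from typing import List, Optional, Sequence


def clip(grad: Sequence[float], max_norm: float) -> List[float]:
    """Scale a per-sample gradient so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dp_average(per_sample_grads: Sequence[Sequence[float]], max_norm: float,
               noise_multiplier: float,
               rng: Optional[random.Random] = None) -> List[float]:
    """Clip each per-sample gradient, sum, add Gaussian noise, then average.

    The noise standard deviation is noise_multiplier * max_norm, so it is
    calibrated to the largest change any single sample can cause in the sum.
    """
    rng = rng or random.Random()
    n = len(per_sample_grads)
    dim = len(per_sample_grads[0])
    clipped = [clip(g, max_norm) for g in per_sample_grads]
    sigma = noise_multiplier * max_norm
    noisy_sum = [
        sum(c[j] for c in clipped) + rng.gauss(0.0, sigma)
        for j in range(dim)
    ]
    return [s / n for s in noisy_sum]


# One large and one small per-sample gradient (hypothetical numbers).
grads = [[3.0, 4.0], [0.3, 0.4]]
noisy = dp_average(grads, max_norm=1.0, noise_multiplier=1.1, rng=random.Random(0))
```

The first gradient has norm 5 and is scaled down to norm 1, while the second (norm 0.5) passes through unchanged. However large a single sample's gradient is, it shifts the sum by at most `max_norm`, which is what the added noise is calibrated against.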