DroidFarm — Architecture

Kubernetes-native Android device farm: Google Cuttlefish emulators, WebRTC browser access, Appium E2E testing, declarative device templates.

System Overview

flowchart TB
    Browser["**Web Browser**
    ▸ DroidFarm Dashboard
    ▸ Cuttlefish WebRTC stream
    ▸ CI runner / Appium WebDriver"]

    subgraph K8s["Kubernetes Cluster — droidfarm-system"]
        direction TB

        subgraph Edge["Cluster Edge"]
            GW["**Shared Cilium Gateway**
            kube-system/cilium-gateway
            section: https · ALPN h2
            wildcard host: *.STREAMING_DOMAIN"]
            Routes["**HTTPRoute** (one per device)
            name: POOL-device-N
            host: POOL-N.STREAMING_DOMAIN"]
            GW --- Routes
        end

        subgraph Control["Control Plane"]
            Operator["**Operator**
            Go / controller-runtime
            Reconciles: Template · Pool · Session
            + per-device HTTPRoute"]
            Dashboard["**Dashboard**
            Go + HTMX + SSE · :8080"]
        end

        subgraph Infra["Node Infrastructure (KVM backend only)"]
            DevPlugin["**device-plugin** DaemonSet
            squat/generic-device-plugin
            exposes /dev/kvm, /dev/vhost-*"]
            NodeSetup["**node-setup** DaemonSet
            modprobe -a vhost_vsock vhost_net vsock"]
        end

        COTURN["**coturn**
        TURN/STUN · :3478"]

        subgraph Pool["DevicePool StatefulSet (one per pool)"]
            subgraph Pod["Emulator Pod"]
                Init(["init: apk-installer"])
                Envoy["**envoy** sidecar
                h2c :8080
                grpc_web filter"]
                CF["**Cuttlefish QEMU**
                gRPC :8554 · WebRTC media"]
                Appium["**Appium Server**
                :4723"]
                Init --> CF
                Envoy --> CF
            end
        end

        Operator -->|"manages"| Pool
        Operator --> Dashboard
        Routes -->|"h2c · port http (8080)"| Envoy
    end

    COTURN -->|"TURN relay UDP 49152–49200"| Browser
    Browser <-->|"HTTPS / HTTP/2"| GW
    Browser <-->|"HTMX + SSE"| Dashboard
    Browser <-->|"WebDriver"| Appium

Emulation: Cuttlefish QEMU (default)

cuttlefish-qemu is the recommended backend. It runs Google Cuttlefish using QEMU software emulation inside the container, with WebRTC built-in and no KVM or node modification required.

Standard Kubernetes deployment — no device plugin, no node labels, no DaemonSets:

  • Capabilities only: NET_ADMIN, NET_RAW, SYS_ADMIN — no --privileged
  • No /dev/kvm or vhost_* devices needed
  • Works on any node including cloud-managed node pools and spot instances

Container image (docker/Dockerfile.cuttlefish):

  • Multi-stage: downloads cvd-host_package.tar.gz from Google CI per Android version
  • Android system disk images are NOT baked in — mounted at runtime via PVC (/data/system_images/)
  • Leaving them out keeps the container image manageable, since system images are ~3–8 GB per Android version
  • Entrypoint: launch_cvd --start_webrtc --report_anonymous_usage_stats=n
  • Ports: WebRTC 8443, Cuttlefish operator API 6444
  • Boot time: ~5–10 minutes

Emulation: Cuttlefish with KVM (optional)

Set spec.emulationBackend: cuttlefish in a DeviceTemplate to use hardware-accelerated virtualization for approximately 10x faster boot times (~60 s).

Non-privileged Kubernetes deployment via squat/generic-device-plugin:

  • Exposes /dev/kvm, /dev/vhost-net, /dev/vhost-vsock, /dev/vsock as K8s resource limits
  • Emulator pods request: squat.ai/kvm: "1", squat.ai/vhost: "3"
  • Capabilities: NET_ADMIN, NET_RAW, SYS_ADMIN — no --privileged
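The difference between the two backends comes down to a handful of pod fields. The Go sketch below shows them side by side. It uses hand-rolled structs that mirror the Kubernetes JSON field names (nodeSelector, securityContext.capabilities.add, resources.limits) instead of k8s.io/api types so it runs with the standard library only; overridesFor is a hypothetical helper, not the operator's actual code. Extended resources such as squat.ai/kvm are set under limits, and Kubernetes defaults the request to the same value.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Minimal stand-ins for the Kubernetes fields involved (hypothetical types).
type capabilities struct {
	Add []string `json:"add"`
}

type securityContext struct {
	Privileged   bool         `json:"privileged"`
	Capabilities capabilities `json:"capabilities"`
}

type resources struct {
	Limits map[string]string `json:"limits,omitempty"`
}

type emulatorPodOverrides struct {
	NodeSelector    map[string]string `json:"nodeSelector,omitempty"`
	SecurityContext securityContext   `json:"securityContext"`
	Resources       resources         `json:"resources"`
}

// overridesFor returns the scheduling and security settings for an emulator
// container. Both backends get the same three capabilities and never run
// privileged; only the KVM backend ("cuttlefish") adds device-plugin
// resources and pins the pod to nodes labelled kvm=true.
func overridesFor(backend string) emulatorPodOverrides {
	o := emulatorPodOverrides{
		SecurityContext: securityContext{
			Privileged:   false,
			Capabilities: capabilities{Add: []string{"NET_ADMIN", "NET_RAW", "SYS_ADMIN"}},
		},
	}
	if backend == "cuttlefish" {
		o.NodeSelector = map[string]string{"kvm": "true"}
		o.Resources.Limits = map[string]string{
			"squat.ai/kvm":   "1",
			"squat.ai/vhost": "3",
		}
	}
	return o
}

func main() {
	for _, b := range []string{"cuttlefish-qemu", "cuttlefish"} {
		out, _ := json.MarshalIndent(overridesFor(b), "", "  ")
		fmt.Printf("%s:\n%s\n", b, out)
	}
}
```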

Node prerequisites:

modprobe -a vhost_vsock vhost_net vsock
kubectl label node <name> kvm=true

Automated via charts/droidfarm/templates/node-setup-daemonset.yaml.
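To confirm the modules actually loaded on a node, a small check against /proc/modules is enough. This is a minimal Go sketch, assuming it runs on the node itself (for example from a debug pod with the host's /proc); missingModules is a hypothetical helper. Modules compiled directly into the kernel never appear in /proc/modules, so a "missing" result is a prompt to look under /sys/module as well, not proof of absence.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// missingModules reports which wanted modules do not appear in the given
// /proc/modules content. The module name is the first field of each line.
func missingModules(procModules string, want []string) []string {
	loaded := map[string]bool{}
	sc := bufio.NewScanner(strings.NewReader(procModules))
	for sc.Scan() {
		if fields := strings.Fields(sc.Text()); len(fields) > 0 {
			loaded[fields[0]] = true
		}
	}
	var missing []string
	for _, m := range want {
		if !loaded[m] {
			missing = append(missing, m)
		}
	}
	return missing
}

func main() {
	data, err := os.ReadFile("/proc/modules")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	missing := missingModules(string(data), []string{"vhost_vsock", "vhost_net", "vsock"})
	if len(missing) > 0 {
		fmt.Println("not loaded:", strings.Join(missing, ", "))
		os.Exit(1)
	}
	fmt.Println("all vhost/vsock modules loaded")
}
```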

See Hardware Acceleration for cloud provider specifics and device plugin verification steps.

Web Streaming: Cuttlefish WebRTC over Cilium Gateway API

Each emulator pod runs an envoy sidecar that fronts the Cuttlefish gRPC API at localhost:8554 and exposes it on the container port named http (8080) as gRPC-Web over cleartext HTTP/2 (h2c). The dashboard embeds the per-device HTTPS URL (status.streamURL) in an iframe: the browser loads the Cuttlefish web UI through the envoy sidecar and then negotiates the WebRTC media flow directly with the emulator.

Cluster ingress path (cluster's shared Cilium Gateway, per-device routes):

  • The per-device HTTPRoute objects attach to the cluster's shared Cilium Gateway, kube-system/cilium-gateway, on the listener with sectionName https. This is the Cilium 1.18+ cluster-wide Gateway pattern; the DroidFarm chart does not render a Gateway by default.
  • That listener carries a wildcard hostname *.<streaming.domain> (e.g. *.local.geekxflood.io) and terminates TLS with a certificate from a cert-manager Certificate that the cluster administrator manages alongside the shared Gateway.
  • One HTTPRoute per device, name <pool>-device-<ordinal>, generated by the operator.
  • Each route matches a specific hostname <pool>-<ordinal>.<streaming.domain> and forwards all paths to the per-device Service on its http port (8080).
  • The per-device Service's http port carries appProtocol: kubernetes.io/h2c, so Cilium speaks native HTTP/2 to the envoy sidecar — this is what lets the gRPC-Web trailer frame (0x80-prefixed grpc-status:0) reach the browser intact.
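The naming rules above are deterministic, so a route for any device can be derived from the pool name, ordinal and streaming domain. The Go sketch below renders one such HTTPRoute using the Gateway API v1 field names (parentRefs, sectionName, hostnames, backendRefs); deviceRoute and its parameters are hypothetical, and the Service name is passed in because this page does not spell out the per-device Service naming scheme. A rule with no matches defaults to PathPrefix "/", which gives the "all paths" behaviour. Because the route lives in a different namespace than kube-system, the shared listener's allowedRoutes must admit the DroidFarm namespace.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type parentRef struct {
	Name        string `json:"name"`
	Namespace   string `json:"namespace"`
	SectionName string `json:"sectionName"`
}

type backendRef struct {
	Name string `json:"name"`
	Port int    `json:"port"`
}

type rule struct {
	BackendRefs []backendRef `json:"backendRefs"` // no matches: PathPrefix "/"
}

type routeSpec struct {
	ParentRefs []parentRef `json:"parentRefs"`
	Hostnames  []string    `json:"hostnames"`
	Rules      []rule      `json:"rules"`
}

type httpRoute struct {
	APIVersion string            `json:"apiVersion"`
	Kind       string            `json:"kind"`
	Metadata   map[string]string `json:"metadata"`
	Spec       routeSpec         `json:"spec"`
}

// deviceRoute renders the HTTPRoute for one device: name <pool>-device-<ordinal>,
// hostname <pool>-<ordinal>.<domain>, attached to the shared Gateway's https
// listener and forwarding to the device Service on port 8080.
func deviceRoute(pool string, ordinal int, domain, namespace, serviceName string) httpRoute {
	return httpRoute{
		APIVersion: "gateway.networking.k8s.io/v1",
		Kind:       "HTTPRoute",
		Metadata: map[string]string{
			"name":      fmt.Sprintf("%s-device-%d", pool, ordinal),
			"namespace": namespace,
		},
		Spec: routeSpec{
			ParentRefs: []parentRef{{Name: "cilium-gateway", Namespace: "kube-system", SectionName: "https"}},
			Hostnames:  []string{fmt.Sprintf("%s-%d.%s", pool, ordinal, domain)},
			Rules:      []rule{{BackendRefs: []backendRef{{Name: serviceName, Port: 8080}}}},
		},
	}
}

func main() {
	// "pixel7" and "pixel7-0" are hypothetical pool and Service names.
	r := deviceRoute("pixel7", 0, "local.geekxflood.io", "droidfarm-system", "pixel7-0")
	out, _ := json.MarshalIndent(r, "", "  ")
	fmt.Println(string(out))
	fmt.Println("streamURL: https://" + r.Spec.Hostnames[0])
}
```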

Override the target Gateway on a DevicePool only when you run a dedicated streaming Gateway; see the streaming.gateway chart values.

Required Cilium configuration in kube-system/cilium-config:

| Key | Value | Why |
|---|---|---|
| enable-gateway-api-alpn | true | TLS listener advertises ALPN h2; without it browsers fall back to HTTP/1.1 |
| enable-gateway-api-app-protocol | true | Backend appProtocol: kubernetes.io/h2c is honoured, so Cilium speaks HTTP/2 to the pod |

If ALPN h2 is not advertised, the browser negotiates HTTP/1.1 to the gateway and Cilium drops the gRPC-Web trailer frame on its way back, which manifests as a stalled stream and a missing grpc-status header in the dev tools network panel.
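Whether the Gateway advertises h2 can be checked without a browser. The Go sketch below, using only crypto/tls, offers h2 and http/1.1 in the same order a browser does and reports what the server selected; negotiatedALPN is a hypothetical helper name. tls.Dial takes the SNI name from the host part of the address, so pass the device hostname rather than an IP.

```go
package main

import (
	"crypto/tls"
	"fmt"
	"os"
)

// negotiatedALPN opens a TLS connection to addr offering "h2" and "http/1.1"
// and returns the protocol the server selected ("" if it ignored ALPN).
func negotiatedALPN(addr string, base *tls.Config) (string, error) {
	cfg := &tls.Config{}
	if base != nil {
		cfg = base.Clone()
	}
	cfg.NextProtos = []string{"h2", "http/1.1"}
	conn, err := tls.Dial("tcp", addr, cfg)
	if err != nil {
		return "", err
	}
	defer conn.Close()
	return conn.ConnectionState().NegotiatedProtocol, nil
}

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: alpncheck <pool>-<ordinal>.<streaming.domain>:443")
		os.Exit(2)
	}
	proto, err := negotiatedALPN(os.Args[1], nil)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	if proto != "h2" {
		fmt.Printf("gateway negotiated %q, expected h2: check enable-gateway-api-alpn\n", proto)
		os.Exit(1)
	}
	fmt.Println("ALPN h2 negotiated")
}
```

A result of "http/1.1" (or an empty string) points at the enable-gateway-api-alpn setting rather than at the emulator pod.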

Browser → emulator end-to-end:

Browser  ──HTTPS / HTTP/2 (ALPN h2)──▶  Shared Cilium Gateway (TLS terminate)
                                        kube-system/cilium-gateway · https
                                                 │
                                                 │  HTTP/2 h2c (appProtocol)
                                                 ▼
                                        envoy sidecar :8080
                                        (envoy.filters.http.grpc_web)
                                                 │
                                                 ▼
                                        Cuttlefish gRPC :8554

The envoy sidecar keeps its own envoy.filters.http.grpc_web filter. Cilium's grpc_web filter on the cluster path is the only conversion point that matters for the browser, and the two coexist cleanly because Cilium uses HTTP/2 to the backend rather than HTTP/1.1.

TLS: the shared Cilium Gateway's https listener reads its certificate from a cert-manager Certificate resource managed by the cluster administrator (typically from the same cluster issuer the dashboard uses; on the GXF cluster that is Cloudflare DNS-01). The certificate covers the wildcard *.<streaming.domain>, so a new device coming online never needs a new certificate.
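The "no new certificate per device" property rests on standard wildcard matching: *.<streaming.domain> covers exactly one extra leftmost label. The Go sketch below checks this with crypto/x509's own hostname verification against a certificate that carries only a SAN list (a stand-in, not the real Gateway certificate); coveredBy is a hypothetical helper. One consequence worth noting: Kubernetes object names may contain dots, and a DevicePool name with a dot would yield a hostname with two labels in front of the domain, which the wildcard does not cover.

```go
package main

import (
	"crypto/x509"
	"fmt"
)

// coveredBy reports whether a certificate with the given SANs would be
// accepted for host. Only the DNSNames field is consulted here.
func coveredBy(sans []string, host string) bool {
	cert := &x509.Certificate{DNSNames: sans}
	return cert.VerifyHostname(host) == nil
}

func main() {
	sans := []string{"*.local.geekxflood.io"}
	for _, host := range []string{
		"pixel7-0.local.geekxflood.io",  // covered
		"pixel7-12.local.geekxflood.io", // covered: new ordinal, same certificate
		"team.a-0.local.geekxflood.io",  // not covered: two labels before the domain
		"local.geekxflood.io",           // not covered: wildcard needs one label
	} {
		fmt.Printf("%-32s %v\n", host, coveredBy(sans, host))
	}
}
```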

NAT traversal via coturn:

  • coturn deployed as a Helm component (optional but required for cross-NAT access)
  • TURN credentials stored in a Kubernetes Secret, never in values.yaml
  • Cuttlefish receives ICE server config via CF_WEBRTC_TURN_URL env var

No noVNC, no additional streaming sidecar beyond envoy. Cuttlefish handles media natively.

CRD Data Model

DeviceTemplate — Device Profile

DeviceTemplate
├── androidVersion           "13" | "14" | "15"
├── avdProfile               pixel_6 | pixel_7 | …
├── resources                cpu, memory, storage per instance
├── appConfig[]              Ordered — first entry = primary AUT
│   ├── packageName          com.example.myapp
│   ├── apkSource            url | configMapRef | secretRef | preinstalled
│   │                        url must match ^https?://.+ (XSS/SSRF guard)
│   ├── managedConfig[]      Android Enterprise Managed App Config
│   │   └── key/value/type   Delivered via ADB broadcast on device boot
│   ├── permissions[]        Granted via `adb shell pm grant`
│   ├── autoLaunch           Start app after boot
│   └── clearDataBetweenSessions  `pm clear` before each TestSession
├── systemConfig             locale, timezone, extraProperties (adb shell setprop)
└── testConfig               default Appium capabilities, resetStrategy

managedConfig mirrors Android Enterprise Managed App Config; the values are delivered over ADB with an android.intent.action.APPLICATION_RESTRICTIONS_CHANGED broadcast.

DevicePool — Fleet Definition

DevicePool
├── templateRef              → DeviceTemplate
├── replicas { min, max }    StatefulSet scaling bounds (+ KEDA on queue depth)
├── sessionPolicy            timeout, idleTimeout, recycleAfterSessions, recycleAfter
├── nodeSelector             Optional; add kvm=true only when using emulationBackend: cuttlefish
├── streaming                enabled, domain, gatewayName (cilium-gateway),
│                            gatewayNamespace (kube-system), gatewaySectionName (https)
└── appium                   enabled, version, port

TestSession — Single E2E Run

TestSession
├── poolRef                  → DevicePool
├── timeout                  Max session duration
├── appium                   configMapRef (Appium caps JSON), extraCapabilities
├── artifacts                recording, s3 / pvc destination
└── status
    ├── phase                Pending→Claiming→Preparing→Running→Collecting→Succeeded|Failed|TimedOut
    ├── deviceRef            Assigned emulator pod name
    ├── appiumEndpoint       http://device-0.pool-emulator.ns.svc:4723
    └── streamURL            https://<pool>-<ordinal>.<streaming.domain> (gRPC-Web + WebRTC)

Session Lifecycle

flowchart TD
    Created([TestSession created]) --> Pending(Pending)
    Pending -->|"Operator finds DevicePool"| Claiming(Claiming)
    Claiming -->|"Idle device found
    appiumEndpoint + streamURL set"| Preparing(Preparing)
    Claiming -->|"no idle device"| Claiming
    Preparing -->|"APK install · managed config
    permission grant complete"| Running(Running)
    Preparing -->|timeout| TimedOut(TimedOut)
    Running -->|"external runner patches
    status.result"| Collecting(Collecting)
    Running -->|timeout| TimedOut

    Collecting -->|"duration recorded
    artifactURL set"| Terminal

    subgraph Terminal["Terminal"]
        Succeeded([Succeeded])
        Failed([Failed])
    end

    TimedOut --> Release["Device → Idle
    SessionsServed++"]
    Terminal --> Release
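The lifecycle diagram reads directly as a transition table. The Go sketch below encodes exactly the edges drawn above (including the Claiming self-loop while no device is idle) and which phases return the device to Idle; the Phase type and helper names are hypothetical, not the operator's API types.

```go
package main

import "fmt"

type Phase string

const (
	Pending    Phase = "Pending"
	Claiming   Phase = "Claiming"
	Preparing  Phase = "Preparing"
	Running    Phase = "Running"
	Collecting Phase = "Collecting"
	Succeeded  Phase = "Succeeded"
	Failed     Phase = "Failed"
	TimedOut   Phase = "TimedOut"
)

// transitions lists the edges in the lifecycle diagram; terminal phases
// have no outgoing edges.
var transitions = map[Phase][]Phase{
	Pending:    {Claiming},
	Claiming:   {Claiming, Preparing},
	Preparing:  {Running, TimedOut},
	Running:    {Collecting, TimedOut},
	Collecting: {Succeeded, Failed},
}

func canTransition(from, to Phase) bool {
	for _, p := range transitions[from] {
		if p == to {
			return true
		}
	}
	return false
}

// releasesDevice reports whether reaching phase p sends the device back to
// Idle and increments SessionsServed.
func releasesDevice(p Phase) bool {
	return p == Succeeded || p == Failed || p == TimedOut
}

func main() {
	path := []Phase{Pending, Claiming, Claiming, Preparing, Running, Collecting, Succeeded}
	for i := 1; i < len(path); i++ {
		fmt.Printf("%s -> %s: %v\n", path[i-1], path[i], canTransition(path[i-1], path[i]))
	}
	fmt.Println("device released:", releasesDevice(path[len(path)-1]))
}
```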

Scalability

KEDA autoscaling (optional, requires KEDA operator):

  • ScaledObject monitors pending TestSession count
  • Scales DevicePool StatefulSet replicas within spec.replicas.{min,max}
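  • cooldownPeriod: 300 delays scale-to-zero after a test burst; KEDA only applies it when the minimum replica count is 0. With replicas.min ≥ 1, damping after bursts comes from the HPA scale-down stabilization window (advanced.horizontalPodAutoscalerConfig.behavior, 300 s by default)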
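For a rough feel of the replica counts this produces, the Go sketch below approximates what the HPA that KEDA manages computes for an average-value target: ceil(metric / target), clamped to the DevicePool's replicas.{min,max}. It assumes a target expressed as pending sessions per replica (targetPerReplica is a hypothetical name) and ignores HPA tolerance and stabilization.

```go
package main

import (
	"fmt"
	"math"
)

// desiredReplicas approximates ceil(pending / targetPerReplica), clamped to
// [lo, hi] (the pool's replicas.min and replicas.max).
func desiredReplicas(pendingSessions, targetPerReplica, lo, hi int) int {
	if targetPerReplica <= 0 {
		targetPerReplica = 1
	}
	n := int(math.Ceil(float64(pendingSessions) / float64(targetPerReplica)))
	if n < lo {
		n = lo
	}
	if n > hi {
		n = hi
	}
	return n
}

func main() {
	for _, pending := range []int{0, 3, 12} {
		fmt.Printf("pending=%d -> replicas=%d\n", pending, desiredReplicas(pending, 1, 1, 5))
	}
}
```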

Resource governance (enforced by LimitRange + ResourceQuota):

  • LimitRange: minimum 1 CPU / 2Gi RAM per emulator container — rejects under-declared pods
  • ResourceQuota: hard cap on total namespace consumption (optional, disabled by default)

Boot time mitigation:

  • replicas.min ≥ 1 keeps at least one warm device ready
  • Cuttlefish snapshot resume: sub-10s boot from saved checkpoint (planned)
  • recycleAfterSessions replaces pods in the background, not during sessions

Pod anti-affinity spreads emulators across nodes, so losing a single node doesn't take out the whole pool. When using the KVM backend, a nodeSelector of kvm: "true" ensures emulator pods land only on KVM-capable nodes.

Security Model

| Layer | Control |
|---|---|
| RBAC | ClusterRole for CRDs only; namespaced Role for everything else |
| Pod privileges | No --privileged; specific capabilities only, plus device-plugin resources on the KVM backend |
| Credentials | TURN credentials in a Secret, never in values.yaml or env args |
| Edge auth | Per-device HTTPRoute is the auth boundary; one hostname per device, scoped to the shared Cilium Gateway |
| Network | NetworkPolicy: default-deny + targeted allow (8080 envoy / 8443 WebRTC / 5554 ADB / 4723 Appium / DNS) |
| CRD validation | packageName regex, apkSource.url https-only, CEL shell-metachar block |
| Image builds | SBOM + provenance via docker/build-push-action, pinned base images |
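The CRD validation rules can be mirrored client-side, for example in a pre-commit check on DeviceTemplate manifests. The Go sketch below is an approximation, not the CRD's actual schema: the packageName pattern follows Android's application ID rules (two or more dot-separated segments, each starting with a letter), the URL pattern is the ^https?://.+ documented on DeviceTemplate (tightening it to ^https:// would match the https-only wording above), and shellMeta is a hypothetical blocklist standing in for the CEL rule.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

var (
	packageNameRE = regexp.MustCompile(`^[A-Za-z][A-Za-z0-9_]*(\.[A-Za-z][A-Za-z0-9_]*)+$`)
	apkURLRE      = regexp.MustCompile(`^https?://.+`)
)

// shellMeta: characters rejected in values that end up in `adb shell`
// command lines (hypothetical set).
const shellMeta = ";&|`$<>\\'\"\n"

// validateApp checks one appConfig entry; shellArgs are values such as
// permissions or managedConfig values that are passed to the device shell.
func validateApp(pkg, apkURL string, shellArgs []string) error {
	if !packageNameRE.MatchString(pkg) {
		return fmt.Errorf("packageName %q is not a valid Android package name", pkg)
	}
	if apkURL != "" && !apkURLRE.MatchString(apkURL) {
		return fmt.Errorf("apkSource.url %q must start with http:// or https://", apkURL)
	}
	for _, v := range shellArgs {
		if strings.ContainsAny(v, shellMeta) {
			return fmt.Errorf("value %q contains a shell metacharacter", v)
		}
	}
	return nil
}

func main() {
	fmt.Println(validateApp("com.example.myapp", "https://example.com/app.apk", []string{"android.permission.CAMERA"}))
	fmt.Println(validateApp("com.example.myapp", "", []string{"1; reboot"}))
}
```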

See Security for the full security model.

Component Inventory

| Component | Image | Built by | Notes |
|---|---|---|---|
| Operator | ghcr.io/christopherime/droidfarm-operator | build-operator.yaml | |
| Dashboard | ghcr.io/christopherime/droidfarm-dashboard | build-dashboard.yaml | |
| Cuttlefish QEMU | ghcr.io/christopherime/cuttlefish:{android_version} | build-cuttlefish.yaml | Default backend, no KVM required |
| Init container | ghcr.io/christopherime/droidfarm-init | build-init.yaml | |
| coturn | coturn/coturn:4.6 | Public image | |
| Device plugin | squat/generic-device-plugin:latest | Public image | Optional; only for the KVM backend |
| Appium | appium/appium:2.5.0 | Public image | |