DroidFarm: Architecture
Kubernetes-native Android device farm: Google Cuttlefish emulators, WebRTC browser access, Appium E2E testing, declarative device templates.
System Overview
flowchart TB
Browser["**Web Browser**
▸ DroidFarm Dashboard
▸ Cuttlefish WebRTC stream
▸ CI runner / Appium WebDriver"]
subgraph K8s["Kubernetes Cluster — droidfarm-system"]
direction TB
subgraph Edge["Cluster Edge"]
GW["**Shared Cilium Gateway**
kube-system/cilium-gateway
section: https · ALPN h2
wildcard host: *.STREAMING_DOMAIN"]
Routes["**HTTPRoute** (one per device)
name: POOL-device-N
host: POOL-N.STREAMING_DOMAIN"]
GW --- Routes
end
subgraph Control["Control Plane"]
Operator["**Operator**
Go / controller-runtime
Reconciles: Template · Pool · Session
+ per-device HTTPRoute"]
Dashboard["**Dashboard**
Go + HTMX + SSE · :8080"]
end
subgraph Infra["Node Infrastructure"]
DevPlugin["**device-plugin** DaemonSet
squat/generic-device-plugin
exposes /dev/kvm, /dev/vhost-*"]
NodeSetup["**node-setup** DaemonSet
modprobe vhost_vsock vhost_net vsock"]
end
COTURN["**coturn**
TURN/STUN · :3478"]
subgraph Pool["DevicePool StatefulSet (one per pool)"]
subgraph Pod["Emulator Pod"]
Init(["init: apk-installer"])
Envoy["**envoy** sidecar
h2c :8080
grpc_web filter"]
CF["**Cuttlefish QEMU**
gRPC :8554 · WebRTC media"]
Appium["**Appium Server**
:4723"]
Init --> CF
Envoy --> CF
end
end
Operator -->|"manages"| Pool
Operator --> Dashboard
Routes -->|"h2c · port http (8080)"| Envoy
end
COTURN -->|"TURN relay UDP 49152–49200"| Browser
Browser <-->|"HTTPS / HTTP/2"| GW
Browser <-->|"HTMX + SSE"| Dashboard
Browser <-->|"WebDriver"| Appium
Emulation: Cuttlefish QEMU (default)
cuttlefish-qemu is the recommended backend. It runs Google Cuttlefish using QEMU software emulation inside the container, with WebRTC built-in and no KVM or node modification required.
Standard Kubernetes deployment — no device plugin, no node labels, no DaemonSets:
- Capabilities only: NET_ADMIN, NET_RAW, SYS_ADMIN (no --privileged)
- No /dev/kvm or vhost_* devices needed
- Works on any node, including cloud-managed node pools and spot instances
Container image (docker/Dockerfile.cuttlefish):
- Multi-stage: downloads cvd-host_package.tar.gz from Google CI per Android version
- Android system disk images are NOT baked in; they are mounted at runtime via PVC (/data/system_images/)
- Keeps the image manageable; system images are ~3–8 GB per version
- Entrypoint: launch_cvd --start_webrtc --report_anonymous_usage_stats=n
- Ports: WebRTC 8443, Cuttlefish operator API 6444
- Boot time: ~5–10 minutes
Emulation: Cuttlefish with KVM (optional)
Set spec.emulationBackend: cuttlefish in a DeviceTemplate to use hardware-accelerated virtualization. This cuts boot time to about 60 s, roughly 5–10x faster than the ~5–10 minutes of the QEMU backend.
Non-privileged Kubernetes deployment via squat/generic-device-plugin:
- Exposes /dev/kvm, /dev/vhost-net, /dev/vhost-vsock, /dev/vsock as K8s resource limits
- Emulator pods request: squat.ai/kvm: "1", squat.ai/vhost: "3"
- Capabilities: NET_ADMIN, NET_RAW, SYS_ADMIN (no --privileged)
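The difference between the two backends at the container level can be sketched as follows. This is an illustrative Python sketch, not the operator's code (the operator is written in Go); the function name `emulator_container_fragment` is hypothetical, while the capability names, backend names, and `squat.ai/*` resource names are taken from the lists above.

```python
from typing import Any

# Capabilities both backends need, per the lists above.
CAPABILITIES = ["NET_ADMIN", "NET_RAW", "SYS_ADMIN"]

# Extended resources advertised by the device plugin (KVM backend only).
KVM_DEVICE_LIMITS = {"squat.ai/kvm": "1", "squat.ai/vhost": "3"}


def emulator_container_fragment(backend: str) -> dict[str, Any]:
    """Return the securityContext/resources part of an emulator container.

    Hypothetical helper: only the values come from the document.
    """
    if backend not in ("cuttlefish-qemu", "cuttlefish"):
        raise ValueError(f"unknown emulation backend: {backend!r}")
    fragment: dict[str, Any] = {
        "securityContext": {
            "privileged": False,  # neither backend runs privileged
            "capabilities": {"add": list(CAPABILITIES)},
        },
        "resources": {"limits": {}},
    }
    if backend == "cuttlefish":
        # Only the KVM backend requests devices from the device plugin.
        fragment["resources"]["limits"].update(KVM_DEVICE_LIMITS)
    return fragment
```

With `backend="cuttlefish-qemu"` the limits stay empty, which matches the claim that the default backend needs no device plugin.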
Node prerequisites (loading the vhost_vsock, vhost_net, and vsock kernel modules) are automated via charts/droidfarm/templates/node-setup-daemonset.yaml.
See Hardware Acceleration for cloud provider specifics and device plugin verification steps.
Web Streaming: Cuttlefish WebRTC over Cilium Gateway API
Each emulator pod runs an envoy sidecar that fronts the Cuttlefish gRPC API at localhost:8554 and exposes it on port http (8080) as gRPC-Web over HTTP/2 (h2c). The dashboard surfaces the per-device HTTPS URL (status.streamURL) in an iframe — the browser loads the Cuttlefish web UI from the envoy sidecar and then negotiates the WebRTC media flow directly with the emulator.
Cluster ingress path (cluster's shared Cilium Gateway, per-device routes):
- The per-device HTTPRoute objects attach to the cluster's shared Cilium Gateway (kube-system/cilium-gateway, listener section https), which is the Cilium 1.18+ cluster-wide Gateway pattern. The DroidFarm chart does not render a Gateway by default.
- That listener carries a wildcard hostname *.<streaming.domain> (e.g. *.local.geekxflood.io) with TLS terminated at the Gateway from a cert-manager Certificate that the cluster operator manages alongside the shared Gateway.
- One HTTPRoute per device, name <pool>-device-<ordinal>, generated by the operator.
- Each route matches a specific hostname <pool>-<ordinal>.<streaming.domain> and forwards all paths to the per-device Service on its http port (8080).
- The per-device Service's http port carries appProtocol: kubernetes.io/h2c, so Cilium speaks native HTTP/2 to the envoy sidecar; this is what lets the gRPC-Web trailer frame (0x80-prefixed grpc-status:0) reach the browser intact.
A DevicePool overrides the target Gateway only when running a dedicated streaming Gateway — see streaming.gateway chart values.
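The naming scheme and route shape described above can be sketched as a small Python function that builds an HTTPRoute manifest as a plain dictionary. This is illustrative only: `device_route` and `stream_url` are hypothetical names, and the backend Service name is assumed to equal the route name, which the document does not state. The Gateway defaults and the `<pool>-device-<ordinal>` / `<pool>-<ordinal>.<streaming.domain>` patterns come from the text.

```python
from typing import Any


def device_route(pool: str, ordinal: int, domain: str,
                 gateway_name: str = "cilium-gateway",
                 gateway_namespace: str = "kube-system",
                 section_name: str = "https") -> dict[str, Any]:
    """Build a per-device HTTPRoute manifest (sketch, not the operator's code)."""
    name = f"{pool}-device-{ordinal}"
    hostname = f"{pool}-{ordinal}.{domain}"
    return {
        "apiVersion": "gateway.networking.k8s.io/v1",
        "kind": "HTTPRoute",
        "metadata": {"name": name},
        "spec": {
            # Attach to the shared Gateway's https listener.
            "parentRefs": [{
                "name": gateway_name,
                "namespace": gateway_namespace,
                "sectionName": section_name,
            }],
            "hostnames": [hostname],
            # A rule without matches applies to all paths.
            # ASSUMPTION: the per-device Service shares the route's name.
            "rules": [{"backendRefs": [{"name": name, "port": 8080}]}],
        },
    }


def stream_url(route: dict[str, Any]) -> str:
    """Derive the HTTPS URL a dashboard could surface for this route."""
    return f"https://{route['spec']['hostnames'][0]}"
```

For example, `device_route("pixel", 0, "local.geekxflood.io")` yields a route named `pixel-device-0` for host `pixel-0.local.geekxflood.io`. A pool running a dedicated streaming Gateway would pass different `gateway_*` arguments.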
Required Cilium configuration in kube-system/cilium-config:
| Key | Value | Why |
|---|---|---|
| enable-gateway-api-alpn | true | TLS listener advertises ALPN h2; without it browsers fall back to HTTP/1.1 |
| enable-gateway-api-app-protocol | true | Backend appProtocol: kubernetes.io/h2c is honoured, so Cilium speaks HTTP/2 to the pod |
If ALPN h2 is not advertised, the browser negotiates HTTP/1.1 to the gateway and Cilium drops the gRPC-Web trailer frame on its way back, which manifests as a stalled stream and a missing grpc-status header in the dev tools network panel.
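A pre-flight check for the two settings above could look like the following Python sketch. It assumes the `data` section of the `kube-system/cilium-config` ConfigMap has already been fetched into a dictionary by some other means; `missing_cilium_settings` is a hypothetical helper, not part of DroidFarm or Cilium.

```python
# Keys and values from the table above.
REQUIRED_CILIUM_SETTINGS = {
    "enable-gateway-api-alpn": "true",
    "enable-gateway-api-app-protocol": "true",
}


def missing_cilium_settings(config_data: dict[str, str]) -> list[str]:
    """Return the required keys that are absent or not set to "true"."""
    return sorted(
        key
        for key, wanted in REQUIRED_CILIUM_SETTINGS.items()
        if config_data.get(key, "").strip().lower() != wanted
    )
```

An empty result means both flags are set. A non-empty result names the settings to fix before debugging a stalled stream.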
Browser → emulator end-to-end:
Browser ──HTTPS / HTTP/2 (ALPN h2)──▶ Shared Cilium Gateway (TLS terminate)
kube-system/cilium-gateway · https
│
HTTP/2 h2c (appProtocol)
▼
envoy sidecar :8080
(envoy.filters.http.grpc_web)
│
▼
Cuttlefish gRPC :8554
The envoy sidecar keeps its own envoy.filters.http.grpc_web filter. Cilium's grpc_web filter on the cluster path is the only conversion point that matters for the browser, and the two coexist cleanly because Cilium uses HTTP/2 to the backend rather than HTTP/1.1.
TLS — the shared Cilium Gateway's https listener reads its certificate from a cert-manager Certificate resource managed by the cluster operator (typically the same cluster issuer the dashboard uses — Cloudflare DNS-01 on the GXF cluster). The certificate covers the wildcard *.<streaming.domain>, so a new device coming online never needs a new certificate.
NAT traversal via coturn:
- coturn deployed as a Helm component (optional but required for cross-NAT access)
- TURN credentials stored in a Kubernetes Secret, never in values.yaml
- Cuttlefish receives ICE server config via CF_WEBRTC_TURN_URL env var
No noVNC, no additional streaming sidecar beyond envoy. Cuttlefish handles media natively.
CRD Data Model
DeviceTemplate: Device Profile
DeviceTemplate
├── androidVersion "13" | "14" | "15"
├── avdProfile pixel_6 | pixel_7 | …
├── resources cpu, memory, storage per instance
├── appConfig[] Ordered — first entry = primary AUT
│ ├── packageName com.example.myapp
│ ├── apkSource url | configMapRef | secretRef | preinstalled
│ │ url must match ^https?://.+ (XSS/SSRF guard)
│ ├── managedConfig[] Android Enterprise Managed App Config
│ │ └── key/value/type Delivered via ADB broadcast on device boot
│ ├── permissions[] Granted via `adb shell pm grant`
│ ├── autoLaunch Start app after boot
│ └── clearDataBetweenSessions `pm clear` before each TestSession
├── systemConfig locale, timezone, extraProperties (adb setprop)
└── testConfig default Appium capabilities, resetStrategy
managedConfig mirrors Android Enterprise Managed App Config, delivered via an android.intent.action.APPLICATION_RESTRICTIONS_CHANGED ADB broadcast.
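Two of the appConfig rules above lend themselves to a short sketch: the `apkSource.url` pattern and the `adb` commands implied by `permissions[]` and `clearDataBetweenSessions`. The Python below is illustrative only. The function names are hypothetical, and the ordering (clear data first, then grant permissions) is an assumption, chosen because clearing app data also resets runtime permissions.

```python
import re

# Pattern quoted in the DeviceTemplate tree above.
APK_URL_PATTERN = re.compile(r"^https?://.+")


def is_valid_apk_url(url: str) -> bool:
    """Check an apkSource.url value against the documented pattern."""
    return APK_URL_PATTERN.match(url) is not None


def prepare_commands(package: str, permissions: list[str],
                     clear_data: bool) -> list[list[str]]:
    """Build adb argument lists for one app (sketch, not the init container)."""
    commands: list[list[str]] = []
    if clear_data:
        # ASSUMPTION: clearing runs before grants so the grants survive.
        commands.append(["adb", "shell", "pm", "clear", package])
    for permission in permissions:
        commands.append(["adb", "shell", "pm", "grant", package, permission])
    return commands
```

Returning argument lists rather than shell strings keeps package and permission names out of shell parsing, in the same spirit as the metacharacter block in CRD validation.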
DevicePool: Fleet Definition
DevicePool
├── templateRef → DeviceTemplate
├── replicas { min, max } StatefulSet scaling bounds (+ KEDA on queue depth)
├── sessionPolicy timeout, idleTimeout, recycleAfterSessions, recycleAfter
├── nodeSelector Optional; add kvm=true only when using emulationBackend: cuttlefish
├── streaming enabled, domain, gatewayName (cilium-gateway),
│ gatewayNamespace (kube-system), gatewaySectionName (https)
└── appium enabled, version, port
TestSession: Single E2E Run
TestSession
├── poolRef → DevicePool
├── timeout Max session duration
├── appium configMapRef (Appium caps JSON), extraCapabilities
├── artifacts recording, s3 / pvc destination
└── status
├── phase Pending→Claiming→Preparing→Running→Collecting→Succeeded|Failed|TimedOut
├── deviceRef Assigned emulator pod name
├── appiumEndpoint http://device-0.pool-emulator.ns.svc:4723
└── streamURL https://<pool>-<ordinal>.<streaming.domain> (gRPC-Web + WebRTC)
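The `status.phase` progression listed above can be read as a small state machine. The Python sketch below encodes it, together with the timeout and retry edges shown in the Session Lifecycle diagram. It is an illustration, not the operator's reconciler, and the names `Phase`, `TRANSITIONS`, `can_transition`, and `is_terminal` are hypothetical.

```python
from enum import Enum


class Phase(str, Enum):
    PENDING = "Pending"
    CLAIMING = "Claiming"
    PREPARING = "Preparing"
    RUNNING = "Running"
    COLLECTING = "Collecting"
    SUCCEEDED = "Succeeded"
    FAILED = "Failed"
    TIMED_OUT = "TimedOut"


# Allowed next phases; an empty set marks an end state.
TRANSITIONS: dict[Phase, frozenset[Phase]] = {
    Phase.PENDING: frozenset({Phase.CLAIMING}),
    # Claiming repeats until an idle device is found.
    Phase.CLAIMING: frozenset({Phase.CLAIMING, Phase.PREPARING}),
    Phase.PREPARING: frozenset({Phase.RUNNING, Phase.TIMED_OUT}),
    Phase.RUNNING: frozenset({Phase.COLLECTING, Phase.TIMED_OUT}),
    Phase.COLLECTING: frozenset({Phase.SUCCEEDED, Phase.FAILED}),
    Phase.SUCCEEDED: frozenset(),
    Phase.FAILED: frozenset(),
    Phase.TIMED_OUT: frozenset(),
}


def can_transition(current: Phase, nxt: Phase) -> bool:
    """True if the lifecycle allows moving from current to nxt."""
    return nxt in TRANSITIONS[current]


def is_terminal(phase: Phase) -> bool:
    """True once the session can no longer change phase."""
    return not TRANSITIONS[phase]
```

A table like this makes illegal jumps, such as Pending straight to Running, easy to reject in one place.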
Session Lifecycle
flowchart TD
Created([TestSession created]) --> Pending(Pending)
Pending -->|"Operator finds DevicePool"| Claiming(Claiming)
Claiming -->|"Idle device found
appiumEndpoint + streamURL set"| Preparing(Preparing)
Claiming -->|"no idle device"| Claiming
Preparing -->|"APK install · managed config
permission grant complete"| Running(Running)
Preparing -->|timeout| TimedOut(TimedOut)
Running -->|"external runner patches
status.result"| Collecting(Collecting)
Running -->|timeout| TimedOut
Collecting -->|"duration recorded\n artifactURL set"| Terminal
subgraph Terminal["Terminal"]
Succeeded([Succeeded])
Failed([Failed])
end
TimedOut --> Release["Device → Idle
SessionsServed++"]
Terminal --> Release
Scalability
KEDA autoscaling (optional, requires KEDA operator):
- ScaledObject monitors pending TestSession count
- Scales DevicePool StatefulSet replicas within spec.replicas.{min,max}
- cooldownPeriod: 300 prevents thrashing after test bursts
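The bounding behaviour described in the list above can be illustrated with a few lines of Python. This is a simplified sketch of the arithmetic only: KEDA's real scaling decision goes through the Horizontal Pod Autoscaler and the cooldown period, and `desired_replicas` with its `sessions_per_device` parameter is a hypothetical name and assumption, not a KEDA or DroidFarm API.

```python
import math


def desired_replicas(pending_sessions: int, min_replicas: int,
                     max_replicas: int, sessions_per_device: int = 1) -> int:
    """Clamp a queue-driven replica target into spec.replicas.{min,max}."""
    if min_replicas > max_replicas:
        raise ValueError("min_replicas must not exceed max_replicas")
    if sessions_per_device < 1:
        raise ValueError("sessions_per_device must be at least 1")
    # One device per pending session by default (assumption).
    wanted = math.ceil(pending_sessions / sessions_per_device)
    return max(min_replicas, min(max_replicas, wanted))
```

With bounds of 1 and 5, an empty queue still keeps one device, and a burst of twelve pending sessions is capped at five.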
Resource governance (enforced by LimitRange + ResourceQuota):
- LimitRange: minimum 1 CPU / 2Gi RAM per emulator container; rejects under-declared pods
- ResourceQuota: hard cap on total namespace consumption (optional, disabled by default)
Boot time mitigation:
- replicas.min ≥ 1 keeps at least one warm device ready
- Cuttlefish snapshot resume: sub-10s boot from saved checkpoint (planned)
- recycleAfterSessions replaces pods in the background, not during sessions
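The recycling rule in the last bullet can be sketched as a single predicate. This Python is illustrative: `should_recycle` is a hypothetical name, and treating a non-positive threshold as "recycling disabled" is an assumption the document does not state.

```python
def should_recycle(sessions_served: int, recycle_after_sessions: int,
                   in_session: bool) -> bool:
    """Decide whether an emulator pod is due for background replacement."""
    if recycle_after_sessions <= 0:
        # ASSUMPTION: zero or negative means the policy is switched off.
        return False
    # Never replace a device while a TestSession is using it.
    return (not in_session) and sessions_served >= recycle_after_sessions
```

A device that has reached the threshold but is still running a session is left alone until it returns to idle.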
Pod anti-affinity spreads emulators across nodes, so a single node failure doesn't take out the whole pool. When using the KVM backend, a nodeSelector of kvm: "true" ensures emulator pods land only on KVM-capable nodes.
Security Model
| Layer | Control |
|---|---|
| RBAC | ClusterRole for CRDs only; namespaced Role for everything else |
| Pod privileges | No --privileged; device plugin + specific capabilities |
| Credentials | TURN credentials in Secret, never in values.yaml or env args |
| Edge auth | Per-device HTTPRoute is the auth boundary; one hostname per device, scoped to the shared Cilium Gateway |
| Network | NetworkPolicy: default-deny + targeted allow (8080 envoy / 8443 WebRTC / 5554 ADB / 4723 Appium / DNS) |
| CRD validation | packageName regex, apkSource.url https-only, CEL shell-metachar block |
| Image builds | SBOM + provenance via docker/build-push-action, pinned base images |
See Security for the full security model.
Component Inventory
| Component | Image | Built by | Notes |
|---|---|---|---|
| Operator | ghcr.io/christopherime/droidfarm-operator | build-operator.yaml | |
| Dashboard | ghcr.io/christopherime/droidfarm-dashboard | build-dashboard.yaml | |
| Cuttlefish QEMU | ghcr.io/christopherime/cuttlefish:{android_version} | build-cuttlefish.yaml | Default backend, no KVM required |
| Init container | ghcr.io/christopherime/droidfarm-init | build-init.yaml | |
| coturn | coturn/coturn:4.6 | Public image | |
| Device plugin | squat/generic-device-plugin:latest | Public image | Optional — only for KVM backend |
| Appium | appium/appium:2.5.0 | Public image | |