HFT Node Onboarding
How to bring up a new dedicated node for Frequency.HFT bots.
Audience: SRE + platform team.
Pre-requisites
- Bare-metal or near-bare-metal hardware with:
  - Hardware-timestamping NIC (Intel I210/X710, Mellanox ConnectX-5/6/7).
  - At least 2 NUMA nodes (most modern dual-socket Xeons / Epycs).
  - 2 MiB HugePages support (kernel default).
  - SR-IOV-capable NIC.
- Linux kernel >= 5.10 with PTP support (ptp4l + phc2sys from linuxptp).
- The node is already a member of the cluster and runs the standard kubelet.
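The host-side prerequisites can be spot-checked from a shell on the node before step 1. The helpers below are a hypothetical sketch, not part of the platform tooling: they compare a uname -r release string against the 5.10 floor, count NUMA nodes under /sys/devices/system/node, and read the default hugepage size from /proc/meminfo. Paths are arguments with live defaults, so the functions can also be pointed at fixtures. NIC capabilities (hardware timestamping, SR-IOV) are not covered.

```shell
#!/usr/bin/env bash
# Hypothetical preflight helpers for the prerequisites above (sketch only).

# kernel_at_least RELEASE MAJOR MINOR
# Succeeds when a uname -r style release (e.g. 5.15.0-91-generic) is at
# least MAJOR.MINOR.
kernel_at_least() {
  local release=${1%%-*} want_major=$2 want_minor=$3 major minor
  IFS=. read -r major minor _ <<<"$release"
  (( major > want_major || (major == want_major && minor >= want_minor) ))
}

# numa_node_count [SYSFS_NODE_DIR] -> number of nodeN directories
numa_node_count() {
  local dir=${1:-/sys/devices/system/node} n=0 d
  for d in "$dir"/node[0-9]*; do
    [[ -d $d ]] && n=$((n + 1))
  done
  echo "$n"
}

# default_hugepage_kb [MEMINFO] -> default hugepage size in kB (2048 = 2 MiB)
default_hugepage_kb() {
  awk '/^Hugepagesize:/ {print $2}' "${1:-/proc/meminfo}"
}

# preflight: run all checks against the live host; non-zero if any fail.
preflight() {
  local rc=0
  kernel_at_least "$(uname -r)" 5 10 || { echo "kernel older than 5.10"; rc=1; }
  (( $(numa_node_count) >= 2 )) || { echo "fewer than 2 NUMA nodes"; rc=1; }
  [[ $(default_hugepage_kb) == 2048 ]] || { echo "default hugepage size is not 2 MiB"; rc=1; }
  return "$rc"
}
```

Source the file on the host and run preflight; it prints one line per failed check and returns non-zero if any check fails.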
1. Taint + label the node
kubectl taint nodes <node-name> quantbot.io/hft=true:NoSchedule
kubectl label nodes <node-name> quantbot.io/hft=true
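To confirm step 1 took effect, the node's taints and label can be read back with kubectl jsonpath and checked in a small helper. This is a sketch; hft_marks_ok is a hypothetical name, and it only checks for the exact taint and label values used above. After Rollback the same check should fail.

```shell
#!/usr/bin/env bash
# Hypothetical helper: verify the HFT taint and label from step 1.
# Typical inputs (run from a workstation with cluster access):
#   taints=$(kubectl get node <node-name> \
#     -o jsonpath='{range .spec.taints[*]}{.key}={.value}:{.effect}{"\n"}{end}')
#   label=$(kubectl get node <node-name> -o jsonpath='{.metadata.labels.quantbot\.io/hft}')
#   hft_marks_ok "$taints" "$label"

# hft_marks_ok TAINTS LABEL_VALUE
#   TAINTS: newline-separated key=value:effect entries
#   LABEL_VALUE: value of the quantbot.io/hft label, empty if absent
hft_marks_ok() {
  local taints=$1 label=$2
  if ! grep -qxF 'quantbot.io/hft=true:NoSchedule' <<<"$taints"; then
    echo "missing taint quantbot.io/hft=true:NoSchedule"
    return 1
  fi
  if [[ $label != true ]]; then
    echo "missing label quantbot.io/hft=true"
    return 1
  fi
}
```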
2. Apply the kubelet override
The kubelet config drop-in lives at
alphaswarm_platform/deployments/kubernetes/hft-nodes/kubelet-config.yaml.
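On systemd hosts, drain the node first if it already runs workload pods, then:
sudo systemctl stop kubelet
# Switching to the static CPU Manager policy: the kubelet refuses to start
# while its checkpoint still records the previous policy ("none" by default).
sudo rm -f /var/lib/kubelet/cpu_manager_state
sudo cp kubelet-config.yaml /etc/kubernetes/kubelet/kubelet.conf.d/quantbot-hft.conf
sudo systemctl start kubelet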
Verify:
kubectl get --raw "/api/v1/nodes/<node-name>/proxy/configz" | jq .kubeletconfig.cpuManagerPolicy
# Expect: "static"
3. Allocate HugePages
kubectl apply -f alphaswarm_platform/deployments/kubernetes/hft-nodes/hugepages-allocation.yaml
# The DaemonSet runs one pod on each HFT node; it sets nr_hugepages=1024 (1024 x 2 MiB = 2 GiB).
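Writing nr_hugepages at runtime can leave the pool short on a host with fragmented memory: the kernel allocates as many pages as it can, and nr_hugepages then reports the number actually allocated. A hypothetical sketch for reading back the result on the host, both as a total and per NUMA node (2 MiB pages only):

```shell
#!/usr/bin/env bash
# Hypothetical read-back helpers for the 2 MiB hugepage pool.
# Arguments default to the live procfs/sysfs paths.

# hugepages_total [MEMINFO] -> HugePages_Total count
hugepages_total() {
  awk '/^HugePages_Total:/ {print $2}' "${1:-/proc/meminfo}"
}

# hugepages_per_numa [SYSFS_NODE_DIR] -> one "nodeN COUNT" line per NUMA node
hugepages_per_numa() {
  local dir=${1:-/sys/devices/system/node} f node
  for f in "$dir"/node[0-9]*/hugepages/hugepages-2048kB/nr_hugepages; do
    [[ -f $f ]] || continue       # glob did not match anything
    node=${f#"$dir"/}             # nodeN/hugepages/...
    node=${node%%/*}              # nodeN
    echo "$node $(<"$f")"
  done
}

# Example: warn if the DaemonSet's 1024-page target was not reached.
#   (( $(hugepages_total) >= 1024 )) || echo "hugepage pool short"
```

The per-node split matters when a bot is pinned to one NUMA node, because only that node's pages are local to it.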
4. Bring up PTP
kubectl apply -f alphaswarm_platform/deployments/kubernetes/hft-nodes/ptp-config.yaml
Verify clock discipline (run inside the quantbot-ptp pod):
kubectl exec -n alphaswarm-bots quantbot-ptp-<pod-id> -c phc2sys -- \
pmc -u -b 0 'GET CURRENT_DATA_SET' | grep offsetFromMaster
# Expect offsetFromMaster close to 0 (in ns; sub-microsecond on a healthy network).
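A pmc reading can be turned into a pass/fail gate. The sketch below (ptp_offset_ok is a hypothetical name) parses the offsetFromMaster field of the CURRENT_DATA_SET response, which pmc reports in nanoseconds, and compares its absolute value with a limit that defaults to 1000 ns to match the sub-microsecond expectation; the limit is an assumption to tune per site.

```shell
#!/usr/bin/env bash
# Hypothetical gate on `pmc ... 'GET CURRENT_DATA_SET'` output read from stdin.
# Exit status: 0 = |offset| <= limit, 1 = offset too large,
#              2 = no offsetFromMaster line found.
# Example:
#   kubectl exec -n alphaswarm-bots quantbot-ptp-<pod-id> -c phc2sys -- \
#     pmc -u -b 0 'GET CURRENT_DATA_SET' | ptp_offset_ok 1000
ptp_offset_ok() {
  local limit_ns=${1:-1000}
  awk -v limit="$limit_ns" '
    $1 == "offsetFromMaster" {
      found = 1
      off = $2 + 0
      if (off < 0) off = -off      # compare the magnitude only
    }
    END {
      if (!found) exit 2
      if (off > limit + 0) exit 1
      exit 0
    }'
}
```

A single sample can land on a transient spike; a more robust gate would take several readings over a minute and check each one.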
5. Configure SR-IOV
If the SR-IOV Network Operator is installed:
kubectl apply -f alphaswarm_platform/deployments/kubernetes/hft-nodes/sr-iov-config.yaml
Verify VFs are exposed:
kubectl get nodes <node-name> -o json | jq '.status.allocatable | with_entries(select(.key | startswith("openshift.io/quantbot_hft_vf")))'
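If the allocatable query returns an empty object, a host-side check shows whether any VFs were created at all. The PF interface name is site-specific and not given in this runbook; the helper below (hypothetical name sriov_vfs) reads the standard sriov_numvfs and sriov_totalvfs sysfs attributes under /sys/class/net/<pf>/device.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report VF counts for an SR-IOV physical function.
# Usage on the host (PF interface name is site-specific):
#   sriov_vfs /sys/class/net/<pf-interface>
# Prints "numvfs/totalvfs". Exit status: 0 = VFs exist, 1 = none created,
# 2 = the interface exposes no SR-IOV attributes.
sriov_vfs() {
  local dev=$1/device num total
  if [[ ! -r $dev/sriov_totalvfs ]]; then
    echo "no SR-IOV attributes under $dev"
    return 2
  fi
  num=$(<"$dev/sriov_numvfs")
  total=$(<"$dev/sriov_totalvfs")
  echo "$num/$total"
  (( num > 0 ))
}
```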
6. Apply the tuned profile
kubectl apply -f alphaswarm_platform/deployments/kubernetes/hft-nodes/node-tuning-operator.yaml
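Whether this profile isolates CPU cores depends on its contents, which this runbook does not reproduce. If it isolates them through kernel boot parameters (an assumption), the isolated cores appear in /sys/devices/system/cpu/isolated in Linux cpulist format, e.g. 2-15,18-31. Kernel command-line changes only take effect after a reboot, so an empty file right after applying the profile may just mean the node has not rebooted yet. A hypothetical helper to count the CPUs in such a list:

```shell
#!/usr/bin/env bash
# Hypothetical helper: count CPUs in a Linux cpulist such as "2-5,8,10-11".
# Handles the plain comma/range form that sysfs prints; the stride syntax
# (0-7:2) accepted on some kernel command lines is not supported.
cpulist_count() {
  local list=$1 part n=0
  local -a parts
  if [[ -z $list ]]; then
    echo 0
    return
  fi
  IFS=, read -ra parts <<<"$list"
  for part in "${parts[@]}"; do
    if [[ $part == *-* ]]; then
      n=$(( n + ${part#*-} - ${part%-*} + 1 ))   # inclusive range lo-hi
    else
      n=$(( n + 1 ))                              # single CPU
    fi
  done
  echo "$n"
}

# Example on the host:
#   cpulist_count "$(</sys/devices/system/cpu/isolated)"
```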
7. Validate the node passes the operator's HFT check
The QuantBot Operator's validating webhook rejects an HFT bot whose target node fails any of these checks:
- quantbot.io/hft label present
- PTP DaemonSet pod running on the node
- HugePages allocation >= the bot's request
- SR-IOV VF available
Run the operator's diagnostics:
alphaswarm-bots validate <hft-bot-slug>
A passing validation prints valid: true and no failure entries.
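For a manual pre-check, the HugePages rule boils down to comparing two Kubernetes quantities: the node's allocatable hugepages-2Mi (2Gi when all 1024 pages from step 3 were allocated) and the bot's hugepage request. The sketch below is not the webhook's implementation; to_bytes and hugepages_fit are hypothetical names, and only plain integers and the binary suffixes Ki, Mi and Gi are handled (anything else, such as 2G or 1.5Gi, is rejected rather than guessed).

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the "HugePages allocation >= request" rule.
# Usage from a workstation:
#   hugepages_fit "$(kubectl get node <node-name> \
#     -o jsonpath='{.status.allocatable.hugepages-2Mi}')" <bot-request>

# to_bytes QUANTITY -> bytes; accepts plain integers and Ki/Mi/Gi only.
to_bytes() {
  local q=$1 n mult
  case $q in
    *Ki) n=${q%Ki}; mult=1024 ;;
    *Mi) n=${q%Mi}; mult=$((1024 ** 2)) ;;
    *Gi) n=${q%Gi}; mult=$((1024 ** 3)) ;;
    *)   n=$q;      mult=1 ;;
  esac
  [[ $n =~ ^[0-9]+$ ]] || return 1
  echo $(( 10#$n * mult ))       # 10# avoids octal parsing of leading zeros
}

# hugepages_fit ALLOCATABLE REQUEST
# Exit status: 0 = request fits, 1 = too small, 2 = unparseable quantity.
hugepages_fit() {
  local have want
  have=$(to_bytes "$1") || return 2
  want=$(to_bytes "$2") || return 2
  (( have >= want ))
}
```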
Rollback
To take the node out of the HFT pool:
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data
kubectl taint nodes <node-name> quantbot.io/hft=true:NoSchedule-
kubectl label nodes <node-name> quantbot.io/hft-
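Once the quantbot.io/hft label is removed, the HFT DaemonSet pods (ptp, hugepages, sriov) are deleted from the node automatically. The drain leaves the node cordoned; run kubectl uncordon <node-name> when it should accept general workloads again.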
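To confirm the drain actually emptied the node, the pods still bound to it can be listed with their owner kind and filtered. The sketch skips DaemonSet pods and mirror pods of static pods (owned by the Node object), since drain leaves both in place; leftover_pods is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print pods that a drain should have evicted.
# Input on stdin: "namespace/name OwnerKind" lines, e.g. from
#   kubectl get pods -A --field-selector spec.nodeName=<node-name> \
#     -o jsonpath='{range .items[*]}{.metadata.namespace}/{.metadata.name} {.metadata.ownerReferences[0].kind}{"\n"}{end}'
# DaemonSet pods and mirror pods (owner kind Node) are skipped; pods with
# no owner at all are reported.
leftover_pods() {
  awk 'NF && $2 != "DaemonSet" && $2 != "Node" { print $1 }'
}
```

An empty result means only DaemonSet and mirror pods remain on the node.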