WBLV Private Cloud Lab
Overview
A private cloud running on repurposed enterprise hardware, using the same software stacks and security controls I work with professionally.
The problem with only learning security and infrastructure on client environments is that you can't experiment. You can't upgrade a cluster node to see what breaks, run a new detection rule that might flood alerts, or test a firewall migration approach before committing to it in production.
The lab closes that gap. It runs the Proxmox hypervisor stack, OPNsense firewalls, and CIS hardening baselines I use at work. If I configure it here first, I understand it before it matters.
Servers
The cluster runs on three identical nodes — repurposed enterprise servers sourced through a recycling contact. Each node runs Proxmox VE with Ceph for converged storage, giving a fully hyper-converged architecture with no external SAN requirement.
| Motherboard | Supermicro X10DRI — dual socket LGA2011-v3 |
| CPU | 2× Intel Xeon E5-2680 v4 (14C/28T each, 56 threads total per node) |
| RAM | 64GB DDR4 ECC per node (192GB cluster total) |
| Network | 2× dual-port 10GbE RJ45 NIC + HPE 4-port 1GbE |
| Storage HBA | LSI 9300-16i — 16-port SAS3/SATA3 |
| Ceph SSDs | 3× Samsung PM1633a 1.6TB SAS SSD per node (9 total) |
| Ceph HDDs | Shared pool of 4TB and 10TB SAS HDDs across cluster |
| Boot | Dedicated SSD per node for Proxmox OS |
| Case | LC-4480 4U rackmount |
Network Interfaces
- Bond0 (LACP) — 2× 10GbE for VM traffic and Ceph replication
- mgmt0 — 1GbE dedicated management interface
- IPMI — out-of-band management, separate management VLAN
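On Proxmox, a bond like this is normally declared in /etc/network/interfaces. The sketch below renders such a stanza from Python so that all three nodes get the same definition. The member interface names (`enp1s0f0`, `enp1s0f1`), the `layer3+4` hash policy, and the MTU applied to the bond are assumptions for illustration, not the lab's actual configuration.

```python
def render_lacp_bond(name, members, mtu=1500, hash_policy="layer3+4"):
    """Render an ifupdown-style stanza for an 802.3ad (LACP) bond."""
    if len(members) < 2:
        raise ValueError("an LACP bond needs at least two member interfaces")
    lines = [
        f"auto {name}",
        f"iface {name} inet manual",
        f"    bond-slaves {' '.join(members)}",
        "    bond-miimon 100",           # link check interval in milliseconds
        "    bond-mode 802.3ad",         # LACP
        f"    bond-xmit-hash-policy {hash_policy}",
        f"    mtu {mtu}",
    ]
    return "\n".join(lines)


# Hypothetical member names; MTU 9000 so the bond can carry jumbo frames.
stanza = render_lacp_bond("bond0", ["enp1s0f0", "enp1s0f1"], mtu=9000)
print(stanza)
```

The switch ports on the other end have to be configured as a matching LACP group, otherwise the bond will not come up as an aggregate.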
Cluster Configuration
All three nodes form a Proxmox VE cluster with quorum provided by the three-node Corosync ring. VMs can live-migrate between nodes without downtime. The cluster uses LACP-bonded 10GbE links for both VM traffic and Ceph replication, giving redundant high-bandwidth paths between every node.
Each node runs both the Proxmox hypervisor and Ceph OSD daemons. It's a hyper-converged setup (same idea as Nutanix or vSAN), which costs some compute overhead but removes the need for a separate storage layer.
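The quorum behaviour of a three-node ring comes down to simple majority arithmetic. The sketch below assumes the default of one vote per node and no external quorum device; the function names are illustrative only.

```python
def votes_needed(total_nodes):
    """Votes required for quorum: a strict majority of configured nodes."""
    return total_nodes // 2 + 1


def has_quorum(total_nodes, nodes_online):
    """True if the surviving nodes can still form a majority."""
    return nodes_online >= votes_needed(total_nodes)


for online in (3, 2, 1):
    print(f"{online} of 3 nodes online -> quorum: {has_quorum(3, online)}")
# 3 of 3 nodes online -> quorum: True
# 2 of 3 nodes online -> quorum: True
# 1 of 3 nodes online -> quorum: False
```

In other words, the cluster tolerates one node being down for maintenance or failure, but not two.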
Storage Architecture
Ceph runs converged on all three nodes, so storage OSDs share the same hosts as compute. You lose some CPU to Ceph overhead, but there's no separate storage infrastructure to manage.
Pools & Replication
| vm-pool-ssd | ~5.5TB usable | Samsung PM1633a SSDs | Replication factor: 3 | VM workloads, Sentinel lab, high-IOPS services |
| vm-pool-hdd | ~24TB usable | SAS HDDs | Replication factor: 3 | Backup storage, archives, bulk data, cold workloads |
3 OSD nodes × 3 SSDs = 9 OSD daemons (SSD pool)
3 OSD nodes × HDDs = shared HDD pool OSDs
The SSD pool hosts anything latency-sensitive: Sentinel, Graylog, development VMs. The HDD pool handles everything that just needs capacity: backups, ISOs, archive data.
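A replication factor of 3 across three hosts means each host holds one copy of every object. The following is a simplified model of what that implies when hosts go down. It assumes a host-level failure domain and a `min_size` of 2 (the usual default for a size-3 pool); neither setting is stated above, and the state names are shorthand rather than exact Ceph health output.

```python
def pool_state(size, min_size, hosts_up):
    """Classify a replicated pool given how many hosts are still up.

    Assumes one replica per host, so the number of reachable
    replicas can never exceed the number of hosts that are up.
    """
    replicas = min(size, hosts_up)
    if replicas >= size:
        return "healthy"      # all copies present
    if replicas >= min_size:
        return "degraded"     # still serving I/O with fewer copies
    return "inactive"         # below min_size, I/O is blocked


for hosts_up in (3, 2, 1):
    print(hosts_up, pool_state(size=3, min_size=2, hosts_up=hosts_up))
# 3 healthy
# 2 degraded
# 1 inactive
```

With only three hosts there is nowhere to re-create the missing replica after a host failure, so the pool stays degraded until that host returns.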
VLANs
The network is segmented into VLANs enforced by the OPNsense pair. Management, lab, home, and Ceph replication traffic are all separated so a compromise in one segment can't reach cluster management interfaces.
| VLAN 10 | Management | Proxmox hosts, OPNsense, IPMI |
| VLAN 20 | Lab | VM workloads, security tooling |
| VLAN 30 | Home | Household devices, Lauren's kit |
| VLAN 40 | Ceph | Storage replication traffic (MTU 9000, jumbo frames) |
| VLAN 50 | DMZ | Anything externally reachable |
| VLAN 99 | Native/Trunk | Uplink trunks to switching |
Inter-VLAN routing is enforced by OPNsense firewall rules: default deny between segments, with explicit allows only.
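The default-deny model can be sketched as a lookup against an explicit allow list. The two rules below are hypothetical examples chosen only to show the shape of the policy; they are not the lab's real ruleset, and real OPNsense rules also match on protocol, direction, and addresses.

```python
from typing import NamedTuple


class Allow(NamedTuple):
    src_vlan: int
    dst_vlan: int
    port: int


# Hypothetical allow list: everything not listed here is dropped.
ALLOW_RULES = [
    Allow(src_vlan=10, dst_vlan=20, port=22),   # management may SSH into lab
    Allow(src_vlan=30, dst_vlan=50, port=443),  # home may reach DMZ over HTTPS
]


def is_permitted(src_vlan, dst_vlan, port, rules=ALLOW_RULES):
    """Return True only if an explicit allow rule matches (default deny)."""
    if src_vlan == dst_vlan:
        # Same-segment traffic is switched and never reaches the firewall.
        return True
    return Allow(src_vlan, dst_vlan, port) in rules


print(is_permitted(10, 20, 22))   # True: explicitly allowed
print(is_permitted(20, 10, 22))   # False: no rule, so denied
print(is_permitted(30, 10, 443))  # False: home cannot reach management
```

Rules are directional here: allowing management to reach the lab does not allow the lab to reach management.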
DNS Resolution
Internal DNS runs on a PowerDNS HA pair (Authoritative + Recursor) across two VMs. Split-horizon: queries for internal zones are answered by PowerDNS Authoritative, and the Recursor forwards everything else upstream. DNSSEC is on for internal zones.
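The routing decision behind this split can be sketched as a suffix match on the query name, similar in spirit to how the Recursor's `forward-zones` setting sends specific zones to a chosen server. The zone name `lab.example.` is a placeholder, not the lab's real internal zone.

```python
INTERNAL_ZONES = ("lab.example.",)  # hypothetical internal zone


def resolver_for(qname, internal_zones=INTERNAL_ZONES):
    """Decide whether a query is answered internally or sent upstream."""
    # Normalise to a lowercase, fully qualified name with a trailing dot.
    name = qname.lower().rstrip(".") + "."
    for zone in internal_zones:
        # Match the zone apex itself or any name beneath it.
        if name == zone or name.endswith("." + zone):
            return "authoritative"
    return "upstream"


print(resolver_for("graylog.lab.example"))  # authoritative
print(resolver_for("example.com"))          # upstream
print(resolver_for("notlab.example"))       # upstream
```

Matching on a leading dot before the zone avoids treating `notlab.example` as part of `lab.example`.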
Firewall High Availability
Two OPNsense instances run in active/passive HA, using CARP for virtual IP failover, pfsync for state table sync, and XMLRPC for configuration sync. Failover is sub-second, tested under load.
| Platform | OPNsense 24.x on dedicated VMs (PCI passthrough for NICs) |
| HA mode | Active/Passive CARP — VIP failover |
| Sync | XMLRPC configuration sync, pfsync for state table |
| Hosts | P-WBLV-MK5-PVE-01 (primary), P-WBLV-MK5-PVE-02 (secondary) |
| IDS/IPS | Suricata with ET Open ruleset |
| Logging | Syslog → Graylog for centralised log management |
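CARP decides which firewall holds the virtual IP by how often each node advertises: the node with the shortest advertisement interval becomes master. The sketch below is a simplified model of that election. The `advbase` and `advskew` values are assumed for illustration and are not taken from the lab's configuration.

```python
def advertisement_interval(advbase, advskew):
    """Seconds between CARP advertisements: advbase + advskew / 256."""
    return advbase + advskew / 256


def elect_master(nodes):
    """Pick the node that advertises most frequently.

    nodes maps a node name to its (advbase, advskew) pair and should
    contain only nodes that are currently up.
    """
    return min(nodes, key=lambda name: advertisement_interval(*nodes[name]))


# Assumed values: the secondary gets a higher skew so it loses the election.
nodes = {"primary": (1, 0), "secondary": (1, 100)}
print(elect_master(nodes))  # primary

# Primary goes away: the secondary is the only candidate left.
survivors = {name: cfg for name, cfg in nodes.items() if name != "primary"}
print(elect_master(survivors))  # secondary
```

When the primary returns with its lower skew it advertises more often again and takes the virtual IP back, provided preemption is enabled.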
Remote Access
No management interfaces are exposed to the internet. Remote access goes through Tailscale only: authenticated, encrypted tunnels with no inbound firewall rules or exposed ports needed.
Security Hardening
All VM workloads are hardened to CIS Level 2 baselines before deployment, with the same tooling, validation approach, and documentation standard as the CNI programme work.
- CIS Level 2 applied to Ubuntu 22.04 LTS base images using Ansible playbooks
- Compliance validated with CIS-CAT Assessor post-deployment
- SSH key-only authentication, root login disabled, fail2ban active
- Automatic security updates via unattended-upgrades
- AppArmor profiles enforced on all workloads
- Auditd configured per CIS requirements, logs forwarded to Graylog
- Separate service accounts per workload, no shared credentials
- Secrets managed via 1Password CLI with vault separation by privilege level
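As a small illustration of the SSH items in the list above, the sketch below checks an `sshd_config` text for the two settings that enforce key-only login and disabled root login. It is a minimal stand-in for a real compliance scan, not a replacement for CIS-CAT: it ignores `Match` blocks and `Include` files, and the sample config is made up. Running `sshd -T` on the host gives the effective configuration and is the more reliable source.

```python
REQUIRED = {
    "passwordauthentication": "no",  # key-only authentication
    "permitrootlogin": "no",         # root login disabled
}


def audit_sshd(config_text, required=REQUIRED):
    """Return {keyword: found_value} for every required setting that fails.

    A found_value of None means the keyword was not set at all, so the
    sshd default applies (which is not 'no' for either keyword here).
    """
    seen = {}
    for raw in config_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue
        keyword, value = parts[0].lower(), parts[1].strip().lower()
        # sshd uses the first value it reads for each keyword.
        seen.setdefault(keyword, value)
    return {k: seen.get(k) for k, want in required.items() if seen.get(k) != want}


sample = """
# made-up example config
PermitRootLogin no
PasswordAuthentication yes
"""
print(audit_sshd(sample))  # {'passwordauthentication': 'yes'}
```

An empty dictionary means both settings passed.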
Tooling Stack
The lab runs a self-funded Microsoft 365 E5 tenant with Sentinel as the SIEM. Having my own Sentinel instance means I can do detection engineering and KQL development without touching client environments.
| SIEM | Microsoft Sentinel (M365 E5 tenant) |
| Log mgmt | Graylog — centralised collection before Sentinel forwarding |
| Automation | Mac Mini M2 (p-wblv-lab-aut-01) — Ansible, Claude Code, MCP servers |
| Remote | Tailscale mesh + subnet routing across all VLANs |
| Monitoring | Prometheus + Grafana for cluster and VM metrics |
| DNS | PowerDNS Authoritative + Recursor HA pair |
| Secrets | 1Password CLI with tiered vault structure |
Automation
A Mac Mini M2 (16GB RAM, 1TB SSD) handles Ansible playbook execution, hosts MCP servers for Claude Code integration, and serves as the SSH jump host for lab access.
Roadmap
- Out-of-band management expansion — Raspberry Pi serial console server for all five managed devices
- JetKVM deployment for the Mac Mini automation node
- Expand Sentinel detection library — port production KQL rules into the lab environment
- South Wales relocation — natural forcing function for a hardware refresh and potential SFF replacement evaluation