
WBLV Private Cloud Lab

Overview

A private cloud running on repurposed enterprise hardware, using the same software stacks and security controls I work with professionally.

The problem with only learning security and infrastructure on client environments is that you can't experiment. You can't upgrade a cluster node to see what breaks, run a new detection rule that might flood alerts, or test a firewall migration approach before committing to it in production.

The lab closes that gap. It runs the Proxmox hypervisor stack, OPNsense firewalls, and CIS hardening baselines I use at work. If I configure it here first, I understand it before it matters.

Philosophy: The lab runs real software, follows real security standards, and is documented properly. If a change wouldn't pass a CAB (change advisory board) submission, it doesn't go in.

Servers

The cluster runs on three identical nodes — repurposed enterprise servers sourced through a recycling contact. Each node runs Proxmox VE with Ceph for converged storage, giving a fully hyper-converged architecture with no external SAN requirement.

Motherboard	Supermicro X10DRI — dual socket LGA2011-v3
CPU	2× Intel Xeon E5-2680 v4 (14C/28T each, 56 threads total per node)
RAM	64GB DDR4 ECC per node (192GB cluster total)
Network	2× dual-port 10GbE RJ45 NIC + HPE 4-port 1GbE
Storage HBA	LSI 9300-16i — 16-port SAS3/SATA3
Ceph SSDs	3× Samsung PM1633a 1.6TB SAS SSD per node (9 total)
Ceph HDDs	Shared pool of 4TB and 10TB SAS HDDs across cluster
Boot	Dedicated SSD per node for Proxmox OS
Case	LC-4480 4U rackmount

Network Interfaces

Bond0 (LACP) — 2× 10GbE for VM traffic and Ceph replication
mgmt0 — 1GbE dedicated management interface
IPMI — out-of-band management, separate management VLAN
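Bond0 doubles the aggregate bandwidth, but LACP balances traffic per flow rather than per packet: the bond hashes header fields to pick a member link, so every packet of one flow uses the same 10GbE link. A minimal sketch of that idea, assuming a layer3+4-style policy (IPs and ports). The CRC32 hash is a stand-in for illustration, not the Linux bonding driver's actual formula, and the addresses are hypothetical documentation IPs.

```python
import zlib
from collections import Counter

def pick_link(src_ip: str, dst_ip: str, src_port: int, dst_port: int, n_links: int) -> int:
    """Map a flow to a bond member link (illustrative hash, not the kernel's)."""
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}".encode()
    # Same header fields -> same hash -> same link, so a flow's packets stay in order.
    return zlib.crc32(key) % n_links

# One flow always lands on one link, so its throughput is capped at a single 10GbE member.
single = {pick_link("192.0.2.11", "192.0.2.12", 50000, 6800, 2) for _ in range(100)}

# Many flows (for example, many Ceph OSD connections) are spread across the members.
spread = Counter(
    pick_link("192.0.2.11", "192.0.2.12", sport, 6800, 2) for sport in range(40000, 41000)
)
print(len(single), dict(spread))
```

In practice this means a single large copy tops out near 10Gb/s, while Ceph replication, which opens many connections between OSDs, is what actually benefits from both links.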

Cluster Configuration

All three nodes form a Proxmox VE cluster with quorum provided by the three-node Corosync ring. VMs can live-migrate between nodes without downtime. The cluster uses LACP-bonded 10GbE links for both VM traffic and Ceph replication, giving redundant high-bandwidth paths between every node.
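The three-node count is what makes quorum work. Corosync's votequorum only lets a partition keep running while it holds a strict majority of votes. The sketch below works through that arithmetic, assuming one vote per node and no QDevice or `two_node` mode (neither is mentioned above).

```python
def votes_needed(nodes: int) -> int:
    """Strict majority of expected votes, assuming one vote per node."""
    return nodes // 2 + 1

def has_quorum(nodes: int, online: int) -> bool:
    # A partition stays quorate only while it holds a strict majority.
    return online >= votes_needed(nodes)

def tolerated_failures(nodes: int) -> int:
    # Nodes that can fail while the survivors still hold a majority.
    return nodes - votes_needed(nodes)

for n in (2, 3, 4, 5):
    print(n, votes_needed(n), tolerated_failures(n))
```

Under those assumptions a four-node cluster still tolerates only one failure, which is why odd node counts are the norm. When a Proxmox node loses quorum, its /etc/pve cluster filesystem becomes read-only, so VM configuration changes are blocked until quorum returns.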

Each node runs both the Proxmox hypervisor and Ceph OSD daemons. It's a hyper-converged setup (same idea as Nutanix or vSAN), which costs some compute overhead but removes the need for a separate storage layer.

Storage Architecture

Ceph runs converged on all three nodes, so storage OSDs share the same hosts as compute. You lose some CPU to Ceph overhead, but there's no separate storage infrastructure to manage.

Pools & Replication

CLUSTER STORAGE POOLS

vm-pool-ssd	~5.5TB usable, Samsung PM1633a SSDs, replication factor 3 → VM workloads, Sentinel lab, high-IOPS services
vm-pool-hdd	~24TB usable, SAS HDDs, replication factor 3 → backup storage, archives, bulk data, cold workloads

SSD pool OSDs	3 nodes × 3 SSDs = 9 OSD daemons (one per drive)
HDD pool OSDs	one OSD daemon per SAS HDD, spread across all three nodes

The SSD pool hosts anything latency-sensitive: the Sentinel lab VMs, Graylog, development VMs. The HDD pool handles everything that just needs capacity: backups, ISOs, archive data.
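Replication factor 3 is what lets both pools ride out a node failure. A sketch of the availability logic, assuming a CRUSH rule with failure domain `host` (one replica per node) and Ceph's default `min_size` of 2 for a size-3 pool; neither setting is stated above, and the status strings are plain descriptions, not Ceph's placement-group state names.

```python
def replicas_remaining(size: int, hosts: int, failed_hosts: int) -> int:
    """Replicas of a placement group that survive, assuming a host failure domain."""
    if size > hosts:
        raise ValueError("a host failure domain needs at least `size` hosts")
    # One replica per host, so each failed host costs at most one replica (worst case: exactly one).
    return max(size - failed_hosts, 0)

def pool_state(size: int, min_size: int, hosts: int, failed_hosts: int) -> str:
    left = replicas_remaining(size, hosts, failed_hosts)
    if left == size:
        return "healthy"
    if left >= min_size:
        return "degraded, serving I/O"
    if left > 0:
        return "I/O paused (below min_size)"
    return "no replica available"

for failed in range(4):
    print(failed, pool_state(size=3, min_size=2, hosts=3, failed_hosts=failed))
```

With exactly three hosts and size 3, CRUSH has nowhere to place a third copy while a node is down, so the pools stay degraded until the node returns rather than healing onto the two survivors.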

VLANs

The network is segmented into VLANs, and the OPNsense pair routes and filters all traffic between them. Management, lab, home, and Ceph replication traffic are kept separate, so a compromise in one segment can't reach cluster management interfaces.

VLAN LAYOUT

VLAN 10	Management: Proxmox hosts, OPNsense, IPMI
VLAN 20	Lab: VM workloads, security tooling
VLAN 30	Home: household devices, Lauren's kit
VLAN 40	Ceph: storage replication traffic (MTU 9000, jumbo frames)
VLAN 50	DMZ: anything externally reachable
VLAN 99	Native/Trunk: uplink trunks to switching

Inter-VLAN routing is enforced by OPNsense firewall rules: default deny between segments, explicit allows only.
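Default deny with explicit allows can be modelled as "no matching allow, no traffic". A minimal sketch of that policy for new connections; the two allow rules are hypothetical examples, not the lab's real OPNsense rule set. Because OPNsense is stateful, return traffic for an allowed connection is permitted automatically and isn't modelled here.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Allow:
    src_vlan: int
    dst_vlan: int
    dst_port: Optional[int] = None  # None means any destination port

# Hypothetical allow list for illustration only; not the lab's real rules.
ALLOWS = [
    Allow(src_vlan=10, dst_vlan=20, dst_port=22),   # management SSH into lab VMs
    Allow(src_vlan=30, dst_vlan=50, dst_port=443),  # home devices to DMZ HTTPS
]

def permitted(src_vlan: int, dst_vlan: int, dst_port: int, allows=ALLOWS) -> bool:
    """Decide whether a new connection may cross from one VLAN to another."""
    if src_vlan == dst_vlan:
        return True  # same segment: switched locally, never routed through the firewall
    # Default deny: only an explicit matching allow lets traffic between segments.
    return any(
        a.src_vlan == src_vlan and a.dst_vlan == dst_vlan and a.dst_port in (None, dst_port)
        for a in allows
    )
```

Since the list only contains allows, rule order doesn't matter; anything not listed, such as home (30) to management (10), is dropped.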

DNS Resolution

Internal DNS runs on a PowerDNS HA pair (Authoritative + Recursor) across two VMs. Split-horizon: internal queries go to PowerDNS, external queries forward upstream. DNSSEC is on for internal zones.
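Split-horizon resolution comes down to one routing decision per query. The sketch below models only that decision, with a hypothetical internal zone name `lab.example`. In PowerDNS Recursor the equivalent is typically expressed through its `forward-zones` setting rather than code.

```python
INTERNAL_ZONES = ("lab.example",)  # hypothetical internal zone name

def route_query(qname: str) -> str:
    """Return 'internal' for names inside an internal zone, else 'upstream'."""
    name = qname.rstrip(".").lower()
    for zone in INTERNAL_ZONES:
        # Match the zone apex or any name under it, but not look-alikes such as "notlab.example".
        if name == zone or name.endswith("." + zone):
            return "internal"
    return "upstream"
```

Matching on a leading dot is the important detail: a plain suffix check would send "notlab.example" to the internal authoritative servers.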

Firewall High Availability

Two OPNsense instances run in active/passive HA using CARP for virtual IP failover and XMLRPC for configuration sync. Failover is sub-second, tested under load.

Platform	OPNsense 24.x on dedicated VMs (PCI passthrough for NICs)
HA mode	Active/Passive CARP — VIP failover
Sync	XMLRPC configuration sync, pfsync for state table
Hosts	P-WBLV-MK5-PVE-01 (primary), P-WBLV-MK5-PVE-02 (secondary)
IDS/IPS	Suricata with ET Open ruleset
Logging	Syslog → Graylog for centralised log management
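CARP decides which firewall owns the virtual IPs by advertisement frequency: each node advertises every advbase + advskew/256 seconds, and the node advertising most often is master. A simplified sketch, assuming preemption is enabled and hypothetical skews of 0 (primary) and 100 (secondary). It ignores demotion counters and the missed-advertisement timer that triggers takeover in practice, and the node names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class CarpNode:
    name: str
    advbase: int      # base advertisement interval, seconds
    advskew: int      # 0-254; adds advskew/256 seconds to the interval
    alive: bool = True

    @property
    def interval(self) -> float:
        return self.advbase + self.advskew / 256

def elect_master(nodes: List[CarpNode]) -> Optional[str]:
    """The live node advertising most often (shortest interval) holds the VIPs."""
    live = [n for n in nodes if n.alive]
    if not live:
        return None
    return min(live, key=lambda n: n.interval).name

primary = CarpNode("fw-a", advbase=1, advskew=0)
secondary = CarpNode("fw-b", advbase=1, advskew=100)
print(elect_master([primary, secondary]))  # fw-a
primary.alive = False
print(elect_master([primary, secondary]))  # fw-b
```

Because the secondary's interval is only slightly longer, it takes over as soon as the primary stops advertising, and with preemption the primary reclaims the VIPs when it returns.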

Remote Access

No management interfaces are exposed to the internet. Remote access goes through Tailscale only: authenticated, encrypted tunnels with no inbound firewall rules or exposed ports needed.

Security Hardening

All VM workloads are hardened to CIS Level 2 baselines before deployment, using the same tooling, validation approach, and documentation standard as the CNI (critical national infrastructure) programme work.

CIS Level 2 applied to Ubuntu 22.04 LTS base images using Ansible playbooks
Compliance validated with CIS-CAT Assessor post-deployment
SSH key-only authentication, root login disabled, fail2ban active
Automatic security updates via unattended-upgrades
AppArmor profiles enforced on all workloads
Auditd configured per CIS requirements, logs forwarded to Graylog
Separate service accounts per workload, no shared credentials
Secrets managed via 1Password CLI with vault separation by privilege level
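CIS-CAT is the authoritative check, but the SSH items above are easy to spot-check between assessments. A minimal sketch that reads sshd_config text and reports whether key-only authentication and disabled root login are in effect. The sketch has several limits. It relies on sshd using the first value it reads for each keyword. It stops at the first Match block. It does not follow Include directives, which Ubuntu 22.04's default config uses. It treats unset keywords as non-compliant rather than modelling sshd's defaults.

```python
from typing import Dict

# Settings from the hardening list above, as sshd_config keyword -> required value.
REQUIRED: Dict[str, str] = {
    "passwordauthentication": "no",  # key-only authentication
    "permitrootlogin": "no",         # root login disabled
}

def audit_sshd(config_text: str) -> Dict[str, bool]:
    """Spot-check sshd_config text; True means the setting is compliant."""
    effective: Dict[str, str] = {}
    for raw in config_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue
        keyword, value = parts[0].lower(), parts[1].strip().lower()
        if keyword == "match":
            break  # settings after Match are conditional; this sketch ignores them
        effective.setdefault(keyword, value)  # sshd keeps the first value it reads
    # Unset keywords count as non-compliant because sshd's defaults are not modelled here.
    return {k: effective.get(k) == v for k, v in REQUIRED.items()}
```

Feeding it the output of `sshd -T`, which prints the effective configuration with includes resolved, avoids the Include limitation.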

Tooling Stack

The lab runs a self-funded Microsoft 365 E5 tenant with Sentinel as the SIEM. Having my own Sentinel instance means I can do detection engineering and KQL development without touching client environments.

SIEM	Microsoft Sentinel (M365 E5 tenant)
Log mgmt	Graylog — centralised collection before Sentinel forwarding
Automation	Mac Mini M2 (p-wblv-lab-aut-01) — Ansible, Claude Code, MCP servers
Remote	Tailscale mesh + subnet routing across all VLANs
Monitoring	Prometheus + Grafana for cluster and VM metrics
DNS	PowerDNS Authoritative + Recursor HA pair
Secrets	1Password CLI with tiered vault structure

Automation

A Mac Mini M2 (16GB RAM, 1TB SSD) handles Ansible playbook execution, hosts MCP servers for Claude Code integration, and serves as the SSH jump host for lab access.

MCP integration: Four MCP servers run on the automation node — filesystem, proxmox, obsidian, and github — allowing Claude Code to interact directly with lab infrastructure. All build documentation lives in Obsidian and is accessible via MCP during active sessions.

Roadmap

Out-of-band management expansion — Raspberry Pi serial console server for all five managed devices
JetKVM deployment for the Mac Mini automation node
Expand Sentinel detection library — port production KQL rules into the lab environment
South Wales relocation — natural forcing function for a hardware refresh and potential SFF replacement evaluation
