Compute

On premises, in the cloud – compute is compute.

Features are what drive compute decisions.

Compute Experience Overview

My compute experience varies widely, but if there's one conclusion I've come to, it's that compute is now a commodity and it's all about crunching numbers (except in fringe cases such as ARM64 / AARCH64 computing and microservers).

On-Prem Hardware-Specific Experience
  • Intel, AMD, SPARC, ARM64
  • All forms of memory (DIMM, NVDIMM, NVRAM) and the relevant JEDEC standards
  • Networks / NICs (Intel, Broadcom, Emulex, etc…)
  • Desktops / Thin Client / EUC
  • Smart NICs / DPUs (Pensando)
  • All forms of Non-Volatile Storage (HDD, SSD, SED, NVMe)
  • HPE DL, ML, BL, SimpliVity and Synergy
  • HPE OneView (and its management ecosystem – OneView Global Dashboard, iLO Amplifier, etc.)
  • HPE SuperDome
  • Dell R, M, VRTX, M1000 and MX Modular
  • Dell OpenManage
  • Cisco UCS / HyperFlex
  • Cisco UCS Manager
  • Nutanix

OS / Hypervisor Experience
  • VMware (vCenter, ESXi, Horizon, VMware Aria Suite, etc…)
  • Hyper-V
  • Windows Server 2003+ (2003, 2008, 2008 R2, 2012, 2012 R2, 2016, 2019, 2022)
  • Red Hat Enterprise Linux (RHEL 4 – 9)
  • Red Hat “Clones” (CentOS, Rocky Linux, AlmaLinux)
  • Amazon Linux / Amazon Linux 2
  • Debian and derivatives (Ubuntu, Mint, etc…)
  • KVM (Kernel-based Virtual Machine)
  • QEMU

Real World Experience – Example

Recently, AMD has been taking the processor market by storm with high core counts, high clock speeds (GHz), and relatively low prices compared to Intel. AMD EPYC has been making waves because it can drastically reduce the compute footprint by packing quite a bit more power into the same space.

With this in mind, I've been investigating migrating Intel-based workloads to AMD. There are complications with this, namely OS support, processor extensions (such as AVX-512 or AES-NI), whether to use single-socket or multi-socket systems, and what the move would look like from a performance perspective.
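
As a starting point on the extensions question, it helps to know which feature flags the current hosts and guests actually expose. A minimal sketch (assuming a Linux system and reading /proc/cpuinfo; the flag list and script are illustrative, not the exact tooling we used):

  # Extensions worth confirming before an Intel-to-AMD move; "avx512f" is the
  # AVX-512 foundation flag and "aes" indicates AES-NI support.
  EXTENSIONS_OF_INTEREST = {"aes", "avx2", "avx512f", "sse4_2"}

  def cpu_flags(path="/proc/cpuinfo"):
      """Return the set of CPU feature flags reported for the first core."""
      with open(path) as cpuinfo:
          for line in cpuinfo:
              if line.startswith("flags"):
                  return set(line.split(":", 1)[1].split())
      return set()

  if __name__ == "__main__":
      flags = cpu_flags()
      for ext in sorted(EXTENSIONS_OF_INTEREST):
          print(f"{ext}: {'present' if ext in flags else 'missing'}")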

This investigation and research first needed a use case (we do things because we must, not because we can); the use case here was lower power utilization, more performance, and ultimately fewer dollars spent.

The next step in the research was a proof of concept. We ended up procuring a couple of EPYC-based servers to test with in our lab environment. The lab runs almost 100% like-for-like with production, other than lower utilization; it's a good proving ground for new technology.

By the point I'd gotten to, I had seen a fairly substantial difference in raw speed, and that was without adding more vCPUs to the VMs, which would be much less constrained on the higher core counts.
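
For the raw-speed comparison itself, nothing elaborate is required; a crude single-threaded, CPU-bound timing run on each platform gives a first-order per-core comparison. A minimal sketch (illustrative only, not the actual workload we tested with):

  import hashlib
  import time

  def single_thread_elapsed(iterations=2_000_000):
      """Time a fixed amount of single-threaded hashing work; lower is faster."""
      payload = b"x" * 64
      start = time.perf_counter()
      for _ in range(iterations):
          payload = hashlib.sha256(payload).digest()
      return time.perf_counter() - start

  if __name__ == "__main__":
      # Run the same script on an Intel host and an AMD host (same Python
      # build, same VM sizing) and compare the elapsed times.
      print(f"elapsed: {single_thread_elapsed():.2f}s")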

Ultimately, I ended up pushing this idea for our next compute refresh. The issue you'll run into in a virtualized environment is that all hosts in a given cluster should be the same make / model of server (and that includes processor speed, generation, RAM, etc…); in particular, you can't live-migrate running VMs between Intel and AMD hosts. Identical hosts are the best-case scenario; I've made mixed clusters work before, but it's typically temporary.
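
When planning a refresh like this, it helps to inventory what each cluster is actually running so mixed hardware doesn't creep in. A rough pyVmomi sketch (the vCenter hostname and credentials below are placeholders, and this assumes the pyvmomi package is installed) that prints the CPU models present in each cluster:

  import ssl

  from pyVim.connect import SmartConnect, Disconnect
  from pyVmomi import vim

  # Placeholder connection details for illustration only.
  ctx = ssl._create_unverified_context()
  si = SmartConnect(host="vcenter.example.com", user="readonly@vsphere.local",
                    pwd="changeme", sslContext=ctx)
  try:
      content = si.RetrieveContent()
      clusters = content.viewManager.CreateContainerView(
          content.rootFolder, [vim.ClusterComputeResource], True)
      for cluster in clusters.view:
          # summary.hardware.cpuModel is the host's processor model string.
          models = {host.summary.hardware.cpuModel for host in cluster.host}
          print(f"{cluster.name}: {', '.join(sorted(models)) or 'no hosts'}")
      clusters.Destroy()
  finally:
      Disconnect(si)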

We reached out to our standard vendors (VARs) and received expected lead times for the various equipment; some of it had lead times of six months or more. We determined that the items with the longest lead times needed to be ordered first, since we were unsure when they would arrive and the implementation project really couldn't kick off until we had them.

We planned on migrating some large workloads to AMD-based systems by the end of the year.