I first documented my training lab a few years ago and discussed it again when I wrote about setting up Proxmox with Cloudflare Tunnels, but it’s been a while since I’ve done any major renovations to the lab environment. In July 2021, I was in charge of training new people in my organization of roughly 500. We got new folks all the time, and their levels of experience and knowledge were all over the board, so I adapted the training environment I had used for CyberPatriot to support training at work.
I’m on record as a huge fan of Proxmox, and I still am, but as much as I love it, I have to acknowledge that its UI is anything but newcomer-friendly. The Proxmox UI is complicated and, quite frankly, daunting if you’re not working with the system daily.
When I built the training labs years ago, I barely factored in UX. Students accessed the Proxmox server directly and used its UI to control the VMs. This was far from ideal, and I would spend the first 30 minutes of each training session explaining Proxmox to students and troubleshooting their problems with it.
I was eventually promoted out of my training role, and the new training manager didn’t want to invest time in hands-on, instructor-led training; they felt a corporate Udemy subscription would be a better solution. So I took my training equipment with me, and it sat mostly unused for about 18 months. I was recently asked to run an introduction to cyber operations course for six people, and that kick-started a week-long overhaul of my training network.
When I started preparing for this session, I thought about the issues I’d had running training in the past. The UX of Proxmox was the main one, but there were others, and they boiled down to a few key categories:
- The Proxmox UI is overly complicated and confusing, which makes powering on, launching, and resetting a VM cumbersome.
- Large Word documents are not ideal training guides. Formatting them for pictures, commands, scripts, and steps works, but it could be done better.
- Students had to remember several links (shared drives, Proxmox, the virtual whiteboard, the meeting link, etc.), which could be simplified.
- In the past, some students have messed with classmates by hacking into their machines; student networks should be logically separated.
With these problems outlined, I got to work.
Fixing the UX of interacting with the VM
The simplest way to fix the Proxmox UX issues is to not use Proxmox for the actual interaction with the VMs. At one of my former jobs, we used Citrix Workspace to quickly launch pre-allocated VMs as needed. If you’ve never seen Citrix Workspace, it looks similar to this:
Applications and Desktops that you have access to appear there.
I ran into two problems when looking into this. First, Citrix Workspace requires the Xen hypervisor, which would mean converting all my stuff over (not happening). Second, Citrix Workspace is a paid product. So I needed something else.
I’ve mentioned Guacamole before in my blog post about making Wordlesolver available to the world. While that was an unconventional use of Guacamole, it meant I already had a Guacamole server set up. Now that Wordlesolver is shut down, I can repurpose that server without building a new one.
When it finally got set up, the end result looked like this:
However, getting here was a bit of a headache.
Enabling remote access
I had two main options for remote access: RDP or VNC. RDP was my preferred method since VNC does not natively support sound; however, I ran into some issues with it.
First, my Windows 7 instances were Windows 7 Home, not Pro. In Windows 7, hosting RDP sessions was only available in the Professional edition and above, so I had to use VNC.
Second, setting up xrdp on Kali is a pain, and I didn’t want to deal with that when Kali already has tightvncserver installed by default.
I was going to use TightVNC on all three of each student’s machines, but it doesn’t run on Windows XP. In the end, XP uses RDP while Windows 7 and Kali use VNC, but to the end user, it’s transparent.
They click the icon, and the desktop launches in their browser.
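For reference, a TightVNC session on Kali is started per display number, and by VNC convention display :N listens on TCP port 5900 + N, which is the port the Guacamole VNC connection has to point at. A minimal sketch, assuming display :1 and a 1280x800, 24-bit session (arbitrary choices, not the lab’s actual settings):

```bash
#!/bin/bash
# VNC convention: display :N listens on TCP port 5900 + N.
vnc_port() {
    local display="$1"
    echo $((5900 + display))
}

# Start a TightVNC session on the given display.
# Assumption: display :1, 1280x800, 24-bit colour; adjust to taste.
start_vnc() {
    local display="${1:-1}"
    vncserver ":${display}" -geometry 1280x800 -depth 24
    echo "Point the Guacamole VNC connection at port $(vnc_port "$display")"
}
```

Running `start_vnc 1` on the Kali VM would mean the matching Guacamole connection uses port 5901.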
Fixing VNC in Kali
VNC in Kali had an issue where it would load a solid grey screen with an X for a cursor. Putting the following in /home/kali/.vnc/xstartup fixed that issue:
#!/bin/sh
unset SESSION_MANAGER
unset DBUS_SESSION_BUS_ADDRESS
startxfce4 &
[ -x /etc/vnc/xstartup ] && exec /etc/vnc/xstartup
[ -r $HOME/.Xresources ] && xrdb $HOME/.Xresources
xsetroot -solid grey
vncconfig -iconic &
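When several Kali VMs need the same fix, writing the file by script avoids copy-paste mistakes. This is a minimal sketch that writes the xstartup above into a `.vnc` directory and marks it executable (the VNC server will not run a non-executable xstartup). The default directory is the current user’s `~/.vnc`; the lab used `/home/kali/.vnc`.

```bash
#!/bin/bash
# Write the grey-screen fix into DIR/xstartup and make it executable.
# DIR defaults to the current user's ~/.vnc.
install_xstartup() {
    local dir="${1:-$HOME/.vnc}"
    mkdir -p "$dir"
    # Quoted 'EOF' keeps $HOME literal so it expands when VNC runs the file.
    cat > "$dir/xstartup" <<'EOF'
#!/bin/sh
unset SESSION_MANAGER
unset DBUS_SESSION_BUS_ADDRESS
startxfce4 &
[ -x /etc/vnc/xstartup ] && exec /etc/vnc/xstartup
[ -r $HOME/.Xresources ] && xrdb $HOME/.Xresources
xsetroot -solid grey
vncconfig -iconic &
EOF
    chmod +x "$dir/xstartup"
}
```

Assuming the session runs on display :1, restarting it with `vncserver -kill :1` followed by `vncserver :1` picks up the new file.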
Rolling Back VMs
One feature every good training environment needs is the ability for students to reset a virtual machine if they accidentally (or intentionally) break it. Proxmox has this capability built in through snapshots (qm snapshot and qm rollback on the command line), but unfortunately, Guacamole has no easy way to interact with Proxmox directly. I wanted students to be able to roll back VMs without going to the Proxmox UI.
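A rollback needs a clean snapshot to return to, so the snapshots have to be taken before class. A minimal sketch using `qm snapshot` for one student’s three VMs; the VM IDs (101 to 103) and the snapshot name `clean` are hypothetical placeholders, and setting `DRY_RUN=1` prints the commands instead of running them.

```bash
#!/bin/bash
# Take a clean snapshot of each of one student's VMs before class.
# VM IDs and the snapshot name are hypothetical placeholders.
VMIDS=(101 102 103)
SNAPSHOT="clean"

# Print the command instead of running it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then echo "$*"; else "$@"; fi
}

snapshot_all() {
    local id
    for id in "${VMIDS[@]}"; do
        run qm snapshot "$id" "$SNAPSHOT" --description "Clean training state"
    done
}
```

On the Proxmox host, `snapshot_all` would be run once the VMs are in the state students should start from; `qm rollback <vmid> clean` then returns a VM to that state.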
As I demonstrated in the Wordlesolver blog post, you can get Guacamole to execute a script over SSH and lock the session into that script, so I reused that approach. I created six scripts on the Proxmox host, one for each student, and a Guacamole SSH connection for each script. The scripts are pretty basic:
#!/bin/bash
# Replace [VM ID] and [snapshotname] with this student's VM IDs and snapshot name.
echo "Which VM do you want to reset?"
echo "Kali = 1"
echo "Windows XP = 2"
echo "Windows 7 = 3"
echo "All = 4"
read -r vmnumber
# Quote the variable so empty input doesn't break the test.
if [ "$vmnumber" = 1 ]; then
    qm rollback [VM ID] [snapshotname]
elif [ "$vmnumber" = 2 ]; then
    qm rollback [VM ID] [snapshotname]
elif [ "$vmnumber" = 3 ]; then
    qm rollback [VM ID] [snapshotname]
elif [ "$vmnumber" = 4 ]; then
    qm rollback [VM ID] [snapshotname]
    qm rollback [VM ID] [snapshotname]
    qm rollback [VM ID] [snapshotname]
else
    echo "Invalid choice: $vmnumber"
    exit 1
fi
echo "VM rollback in progress. Please be patient; this process may take up to 5 minutes."
This basic script lets users quickly reset their VMs from Guacamole without needing to dive into the Proxmox interface.
When the user clicks the reset button on the Guacamole connection screen, Guacamole makes an SSH connection to the Proxmox server and runs that user’s reset script. The script asks the user which VM to reset and runs the appropriate qm rollback command to restore that VM’s snapshot.
Again, it’s not a super elegant solution, but it works.
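Six near-identical scripts work, but one alternative is a single script that takes the student number as an argument. This sketch assumes a hypothetical ID scheme in which student N owns VMs N01 (Kali), N02 (XP), and N03 (Windows 7), plus a hypothetical snapshot name `clean`; `DRY_RUN=1` prints the `qm` commands instead of running them.

```bash
#!/bin/bash
# One reset script for every student instead of six copies.
# Hypothetical ID scheme: student N owns VMs N01 (Kali), N02 (XP), N03 (Win7).
SNAPSHOT="clean"   # hypothetical snapshot name

# Print the command instead of running it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then echo "$*"; else "$@"; fi
}

# Map a student number and menu choice (1-4) to the VM IDs to roll back.
vm_ids_for() {
    local base=$(( $1 * 100 )) choice="$2"
    case "$choice" in
        1) echo "$((base + 1))" ;;
        2) echo "$((base + 2))" ;;
        3) echo "$((base + 3))" ;;
        4) echo "$((base + 1)) $((base + 2)) $((base + 3))" ;;
        *) return 1 ;;
    esac
}

reset_menu() {
    local student="$1" choice ids id
    echo "Which VM do you want to reset?"
    echo "Kali = 1, Windows XP = 2, Windows 7 = 3, All = 4"
    read -r choice
    if ! ids=$(vm_ids_for "$student" "$choice"); then
        echo "Invalid choice: $choice"
        return 1
    fi
    echo "VM rollback in progress. Please be patient; this may take up to 5 minutes."
    # $ids is intentionally unquoted so "All" splits into three IDs.
    for id in $ids; do
        run qm rollback "$id" "$SNAPSHOT"
    done
}

# Entry point when executed directly: reset.sh <student number>
if [ "${BASH_SOURCE[0]}" = "$0" ] && [ -n "${1:-}" ]; then
    reset_menu "$1"
fi
```

Each Guacamole SSH connection would then run the same file with a different argument, for example `/root/reset.sh 3` for student 3 (the path is hypothetical).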
Load Balancing Guacamole
Most of my previous Guacamole setups were either single-user or limited to light applications like SSH, and I had never run this many simultaneous desktop connections. Six users with three potential connections each means 18 desktops possibly streaming from one Guacamole server at once, something I’d never tested before.
I bumped the Guacamole VM up to 8 GB of RAM and 4 cores to hopefully handle the load, but I wasn’t sure, so I took an extra step: I cloned the Guacamole instance and used the HAProxy package in pfSense to balance connections between the two in round-robin fashion.
The load balancing might be overkill, but it did give me a chance to experiment with HAProxy.
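pfSense’s HAProxy package builds its configuration from the GUI, but the round-robin setup described above corresponds roughly to a backend like the one this sketch writes out. The server IPs, port 8080, and cookie names are hypothetical. One design point worth flagging: Guacamole tracks logged-in sessions in the memory of the instance that authenticated them, so plain round robin without stickiness could bounce a browser to the other clone mid-session; the `cookie` lines pin each browser to one server.

```bash
#!/bin/bash
# Write an illustrative HAProxy backend for two Guacamole clones.
# Server IPs, port 8080, and cookie names are hypothetical; on pfSense the
# equivalent settings are made in the HAProxy package GUI, not in a file.
write_guac_backend() {
    local out="$1"
    cat > "$out" <<'EOF'
backend guacamole
    balance roundrobin
    # Insert a cookie so each browser sticks to the server that logged it in
    cookie GUACSRV insert indirect nocache
    server guac1 10.10.10.21:8080 check cookie guac1
    server guac2 10.10.10.22:8080 check cookie guac2
EOF
}
```

New browsers still alternate between the two servers, so the load is spread, while an existing session stays on the clone that holds it.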
Now with a new user interface for accessing VMs, easy rollback without navigating Proxmox menus, and load balancing in place, it was time to fix the training guide.
Fixing the training guide
My previous guide was a large Word document with pictures and some text. It worked, but it was not ideal. I maintain a BookStack wiki to document my work toward my Master’s in Cybersecurity. It’s sat fairly dormant since I created it, as most of my master’s program has been theoretical rather than practical.
However, I could convert this BookStack wiki into a training guide.
Since this training plan follows the MITRE kill chain, I broke the training guide into the seven RWDECEM phases that MITRE outlines: Recon, Weaponize, Deliver, Exploit, Control, Execute, and Maintain.
I then subdivided each phase into individual tactics from the ATT&CK matrix.
Inside each book are chapters for techniques, with pages describing what we’re going to do and briefly explaining how each technique works.
I went back and forth on whether BookStack was better than a regular Word document for walking people through a set of steps. I ultimately decided to stick with it, but I did note some pros and cons:
Pros:
- Looks significantly better than a Word document
- Search is more functional
- Breaks sections up into individual pieces rather than one large document
- Code segments make identifying and reading code easier
Cons:
- Does not move between books automatically; users must click back to the shelf and then click on the next book.
Fixing the link issue
This is not a big issue, as I usually add all participants to a Google Group and send emails to that group with links to all the services, but I wanted to see if I could simplify things even more.
Cloudflare app launcher
I’ve talked a lot about how much I like Cloudflare and how I use their free Access service for many things, including these labs. All students who go through my training are added to a Cloudflare Access group and granted access to my lab environment. They are also granted access to the Cloudflare App Launcher.
Now the students only have to go to one website, and they get access to a set of clickable buttons that can take them to the service they need.
Separating the students logically
When I’ve run this course in the past, I’ve encountered occasions where one student “hacks” another. Most of the time, this simply means SSHing into another student’s Kali box and leaving random notes. Sometimes they’ve gone as far as writing cron jobs to randomly shut off the Kali machine. These things are usually done in good humor and are harmless. The VMs all have snapshots of a clean state, so any messing around like that is solved by a simple rollback.
Nevertheless, I’ve found over the years that even this kind of harmless joke can hurt the learning of shyer students. I’ve had students who were too shy to ask for help with a machine that kept restarting, and the less technically savvy the student, the less likely they are to reach out.
This class was going to be roughly two-thirds non-technical and one-third technical people, so I really wanted to make sure the technical folks didn’t mess with the other students.
Creating the VLANs
I’ve discussed VLANs before in my posts, including how I use them for my analysis networks, and I expanded that network further for this training network. First, I set up a dedicated VLAN for each student:
Next, in Proxmox, I edited each VM’s network interface to add the student’s VLAN tag, then had each VM release and renew its DHCP lease. I created firewall rules to block all communication between RFC 1918 addresses and tested them to make sure they worked. Something interesting I learned is that pfSense lets you create an interface group containing all the student VLAN interfaces and then write a single firewall rule for that group, rather than six separate rules. So that was neat.
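The per-VM tagging step can also be done with `qm set` on the Proxmox host. This sketch only builds the command string; the VM IDs, MAC addresses, VLAN numbers, and bridge name `vmbr0` are hypothetical. The MAC is passed back in explicitly because rewriting `net0` without one makes Proxmox generate a new MAC, which would break any DHCP mapping keyed on the old one. The NIC model is a parameter since XP has no built-in virtio drivers and would more likely use an emulated `e1000`.

```bash
#!/bin/bash
# Build the qm command that moves a VM's first NIC onto a VLAN.
# Arguments: vmid, NIC model, existing MAC, VLAN tag, optional bridge.
# All example values are hypothetical; vmbr0 is an assumed default bridge.
vlan_tag_cmd() {
    local vmid="$1" model="$2" mac="$3" vlan="$4" bridge="${5:-vmbr0}"
    echo "qm set $vmid --net0 $model=$mac,bridge=$bridge,tag=$vlan"
}
```

For example, `vlan_tag_cmd 102 e1000 BC:24:11:00:00:02 12` prints the command for a hypothetical XP VM on VLAN 12; piping the output to `bash` on the Proxmox host applies it. Afterwards, the guest picks up an address on the new VLAN with a release and renew, typically `ipconfig /release` then `ipconfig /renew` on Windows.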
To make the training more seamless, I set static mappings for each device in the DHCP server. You might have noticed that I had each VM pull a DHCP lease first and then assigned it a static IP, which, if you’re familiar with pfSense, you know doesn’t work. I did it this way because it was simpler to have the MAC address already stored in pfSense and then just click a button to map that MAC address to a different static IP. The alternative would have been to write down all the MAC addresses and enter them into pfSense manually, which was less ideal. The downside is that if I add something to that VLAN now, it gets the next incremental IP after the last DHCP lease. If my DHCP pool runs from 10.10.11.10 to 10.10.11.254 and my VMs originally had leases up to 10.10.11.12, the next VM will get .13 even though .10 through .12 are no longer in use. It’s not a big issue, but it does leave odd gaps in the IP space when putting things on a map.
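Collecting the MAC addresses by hand is the tedious part of the alternative, but the Proxmox side can list them: `qm config <vmid>` prints each NIC as a `netN:` line with the MAC embedded after the model name. A sketch, assuming the first `netN` line is the NIC pfSense sees:

```bash
#!/bin/bash
# Extract the MAC from a Proxmox NIC line such as
#   net0: virtio=BC:24:11:AA:BB:CC,bridge=vmbr0,tag=11
mac_from_net_line() {
    sed -n 's/^net[0-9]*: [a-z0-9]*=\([0-9A-Fa-f:]\{17\}\).*/\1/p'
}

# Print "vmid mac" pairs for the given VM IDs (run on the Proxmox host).
list_macs() {
    local id
    for id in "$@"; do
        echo "$id $(qm config "$id" | mac_from_net_line | head -n1)"
    done
}
```

On the host, `list_macs 101 102 103` (hypothetical IDs) would produce a list ready to copy into pfSense static mappings.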
I had to create a firewall rule allowing my Guacamole server to reach all the student VLANs, but not the reverse. I didn’t want a student to try to get back into the Guacamole server. It shouldn’t be possible, but you can never be too careful.
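One way to verify both the student isolation and the one-way Guacamole rule is to probe from a student’s Kali box. This sketch uses bash’s built-in `/dev/tcp` redirection with `timeout`. A "blocked" result cannot distinguish a firewall drop from a closed port, so the target should be a port known to be listening, such as SSH on another student’s Kali VM or the Guacamole web port.

```bash
#!/bin/bash
# Report "open" if a TCP connection to host:port succeeds within 3 seconds,
# otherwise "blocked" (refused, dropped by a firewall, or timed out).
check_port() {
    local host="$1" port="$2"
    if timeout 3 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null; then
        echo "open"
    else
        echo "blocked"
    fi
}
```

From student 1’s Kali VM, `check_port <another student's Kali IP> 22` should print "blocked" once the rules are in place (the IP is whatever that VLAN assigned).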
Allowing Kali updates
Finally, I wanted to ensure that the students could pull updates and new tools if they wanted to. The lab technically works just fine in the state it’s in, but if they need to update a tool or download something else for practice, I wanted to ensure they could. So for this purpose, I enabled very limited access in the firewall for the Kali machines to pull updates from kali.org. This enabled the apt command to function so students could run apt update and apt upgrade.
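Because the rule only allows kali.org, it helps to confirm which hosts apt actually contacts. This sketch lists them from the classic one-line sources format (an assumption; deb822 `.sources` files are not handled). Note that http.kali.org is a redirector that may send clients on to third-party mirrors, so if `apt update` stalls, it is worth checking whether the redirect target falls outside the allowed domain.

```bash
#!/bin/bash
# Print the unique hostnames referenced by active deb/deb-src lines
# in a sources.list-style file (default: /etc/apt/sources.list).
apt_hosts() {
    local file="${1:-/etc/apt/sources.list}"
    grep -E '^[[:space:]]*deb(-src)?[[:space:]]' "$file" \
        | grep -oE '[a-z]+://[^/[:space:]]+' \
        | sed -E 's#^[a-z]+://##' \
        | sort -u
}
```

On a default Kali install this would typically print `http.kali.org`.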
Bringing it all together
In the end, the lab looks like this:
It’s still a pretty basic lab focused on just teaching the basics of hacking. The lab focuses on using Metasploit, following the MITRE kill chain, thinking like a hacker, and using tools to gain access.
One of the questions I get asked most when I run training like this is, “Why XP?” Why do I still run training labs with Windows XP as the target when Windows 10 is the norm? I do it for two reasons:
First, the fundamentals of hacking haven’t changed that much. The kill chain is still a foundational bit of knowledge, and using Windows XP still gives us the ability to walk through that kill chain. Regardless of what target an adversary is going after, they generally follow an established flow that starts with reconnaissance and ends with maintenance. A lab like this allows us to walk through these steps in a controlled environment.
Second, Windows XP is still running on many embedded systems worldwide. ICS/SCADA devices, ATMs, and point-of-sale systems often run Windows XP (or older, if you’re Southwest). Even outside of these devices, people often romanticize Windows XP, and I still know at least one person who adamantly refuses to move away from it. I consider it valuable for my students to understand how important it is to update and move on from devices once they reach end of life (EoL).
This was a pretty major update for a relatively simple lab exercise. I love doing projects like this because I always assume they’ll be simple updates, and they never are. All the issues, oddities, and design problems I run into along the way help me learn new topics in cybersecurity and networking.