DIY cloud gaming setup with VFIO, Parsec and AMD
This is a follow-up to my previous post where I covered the VFIO setup in general. For many people that would have been good enough, but my goal with this setup was to have a powerful gaming setup that I could access from my living room PC with 20 meters of Ethernet cables between the two.
The concept of cloud gaming has become more popular lately, with Microsoft, Google, NVIDIA and others offering services that allow you to stream games to your device. These setups usually include a powerful server in a datacenter rendering the game and sending the compressed video stream to your device. This works surprisingly well, but due to high cost, poor internet connectivity in many places in the world and licensing issues, this setup might not work well for everyone.
However, if you already own a moderately powerful gaming PC and would like to use it to play games you already own over the network, then there are plenty of solutions that can get you there.
Here are some options:
- Steam remote play: works for games launched via Steam.
- Parsec: able to stream the whole desktop, including games.
- Moonlight: same, but only usable on hosts with NVIDIA GPU-s due to the implementation relying on NVIDIA’s GameStream protocol.
In the past, I’ve had OK results with Steam remote play, but the limiting factor has been the reliance on Steam. I have bought a couple of games from GOG as well. I could probably play those remotely by launching them via Steam, but I just don’t want to go through the hassle.
For this setup I’ve opted to go with Parsec. It’s not perfect, but it’s still good enough for our purposes. The GPU we’re using (AMD Radeon RX 570) also limits our options because it won’t work with Moonlight.
The installation of Parsec is pretty straightforward: just install it on your host and client machines. In my case the client machine is a Lenovo ThinkPad X230 that was brought back from the dead. It’s not powerful, but at least it can do H.264 hardware decoding and it uses only 12W of power when idle, making it a perfect candidate for testing this out. It’s also worth mentioning that both machines are connected on the local network using an Ethernet connection to avoid Wi-Fi becoming a bottleneck.
VFIO, gaming and you
What turned out to be a bigger hurdle was the performance of games inside the VM. After getting everything running with the VFIO setup, I didn’t really spend any time trying to optimize the setup for the smoothest experience. When trying to actually run this setup, I ran into quite a few problems with performance which mainly manifested as stutters and unusually low framerates. Turns out that gaming has stricter latency requirements than other server workloads.
Thanks to the great guide in Arch Wiki, I could get a lot of ideas on what to try out to improve the performance of this gaming virtual machine.
Before doing anything, I ran some benchmarks to get a sense of what performance levels I’m dealing with. To test out each individual change, I used GTA V as the gaming benchmark since it was pretty good at pointing out any performance issues.
Here’s a list of things that I ended up doing:
- setting the CPU model to
- isolating CPU-s dynamically whenever the VM starts and pinning those CPU-s to the VM to avoid the host OS and other VM-s from using those cores
- setting the CPU governor to performance on isolated cores to rule out issues with the CPU not clocking high enough and switching between idle/load power
echo performance | tee /sys/devices/system/cpu/cpu[4-7]/cpufreq/scaling_governor
- enabling static hugepages to rule out issues stemming from poor memory access speeds
- disabling SMT in UEFI settings to get rid of one additional variable
- getting a 64 GB DDR4-3600 memory kit to allocate more RAM to the VM (16 GB) and leave enough for the host machine and other services
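Two of the tweaks above boil down to a loop and some simple arithmetic, so here is a minimal sketch of both rather than the exact scripts from this setup. It assumes the isolated cores are 4-7 (as in the governor command) and the default 2 MiB hugepage size on x86-64. The function names are made up for illustration, and the script only prints the commands and the sysctl value instead of applying them.

```shell
#!/bin/sh
# Expand a kernel-style CPU list such as "4-7" or "0-1,4-7" into one ID per line.
expand_cpu_list() {
    echo "$1" | tr ',' '\n' | while IFS=- read -r first last; do
        # A single ID such as "3" has no upper bound; treat it as "3-3".
        [ -n "$last" ] || last=$first
        i=$first
        while [ "$i" -le "$last" ]; do
            echo "$i"
            i=$((i + 1))
        done
    done
}

# Number of static hugepages needed to back a guest of the given size.
# Usage: hugepages_needed <guest memory in GiB> [page size in MiB, default 2]
hugepages_needed() {
    echo $(( $1 * 1024 / ${2:-2} ))
}

# Print (not run) one governor write per CPU in the given list.
governor_commands() {
    for cpu in $(expand_cpu_list "$1"); do
        echo "echo performance > /sys/devices/system/cpu/cpu${cpu}/cpufreq/scaling_governor"
    done
}

# Assumed values: cores 4-7 are isolated, the guest gets 16 GiB of RAM.
governor_commands "4-7"
echo "vm.nr_hugepages = $(hugepages_needed 16)"
```

Printing the commands first makes it easy to double-check the core list before running anything as root.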
The golden rule of troubleshooting is to change one variable at a time and compare the results. Not every change that I made ended up being a positive one. For example, when configuring the VM to use 4 cores and 8 threads instead of plain 4 cores, I saw the framerates in GTA V drop by 50%. I assume that the VM treated the “SMT cores” as real ones, causing the Windows scheduler to make incorrect decisions.
After playing around with this setup and accumulating these tweaks, I managed to get rid of most of the issues that bothered me during gaming, resulting in a much smoother experience. It finally felt like the gaming VM behaved like a machine with a 4 core CPU, 16 GB of RAM and an AMD RX 570 inside it.
At this point I’d consider the setup to be fantastic for someone who wants to play games with a VFIO setup. Since I was using this setup over the network with Parsec, I soon ran into more issues, but this time with the GPU.
Parsec and other similar solutions rely on the host GPU to encode the image into a video stream using the encoder present on the GPU die itself. You might see this being referred to as hardware encoding in settings. What I didn’t know before trying this setup is that AMD GPU encoders have a reputation for just being plain bad:
If the host has an AMD card, AMD is usually known for having worse encode than NVIDIA and even Intel. You should be fine at low resolutions. If all guests support H.265 and have it turned on, you may see better performance than with it off
And this shows. When trying to stream at 1080p, the result was a stuttery and inconsistent mess. Imagine trying to play at around 30fps with the frame timings graph resembling a heart rate monitor: that’s what it felt like. At 720p, the experience was so much smoother. Yes, the image quality suffers because of that, but at least it was mostly playable.
I also decided to check whether changing the codec from H.264 to H.265 has a significant impact. I ran a quick test in Dirt Rally, since it has a benchmark loop mode, and checked the statistics that Parsec shows. With H.264, I saw an encode latency of about 10ms. With H.265, it dropped to 8ms. Still not great, but it is technically a ~20% improvement. The downside for my setup is that the client laptop simply does not support hardware-accelerated H.265 decode, which is something that Intel integrated graphics received support for in 7th gen CPU-s.
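To put those latency numbers into context, here is a small sketch of the arithmetic. The 60 fps target used for the frame budget is an assumption, not something measured in this test, and the function names are only illustrative.

```shell
#!/bin/sh
# Relative improvement, in percent, going from an old latency to a new one.
improvement_pct() {
    awk -v old_ms="$1" -v new_ms="$2" \
        'BEGIN { printf "%.1f\n", (old_ms - new_ms) / old_ms * 100 }'
}

# Time available per frame, in milliseconds, at a given frame rate.
frame_budget_ms() {
    awk -v fps="$1" 'BEGIN { printf "%.2f\n", 1000 / fps }'
}

# Share of the per-frame budget, in percent, spent on encoding alone.
budget_share_pct() {
    awk -v lat_ms="$1" -v fps="$2" \
        'BEGIN { printf "%.0f\n", lat_ms / (1000 / fps) * 100 }'
}

echo "H.264 -> H.265 improvement: $(improvement_pct 10 8)%"
echo "Frame budget at 60 fps: $(frame_budget_ms 60) ms"
echo "Encode share at 10 ms: $(budget_share_pct 10 60)%"
echo "Encode share at 8 ms: $(budget_share_pct 8 60)%"
```

Seen this way, the encoder alone uses up roughly half of the time available for a frame at 60 fps, before network transfer and decoding on the client are counted.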
The GPU also seems to cause some trouble for Parsec, as it would occasionally fail to connect and report a host encoder issue. This is usually overcome by rebooting the VM, which can get quite annoying after a couple of times. At other times, Parsec just froze during gameplay and left the client stuck at 100% CPU usage. Not what I’d call smooth sailing.
Another issue that I noticed by using MSI Afterburner is that the framerate was still not that stable in some games, such as GTA V. After all the fixes and tweaks, there were still small stutters, even with vsync enabled. I decided to look over AMD Radeon settings to see if a driver feature had an unintended side effect. I had picked the Gaming preset because that was what I was using this GPU for, but decided to switch to the Standard mode instead. And just like that, the stutters were gone! I suspect that the Radeon Anti-Lag feature might have been the cause of this, as that was one of the main settings that was disabled after switching to the Standard settings preset.
Is it worth it?
I like a technical challenge and going through all the attempts to get more and more performance out of this setup was interesting for me. However, it might not be the same way for everyone else. If you still feel like you want to go through this and learn something along the way, then feel free to use this as a guide on what you can try. Make sure to also read up on experiences that others have had with these tweaks and always measure your results to see if they had the intended effect.
For those who just want to play games and not worry too much about getting the expected performance out of their machine, I’d still recommend building a separate gaming PC.
If I had to build a machine specifically for this type of workload, then I’d make these changes:
- Replace the CPU with a non-APU model, such as the Ryzen 9 5950X. Due to the physical core layout you could assign one CPU core complex to the gaming VM and leave everything else to the host. These CPU-s also have a lot of L3 cache, which should help in workloads that require low latency, such as games.
- If you’re going for a streaming setup, then I’d try it first with an NVIDIA GPU. The error code 43 issues are gone now, making it a viable option with solutions like Moonlight.
- More SATA or M.2 slots on the system. This makes passing through storage devices so much easier.
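As a rough sketch of what pinning a VM to one core complex could look like, the snippet below prints libvirt vcpupin elements for eight vCPUs. It assumes that host CPUs 8-15 are the cores of the second complex, which is not guaranteed, so check the actual layout with lscpu -e before using anything like this. The vcpupin_xml helper is hypothetical.

```shell
#!/bin/sh
# Print libvirt <vcpupin> elements mapping guest vCPUs 0..N-1 onto
# consecutive host CPUs starting at the given host CPU ID.
# Usage: vcpupin_xml <first host CPU> <number of vCPUs>
vcpupin_xml() {
    vcpu=0
    while [ "$vcpu" -lt "$2" ]; do
        printf "<vcpupin vcpu='%d' cpuset='%d'/>\n" "$vcpu" "$(( $1 + vcpu ))"
        vcpu=$((vcpu + 1))
    done
}

# Assumption: host CPUs 8-15 are the eight cores of the second core complex.
vcpupin_xml 8 8
```

The output is meant to be pasted into the cputune section of the domain XML, leaving the first complex to the host and other VMs.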
I’m not planning on stopping this adventure just yet. I’ve recently looked around for an NVIDIA GPU to do a comparison against the AMD RX 570 and picked out an NVIDIA GTX 1060, which is from the same era and performance bracket. Should make for an interesting comparison in both Parsec vs Parsec and Parsec vs Moonlight scenarios.
I’m also hoping to eventually move to a more modern client PC that can support H.265 decoding and output at higher resolutions. Yes, 4K at 60Hz is demanding for a laptop from 2012. Given that the laptop is an older ThinkPad, I expect that to happen after 2025.
Regarding the storage setup, I’ve got input from a friend saying that I might want to try setting up a Samba share on my “NAS” VM and host my Steam library and other game files on that. After all, the virtual LAN has managed to hit 2-3 Gbit/s in my testing and with L2ARC being persistent in ZFS 2.0, I might be able to take advantage of that as well (assuming that I even need L2ARC, ARC efficiency is pretty good!). Currently I just have a Syncthing sync set up between the “NAS” VM and the gaming VM so that there exists at least a basic backup of all my games.
If you prefer to share your thoughts on this post privately, just send me an e-mail!