I recently had the opportunity to spend some time building out Windows Virtual Desktop (WVD) environments and helping customers with their WVD journey, and I wanted to share my findings on troubleshooting WVD performance.
With on-premises Virtual Desktop Infrastructure (VDI) environments, it has always been crucial to optimise images because of the differences between physical and virtual environments. That means limiting background services and reducing to the bare minimum any running processes that have no major benefit in a virtual desktop setting; a typical example is minimising graphic effects and redraws. In the world of WVD, optimisation brings an additional benefit: increased user density, which should lower the overall running costs of the solution.
When it comes to troubleshooting a performance issue, the focus should be on finding the bottleneck constraining the system. Fortunately, there are a number of performance tools that can help here:
- Good old Task Manager in the Windows server or client operating system.
- Resource Monitor in the Windows server or client operating system.
- Azure Monitor
- Ping or PingPlotter
- Speedtest.net (a bone of contention, this one; read on to find out why.)
In this post I’ll focus on the core components (CPU, memory, disk and network) and their impact in a WVD environment.
CPU demand on a WVD session host will vary from 0 to 100%. You can use Task Manager to monitor this. If it sits over 75%, it likely warrants further investigation. Use the Details tab to see which processes are consuming the CPU. If a process (other than System Idle Process) is at the top of the list most of the time, it’s worth determining whether this is expected or whether the process is having an issue. If it’s the latter, it may need to be killed. The Details tab also shows which user each process is running under.
If after investigation you find you have CPU contention, you may need to resize your session hosts or tweak the load balancing algorithm to spread users more evenly across session hosts.
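The "top of the list most of the time" check can be sketched in a few lines of Python. This is only a minimal sketch: it assumes you have already captured periodic samples of per-process CPU usage (for example, noted down from Task Manager or exported from a performance log). The sample format and the function names are my own illustrative choices, not part of any WVD tooling.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

CPU_THRESHOLD = 75.0  # percent; the rule of thumb used in this post
IDLE = "System Idle Process"

# One sample maps a process name to its CPU share (percent) at a point in time.
Sample = Dict[str, float]


def total_cpu(sample: Sample) -> float:
    """Total CPU used in a sample, ignoring the System Idle Process."""
    return sum(value for name, value in sample.items() if name != IDLE)


def busy_ratio(samples: List[Sample], threshold: float = CPU_THRESHOLD) -> float:
    """Fraction of samples in which total CPU exceeded the threshold."""
    if not samples:
        return 0.0
    busy = sum(1 for s in samples if total_cpu(s) > threshold)
    return busy / len(samples)


def dominant_process(samples: List[Sample]) -> Optional[Tuple[str, float]]:
    """Process most often at the top, with the share of samples it topped."""
    tops: Counter = Counter()
    for s in samples:
        active = {n: v for n, v in s.items() if n != IDLE}
        if active:
            tops[max(active, key=active.get)] += 1
    if not tops:
        return None
    name, count = tops.most_common(1)[0]
    return name, count / len(samples)
```

If `busy_ratio` is high and `dominant_process` keeps returning the same name, that process is the one to question first.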
Another check is the All Users Startup folder: remove unnecessary start-up items that launch at every user sign-in and needlessly consume CPU. Each item launches once per user session, so on a session host serving many users the cost quickly adds up.
On WVD session hosts, RAM is primarily consumed by applications that run within users’ sessions. It can become an issue once many applications are open and all competing for RAM. Once all RAM on a session host is consumed, the host is forced to use paging, which is never a good thing and carries a big performance penalty for all users connected to that host.
The main memory check is to look for hard page faults. A hard page fault happens when a memory page that an application expects to find in RAM is unavailable because it has been moved to the page file on disk. Bursts of hard faults, or constant hard fault activity, indicate a performance issue.
To help identify the cause and reduce hard faults:
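As a starting point for that review, here is a minimal Python sketch that lists what is in the All Users Startup folder. The path shown is the usual default location on Windows; the function name is hypothetical, and the script only lists items, so removal stays a manual decision.

```python
import os
from pathlib import Path
from typing import List

# Usual default location of the All Users Startup folder on Windows.
ALL_USERS_STARTUP = (
    Path(os.environ.get("ProgramData", r"C:\ProgramData"))
    / "Microsoft" / "Windows" / "Start Menu" / "Programs" / "StartUp"
)


def list_startup_items(folder: Path = ALL_USERS_STARTUP) -> List[str]:
    """Return the names of items in the folder, skipping desktop.ini."""
    if not folder.is_dir():
        return []
    return sorted(
        entry.name
        for entry in folder.iterdir()
        if entry.name.lower() != "desktop.ini"
    )
```

Running `list_startup_items()` on a session host gives you the list to challenge item by item.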
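To make "bursts or constant" a little more concrete, the sketch below classifies a series of hard faults per second readings, such as those shown in Resource Monitor. Both thresholds and the 80% cut-off are assumptions I have picked for illustration; they should be tuned against a baseline from a healthy host.

```python
from typing import List


def classify_hard_faults(
    faults_per_sec: List[float],
    burst_level: float = 100.0,      # assumed value, not an official figure
    sustained_level: float = 20.0,   # assumed value, not an official figure
) -> str:
    """Classify hard fault samples as 'constant', 'bursts', 'ok' or 'no data'."""
    if not faults_per_sec:
        return "no data"
    above = sum(1 for v in faults_per_sec if v >= sustained_level)
    # Treat the activity as constant when most samples sit above the sustained level.
    if above / len(faults_per_sec) >= 0.8:
        return "constant"
    if max(faults_per_sec) >= burst_level:
        return "bursts"
    return "ok"
```

Either "constant" or "bursts" is a prompt to look at memory pressure on that host.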
- Reduce the load on the session host.
- Instruct users to close any applications not actively in use and educate users to log off their desktop session once finished. This process can also be automated after periods of user inactivity.
- If it’s consistent across session hosts, check for a memory leak in the applications served to those hosts.
- Resize the session host to provide more RAM, for example by moving from a D-series to an E-series VM, as the E-series is memory optimised. To help manage cost, a good suggestion is to double the amount of available RAM whilst keeping the number of CPU cores constant.
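The effect of doubling RAM at a constant core count is easy to work through. The sketch below compares a Standard_D4s_v3 (4 vCPUs, 16 GiB) with a Standard_E4s_v3 (4 vCPUs, 32 GiB); the figure of 8 users per host is a made-up example, and the calculation ignores the memory used by the operating system itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HostSize:
    name: str
    vcpus: int
    ram_gib: int


def ram_per_user(size: HostSize, users: int) -> float:
    """GiB of RAM per user, ignoring operating system overhead."""
    if users <= 0:
        raise ValueError("users must be positive")
    return size.ram_gib / users


# Same core count, double the memory.
d4s_v3 = HostSize("Standard_D4s_v3", vcpus=4, ram_gib=16)
e4s_v3 = HostSize("Standard_E4s_v3", vcpus=4, ram_gib=32)

users_per_host = 8  # hypothetical density, for illustration only
print(ram_per_user(d4s_v3, users_per_host))  # 2.0
print(ram_per_user(e4s_v3, users_per_host))  # 4.0
```

Each user goes from roughly 2 GiB to 4 GiB of headroom without paying for extra cores.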
When you build a session host to deploy to a host pool, a default OS drive is created. It’s possible to add further disks to the VM; however, the best practice approach is to use FSLogix profile containers, a technology acquired by Microsoft. The container is mounted across the network, so the files are stored outside of the session host, but to the user it provides the experience that all files are local.
FSLogix stores a complete user profile in a single container, and at sign-in the container is dynamically attached. This works fundamentally differently from previous user profile technologies used in VDI environments, which all came with various challenges. No previous solution was able to handle all the user profile requirements, such as the handling of large Outlook OST files. FSLogix addresses this problem, and if you wish to use OneDrive in WVD in a multi-session or non-persistent scenario, it’s mandatory to use this technology.
When it comes to disk-related performance issues, the first place to look is the OS disk of the VM. When sizing the VM for the intended workload, it’s important to consider IOPS and throughput; these details are shown when provisioning the VM in the host pool.
With FSLogix, using the wrong storage account can cause performance issues with profile disks. Here are some considerations:
- At a minimum, P-type premium SSD storage should be used, such as P15 or higher, to ensure there are enough IOPS to serve the user experience. This is especially important in multi-session scenarios.
- In FSLogix, ensure the option to store search index cache data in the container is enabled. Otherwise the index is built every time, which hurts most during a rebuild and when lots of users are logging into a session host at one time.
- Prevent streaming services being used in WVD and instead educate users to do their streaming locally. There are optimisations available for Microsoft Teams to ensure this is the case.
- When troubleshooting disk-related issues, use Resource Monitor on the session host and check the disk queue length. A queue length that is constantly over 1 is an indication that the OS disk is constrained.
Placement of session hosts is important: any latency between where a user is located and where the underlying session hosts are running will create a poor end-user experience. Fortunately, there is a tool available from Microsoft called the ‘Windows Virtual Desktop Experience Estimator’, which should guide you to the correct region to use when deploying virtual machines. It calculates the round trip time (ms) based on user location. It is an estimate, so do bear this in mind, but I find it a good starting point.
If you already have a WVD environment in situ, here are some considerations to account for when troubleshooting the network:
A common recommendation I have seen is to use speed test tools to rule out the network as the cause of slowness. However, it’s important to remember that they don’t take the full network design into account. A speed test will tell you how much bandwidth there is at a point in time, but it says nothing about the latency and packet loss of the connection between the end-user device and the session hosts in Azure.
For latency and overall experience in a session host, a good community tool exists called the ‘Connection experience indicator’. It can be found here: https://bit.ly/2RrQTd3
Packet loss for certain users could be caused by something in their home setup, such as the use of splitters. Running a continuous ping and reviewing any packet loss would be advisable in this scenario. If all is well on the user side, it would be worth checking the Azure health status page to ensure there are no issues at a region or global level in Azure. Redeploying the VM, which forces it to move to another physical host in Azure, may help, as may stopping the VM and starting it again, as this will also force the use of another physical host.
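The queue length rule of thumb can be expressed as a simple check over a set of readings taken from Resource Monitor. This is a rough sketch: the function name is hypothetical, and treating "constantly" as "at least 90% of samples" is my own assumption.

```python
from typing import List


def disk_is_constrained(
    queue_lengths: List[float],
    limit: float = 1.0,          # the rule of thumb from this post
    min_fraction: float = 0.9,   # assumed meaning of "constantly"
) -> bool:
    """True when the queue length is above the limit in nearly all samples."""
    if not queue_lengths:
        return False
    over = sum(1 for q in queue_lengths if q > limit)
    return over / len(queue_lengths) >= min_fraction
```

A single spike does not trip the check; a queue that never drains does.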
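If you collect a few round trip time readings per candidate region, ranking them is straightforward. The sketch below uses the median of each region's readings so that one slow reading does not skew the result. The region names are real Azure regions, but the numbers in the usage example are invented, and the function is my own illustration rather than anything the estimator provides.

```python
from statistics import median
from typing import Dict, List, Tuple


def rank_regions(samples_ms: Dict[str, List[float]]) -> List[Tuple[str, float]]:
    """Rank regions by median round trip time (ms), lowest first.

    Regions with no readings are left out of the ranking.
    """
    ranked = [
        (region, median(values))
        for region, values in samples_ms.items()
        if values
    ]
    return sorted(ranked, key=lambda pair: pair[1])
```

For example, `rank_regions({"uksouth": [18, 22, 20], "westeurope": [30, 28, 35]})` puts `uksouth` first with a median of 20 ms.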
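When reviewing a continuous ping, it helps to reduce the results to a loss percentage and some latency figures. The sketch below assumes you have already turned the ping output into a list of round trip times in milliseconds, with `None` for each request that timed out; that input format and the function name are my own choices for illustration.

```python
from typing import List, Optional


def summarise_pings(rtts_ms: List[Optional[float]]) -> dict:
    """Summarise ping results; None marks a lost (timed out) packet."""
    sent = len(rtts_ms)
    replies = [r for r in rtts_ms if r is not None]
    lost = sent - len(replies)
    return {
        "sent": sent,
        "lost": lost,
        "loss_pct": 100.0 * lost / sent if sent else 0.0,
        "avg_ms": sum(replies) / len(replies) if replies else None,
        "max_ms": max(replies) if replies else None,
    }
```

A connection can show plenty of bandwidth on a speed test and still report a non-zero `loss_pct` here, which is exactly the gap described above.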