To recap our SharePoint testing adventure, we have determined what our users are going to do, we have established load criteria, and we have tested our farm, changing only the number of users hitting the system, to see when it breaks. The end result is a ton of test data that tells us a lot. What it doesn’t do well is show us our results in a clear and defined manner. Sure, we could take our raw data to the business users and say something to the effect of “Look here, as we added the 85th vUser, the % of Committed Bytes In Use went from 24 to 30. Just look at this table of raw data that is 100 pages long”. Yeah, that is going to work.
What we need is to pick out the relevant information, so that anyone can look at it and see when the farm starts to break down and we need to add more hardware. In Part III we boiled it down to three main measures:
- Requests Per Second
- Page Load Time (for a given transaction or a single page, usually the Landing Page)
- CPU Utilization
Now, what we need to do is extract those measures from our raw data and put them into an easy-to-consume tabular format.
Here I have set up a row for each test run and tracked the relevant data for each run: # of vUsers, Hits Per Second, Page Load Time for the Landing Page, CPU Utilization on the WFEs, etc. Just looking at the data for these runs, it is fairly clear that Hits Per Second (HPS) tracks closely to the number of vUsers until the farm hits about 150 vUsers (which, as we learned in Part II, tells us what our expected user load is, in this case about 150,000 users). The HPS gets even flatter when we push to 200,000 users. We can also see that the load time for default.aspx rises from a 0.58 second average to a 3.65 second average.
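As a rough illustration of the step above, here is a minimal Python sketch of collapsing raw samples into one row per test run and spotting the run where Hits Per Second stops keeping pace with vUsers. The field names (`vusers`, `hps`, `load_time_s`, `cpu_pct`), the function names, and the 90% threshold are all assumptions made for this example, and the sample numbers are made up, not results from these tests. Your load testing tool will export its own format.

```python
from collections import defaultdict
from statistics import mean


def summarize_runs(samples):
    """Collapse raw per-interval samples into one summary row per test run.

    Each sample is a dict with hypothetical keys:
    vusers, hps, load_time_s, cpu_pct.
    """
    by_run = defaultdict(list)
    for sample in samples:
        by_run[sample["vusers"]].append(sample)

    rows = []
    for vusers in sorted(by_run):
        group = by_run[vusers]
        rows.append({
            "vusers": vusers,
            "hps": round(mean(s["hps"] for s in group), 2),
            "load_time_s": round(mean(s["load_time_s"] for s in group), 2),
            "cpu_pct": round(mean(s["cpu_pct"] for s in group), 1),
        })
    return rows


def first_flattening_run(rows, min_ratio=0.9):
    """Return the vUser count of the first run whose HPS per vUser falls
    below min_ratio of the first run's HPS per vUser, or None if none does.

    min_ratio is an arbitrary threshold chosen for illustration.
    """
    baseline = rows[0]["hps"] / rows[0]["vusers"]
    for row in rows[1:]:
        if row["hps"] / row["vusers"] < min_ratio * baseline:
            return row["vusers"]
    return None


# Made-up sample data, for illustration only.
example_samples = [
    {"vusers": 50, "hps": 48.0, "load_time_s": 0.5, "cpu_pct": 30.0},
    {"vusers": 50, "hps": 52.0, "load_time_s": 0.7, "cpu_pct": 34.0},
    {"vusers": 100, "hps": 100.0, "load_time_s": 1.0, "cpu_pct": 60.0},
    {"vusers": 150, "hps": 120.0, "load_time_s": 3.0, "cpu_pct": 90.0},
]
example_rows = summarize_runs(example_samples)
flattening_at = first_flattening_run(example_rows)
```

The summary rows are the kind of thing that goes into the table, and the flattening check is just a mechanical version of eyeballing where HPS stops tracking the number of vUsers.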
All of that data is good, but even this is not great for a presentation to management on farm performance. So we turn it into a graph. Here is an example of this graph for Hits Per Second. It shows the farm being run with 2 WFEs and with 1 WFE. The 2 WFE run simulates a 3 WFE farm (remember that pesky N-1 standard from Part III). This graph gives us a quick and easy way to show management what the performance of the farm looks like as we ramp up users.
This graph makes it easy to see that with 1 WFE (a 2 WFE farm) we cap out in these tests at about 50,000 users (50 HPS). 2 WFEs (a 3 WFE farm) are fine at 100,000 users and start to tail off at 150,000 users.
Here is a similar graph that shows CPU utilization for 1 WFE and 2 WFEs.
Here we can easily see that the 1WFE farm is starting to show CPU strain at 50,000 users and that the 2WFE farm shows the same level at 100,000 users. This graph makes it easy to see that each WFE (in this usage scenario) will give us about 50 Hits Per Second or 50,000 users.
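To make the arithmetic concrete, here is a small Python sketch of sizing a farm from that per-WFE figure. It assumes the roughly 50 HPS per WFE and 1,000 users per HPS seen in these particular tests, and it reads the N-1 standard as "one extra WFE beyond what is needed to carry the load". The function name and parameters are hypothetical, and the constants only apply to this farm and this usage scenario.

```python
import math

# Figures taken from the graphs for this one farm and usage scenario.
HPS_PER_WFE = 50       # each WFE handled about 50 Hits Per Second
USERS_PER_HPS = 1000   # 50 HPS corresponded to about 50,000 users


def wfes_required(expected_users,
                  hps_per_wfe=HPS_PER_WFE,
                  users_per_hps=USERS_PER_HPS,
                  spares=1):
    """Estimate the number of WFEs needed for an expected user population.

    spares=1 is this sketch's reading of the N-1 standard: the farm should
    still carry the load with one WFE out of service.
    """
    target_hps = expected_users / users_per_hps
    serving = max(1, math.ceil(target_hps / hps_per_wfe))
    return serving + spares


farm_for_50k = wfes_required(50_000)     # 1 serving WFE + 1 spare
farm_for_100k = wfes_required(100_000)   # 2 serving WFEs + 1 spare
```

This lines up with the test setup above: 1 WFE under test stands in for a 2 WFE farm, and 2 WFEs under test stand in for a 3 WFE farm.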
Try to remember that the goal of our testing was to show how well the farm performs. This meant that we first established what good performance meant, then we established metrics and criteria, then we ensured that our testing was consistent, and lastly we put the results into an easy-to-read graph.
It's also important to remember that these numbers were for one specific farm and one specific testing scenario. If this farm changes in topology, or more importantly in usage or implementation, then the performance tests will likely need to be run again.
It's interesting that in other tests I have seen from Microsoft, the load is 50 RPS/WFE in a collaborative environment and 100 RPS/WFE in a publishing environment. I would highly recommend extensive testing in this manner in your own environment.