QA Engineering Dev Blog - Empyrean Age
Since my last blog entry, there has been some awesome work done in QA Engineering. This is the work of many people at CCP, most in the Quality Assurance department, but we have also received great assistance from the programmers and operations guys.
Empyrean Age introduced a large number of new objects and system changes. Some of these are huge and would require a herculean effort to manually test. QA Engineering was asked to step in and script as much as possible to allow the QA Testers to focus on testing functionality. CCP Atropos took the lead in this and below are some of the cool new testing tools we have available as a result of his work.
Many of our new tools were never previously needed, since the data they would test had been tested by the Bug Hunters and the EVE player base; for example, how often do Stargates change where they are going to? In earlier expansions, changes to these systems were made on a very small and limited scale, so testing was quick and simple. With Empyrean Age, though, all of that changed.
Automatic Gate Verification. With the addition of Black Rise we encountered some new testing issues that we had never faced on this scale. We added a large number of Stargates and these had to be checked to make sure they went where we wanted them to go, and that they were properly paired. How it works: the test would draw up a list of systems in the current region and break them down into constellations. The test could then be invoked on a constellation or region basis. Once a list of candidate systems was compiled, it would transfer itself into the first system, then navigate to the first gate, and jump through. On the other side of the gate, it would check to ensure that the destination system was correct and that it was within range of the correct return gate. Once that returned a positive or negative result, it would log the response, transfer back to the previous system, and proceed to check the next Stargate. And so on, and so forth, loading systems up as needed, since many would be offline. This meant that we could get 6 testing machines, assign each to test a different constellation in Black Rise, and complete the entire test in about 45 minutes. This also meant that it was repeatable on demand. To manually test this would have taken far too much time and would have meant that it was not easily repeatable.
The result of this was that we quickly identified 6 gates within Black Rise that were broken and needed fixing. This fix was applied and then, rather than just test the 6 fixed gates, we reran the entire test again. We also reran the entire set of tests on Tranquility while we had the server in testing mode to confirm that the gates in Black Rise were all working correctly after deploying the Empyrean Age updates.
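The two gate checks described above (correct destination, properly paired return gate) can be sketched against plain data. This is only an illustrative model: the real test ran through the client and jumped a ship through each gate, whereas the `Gate` record, its field names, and the system names below are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gate:
    name: str
    source: str       # system the gate sits in
    destination: str  # system the gate actually leads to
    expected: str     # system the design data says it should lead to


def verify_gates(gates):
    """Return (gate name, problem) for every gate that fails a check."""
    # Index gates by route so the paired return gate can be looked up.
    by_route = {(g.source, g.destination) for g in gates}
    failures = []
    for gate in gates:
        if gate.destination != gate.expected:
            failures.append((gate.name, "wrong destination"))
        elif (gate.destination, gate.source) not in by_route:
            failures.append((gate.name, "no return gate"))
    return failures


# Hypothetical data: two healthy gates and two broken ones.
sample_gates = [
    Gate("A-B", "A", "B", "B"),
    Gate("B-A", "B", "A", "A"),
    Gate("A-C", "A", "C", "C"),  # nothing leads back from C to A
    Gate("C-D", "C", "B", "D"),  # leads to B instead of D
]
gate_failures = verify_gates(sample_gates)
```

Because the check takes the whole gate list each time, rerunning it over the full set after a fix costs no more than checking only the repaired gates.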
Ice and Asteroid Belts. With the success of the Stargate testing, we wanted to apply similarly simple scripted tests to check other stellar data. We wrote a new script to get a ship to check the contents of a belt. This then gave us a report of what was in each belt and a warning if an Asteroid Belt had Ice or Ice Belts had Asteroids. As a result, we found that there were no Ice belts in any system in Black Rise, but there were several Asteroid Belts with Ice. The actual fault was that the celestial record was wrong and needed to be fixed (Ice Belts being marked as Asteroid Belts). We then reran the entire test to verify the fix.
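The belt report amounts to comparing what a belt is flagged as with what it actually contains. A minimal sketch of that comparison, assuming a hypothetical dict layout for each belt (the `name`, `type`, and `contents` keys are invented for illustration, not the real celestial record format):

```python
def belt_warnings(belts):
    """Return one warning string per belt whose contents contradict its type.

    Each belt is a dict with 'name', 'type' ('asteroid' or 'ice') and
    'contents' (a list of 'asteroid' / 'ice' entries found in the belt).
    """
    warnings = []
    for belt in belts:
        # Anything that is not of the belt's own type is out of place.
        wrong = sorted({item for item in belt["contents"] if item != belt["type"]})
        if wrong:
            warnings.append(
                f"{belt['name']}: {belt['type']} belt contains {', '.join(wrong)}"
            )
    return warnings


# Hypothetical survey results for three belts.
survey = [
    {"name": "Belt I", "type": "asteroid", "contents": ["asteroid", "asteroid"]},
    {"name": "Belt II", "type": "asteroid", "contents": ["ice", "ice"]},
    {"name": "Belt III", "type": "ice", "contents": ["ice"]},
]
belt_report = belt_warnings(survey)
```

In the Black Rise case described above, the warnings would all have been of the "asteroid belt contains ice" kind, which is what pointed to the wrong flag on the celestial record and not to the belt contents.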
Loot Table Changes. There were a lot of changes in Empyrean Age around the loot tables; these needed testing. To do this manually would have been a huge job (there are almost 4,000 different entities in the database in various states of design and use), so the decision was made to script a large amount, then select specific NPC entities for manual testing. A simple script was created that would spawn a quantity of a particular NPC from the list, then destroy them and check whether the wrecks did or did not contain loot, in line with the designed drop rate for that NPC. Some entities were harder to test, such as Overseer Structures (since they are massive and have rules determining how many can be deployed on a grid), but we could still check that we were getting some results. It was also discovered that some NPCs simply had no loot – they generated no wreck, no containers, no loot, nothing! After checking these results with the EVE Game Design Group, we determined that this was as intended for some but not all NPCs. On the other hand, some NPCs had no wreck type associated with them, meaning that they simply spawned a cargo container with the corresponding loot inside it – again, some by design, some not. This allowed us to identify a number of issues that were fixed and then the entire test was rerun.
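The comparison step of the loot test, checking observed drops against the designed rate and counting kills that left no wreck at all, might look something like the sketch below. The outcome labels, the function name, and the 5% tolerance are assumptions for illustration; the post does not say how the real script scored its results.

```python
from collections import Counter


def summarize_kills(outcomes, designed_rate, tolerance=0.05):
    """Summarize a batch of NPC kills.

    `outcomes` holds one label per destroyed NPC: 'loot' (wreck with loot),
    'empty' (wreck without loot) or 'no_wreck' (nothing was left behind).
    """
    counts = Counter(outcomes)
    total = len(outcomes)
    observed = counts["loot"] / total if total else 0.0
    return {
        "observed_rate": observed,
        # Flag the NPC for manual follow-up if it strays from the design.
        "rate_ok": abs(observed - designed_rate) <= tolerance,
        "no_wreck": counts["no_wreck"],
    }


# Hypothetical batch: ten kills of one NPC type with a designed 30% drop rate.
batch = ["loot"] * 3 + ["empty"] * 6 + ["no_wreck"]
loot_summary = summarize_kills(batch, designed_rate=0.3)
```

A non-zero `no_wreck` count is not automatically a failure, since some NPCs are designed that way, so it is reported separately for design review.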
Capture Point Bunkers. Another new feature introduced with Empyrean Age was the Capture Point Bunkers. These needed to be tested not only for functionality, but also to make sure they were in the right solar systems and not in any others. We had scripts jump to all the Factional Warfare systems, test for the existence of a Bunker, and verify that it was within a certain distance of the Sun. We also had a script that checked that there were no bunkers outside of Factional Warfare systems. This test took so long to execute that we set it running and came back later to review the results.
These scripted tests allowed us to test positively, for the presence of a particular variable or effect, and negatively, to ensure that something did not appear. This was the case with Capture Points which should only exist in systems flagged as being part of Factional Warfare and nowhere else.
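The positive and negative checks for bunkers can be expressed as three simple set comparisons. This is a sketch only: the system names are made up, and `max_distance_au` is a caller-supplied parameter because the post does not state the actual distance limit.

```python
def check_bunkers(fw_systems, bunker_distance_au, max_distance_au):
    """Compare where bunkers are with where they should be.

    fw_systems:         set of systems flagged for Factional Warfare
    bunker_distance_au: dict of system -> bunker distance from the sun
    """
    return {
        # Positive test: every Factional Warfare system needs a bunker ...
        "missing": sorted(s for s in fw_systems if s not in bunker_distance_au),
        # ... and the bunker has to be close enough to the sun.
        "too_far": sorted(
            s for s in fw_systems
            if s in bunker_distance_au and bunker_distance_au[s] > max_distance_au
        ),
        # Negative test: no bunker may exist anywhere else.
        "unexpected": sorted(s for s in bunker_distance_au if s not in fw_systems),
    }


# Hypothetical data: S3 has no bunker, S2's is too far out, S4 should have none.
bunker_report = check_bunkers(
    fw_systems={"S1", "S2", "S3"},
    bunker_distance_au={"S1": 2.0, "S2": 9.0, "S4": 1.0},
    max_distance_au=5.0,
)
```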
Why do this through the client? All of the data we were testing was present in the database, but if it is not delivered to the user in the intended manner, then the data is useless. We decided to run these tests through the client to confirm that each feature actually worked for the player, not just in the database.
Log Server Changes. Use of the client-side Log Server had gone far beyond its original purpose as a debugging tool, and it was now being used to provide intelligence and scripting hooks. After much debate, it was decided that the problems caused by these tools outweighed the advantage gained in testing from having complete client logs, and changes were made to how the EVE client reports information to Log Server. This also means that player-supplied logs are not as useful as they once were, and that CCP QA members or the ISD Bug Hunters now have to spend extra time reproducing an issue and collecting complete client logs. This will have an impact on resolution time for these issues. We are not happy with this, but it is a consequence of the changes. Please note that these changes only affect the logs captured client side, not server side, so this won’t affect chat or combat logs. Once identified, these changes were completed by the EVE Software Group, specifically CCP Laurelle.
Startup Tests. You know those "Not Accepting Connections" and "Unknown Protocol" messages that you received on occasion during downtime? The Deployment and Operations groups were actually testing the deployment of the Empyrean Age code base onto Tranquility during the downtimes before the update. This also highlighted some issues of scale and timing for the upgrades with cluster startup which we could fix before the official deployment. This allowed us to better calculate the needed downtime and allow proper time for any errors or issues that could arise. Most things went to schedule and thus we opened the server early, which was rather nice to do! CCP Mephysto and his team in QA Deployment took the lead on this and, with the help of Operations, got the code up and running on the cluster. After that a number of people from different departments (Software, QA and Operations) went over the resulting logs looking for errors and fixing them.
Starbases. In the past, Starbases seemed to cause issues after every patch. This was noted and we conducted special tests to avoid it. We copied some common Starbase configurations onto a dedicated server and then performed a full Empyrean Age software upgrade on the test server to verify that Starbase configurations, modules and functions worked as intended. This gave us greater confidence that we would have no significant issues with Starbases.
Unit Testing. We have developed a new unit testing framework based on pyUnit and integrated this into our Walking in Stations (formerly Ambulation) code base, as Walking in Stations is derived from the EVE code base. This is now being used to build up a set of unit tests for the basic 'services' used in EVE in a controlled environment. When "Walking in Stations" is integrated back into the main EVE code base this will integrate the unit test framework into our main code base with a much lower risk of issues. CCP Atropos has been taking the lead in this work.
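PyUnit is the framework that ships with Python as the `unittest` module, so a unit test for a basic service in a controlled environment could look roughly like this. The `InventoryService` class is a made-up stand-in, not one of the actual EVE services, and the sketch says nothing about how CCP's own framework wraps PyUnit.

```python
import unittest


class InventoryService:
    """Hypothetical stand-in for a basic game 'service'."""

    def __init__(self):
        self._items = {}

    def add_item(self, owner, item):
        self._items.setdefault(owner, []).append(item)

    def items_for(self, owner):
        # Return a copy so callers cannot modify the service's state.
        return list(self._items.get(owner, []))


class InventoryServiceTest(unittest.TestCase):
    def setUp(self):
        # A fresh service per test keeps the environment controlled.
        self.service = InventoryService()

    def test_added_item_is_listed(self):
        self.service.add_item("pilot", "shuttle")
        self.assertEqual(self.service.items_for("pilot"), ["shuttle"])

    def test_unknown_owner_has_no_items(self):
        self.assertEqual(self.service.items_for("nobody"), [])


def run_tests():
    """Run the test case and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(InventoryServiceTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Calling `run_tests()` returns a result object whose `wasSuccessful()` method reports whether every test passed, which makes the suite easy to hook into an automated build.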
Graphic Assets Testing. How do you test graphics assets? Well, we started with the ship models and looked at the rules governing them. To start, we developed a test that ensured that texture maps and specular maps are the correct size, that the modules have 5 or more damage locators, and that the ship and turrets both have a shadow effect associated with them. We ran this test over all of the ship model files and started looking for ships that did not meet the rules. Additionally, we have now started to prototype a new stripped-down testing framework into which we can add all these tools and other tool ideas that have resulted from this work. Once this work is complete, it can be packaged and made into a tool for the Artists to use to check all new graphic assets before entering them into the EVE code base. CCP Sputnik, with the assistance of CCP Redundancy, has worked very hard on this.
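The three ship model rules listed above lend themselves to a simple rule checker. The sketch below assumes a hypothetical dict describing one ship asset; the key names are invented, and the expected map size is passed in by the caller because the post does not say what the correct size is.

```python
MIN_DAMAGE_LOCATORS = 5  # "5 or more damage locators" from the rules above


def check_ship_asset(asset, expected_map_size):
    """Return a list of rule violations for one ship asset record."""
    problems = []
    # Rule 1: texture and specular maps must be the expected size.
    for key in ("texture_map_size", "specular_map_size"):
        if asset.get(key) != expected_map_size:
            problems.append(f"{key} is {asset.get(key)}, expected {expected_map_size}")
    # Rule 2: enough damage locators.
    locators = len(asset.get("damage_locators", []))
    if locators < MIN_DAMAGE_LOCATORS:
        problems.append(f"only {locators} damage locators")
    # Rule 3: both the ship and its turrets need a shadow effect.
    for key in ("ship_shadow", "turret_shadow"):
        if not asset.get(key, False):
            problems.append(f"missing {key}")
    return problems


# Hypothetical asset that breaks one part of each rule.
bad_ship = {
    "texture_map_size": (512, 512),
    "specular_map_size": (1024, 1024),
    "damage_locators": ["bow", "stern", "port"],
    "ship_shadow": True,
    "turret_shadow": False,
}
asset_problems = check_ship_asset(bad_ship, expected_map_size=(1024, 1024))
```

Returning every violation, instead of stopping at the first, suits the stated goal of giving artists a tool they can run before checking assets in.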
Load Client. In my last blog I spoke about wanting to develop a lightweight, scriptable client that would enable us to run load and controlled testing on the test servers. After Fanfest last year we completed a proof of concept of this project and got a number of clients running on remote computers (up to 3 clients per computer), all following commands from a central computer and doing some simple tasks. This was a nice validation that we could get this working. We identified that, to make this truly successful, we needed to reduce the 'weight' of the client, and we have been working on this since then. We have created stubbed versions of our audio and graphics engines so that we can now replace these in the client and significantly reduce the amount of CPU and memory used by the client, allowing us to run more clients on a single machine. Now that this has been done, we are shifting our focus back to the control framework and the network control systems. There is still much more to do here but, when complete, it will allow us to run very controlled, repeatable tests so that we can exactly compare results on one code base versus another and look at the differences.
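The idea of stubbing out the heavy engines and driving several clients from one controller can be illustrated in a few lines. Every class, method, and command name here is an assumption made for the example; it runs in a single process and does not model the real client or the network control system.

```python
class StubAudio:
    """Accepts the same call as a real audio engine would, but does no work."""

    def play(self, sound_name):
        return None


class StubGraphics:
    """Stands in for the renderer without drawing anything."""

    def render_frame(self):
        return None


class LoadClient:
    """Minimal scripted client that records the commands it has executed."""

    def __init__(self, name, audio, graphics):
        self.name = name
        self.audio = audio
        self.graphics = graphics
        self.log = []

    def execute(self, command):
        # The client still calls its engines; the stubs make this nearly free.
        self.audio.play(command)
        self.graphics.render_frame()
        self.log.append(command)


def broadcast(clients, commands):
    """Send every command to every client, in order, and return their logs."""
    for command in commands:
        for client in clients:
            client.execute(command)
    return {client.name: list(client.log) for client in clients}


# Three stubbed clients following the same hypothetical script.
clients = [LoadClient(f"client-{i}", StubAudio(), StubGraphics()) for i in range(3)]
logs = broadcast(clients, ["undock", "warp"])
```

Since every client receives the same commands in the same order, the resulting logs can be compared directly between two runs, which is the kind of repeatability the paragraph above is after.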
API Testing. CCP Elerhino, over in EVE Software, has taken on the mantle of API developer, fixing a number of bugs and standardizing the code and caching systems that the API uses. These changes needed to be tested, so QA Engineering scripted the testing and made it repeatable and automatic. We now have a basic unit test framework for the API. In addition, we got the EVEMon code base and added some testing options so we could point it to our internal API test server to test the API changes. We then sent the EVEMon changes back to the developers and this has been integrated into the code base (thanks again to Araan Sunn from the EVEMon team). If any other developers of tools that use the API want to do something similar, then please look at the changes in EVEMon as an example. As a result of this we are also looking into the deployment of an API test server for the developers to use. At this point it is still very much just an idea, but the concept is to point it at SiSi and have reduced caching timers to make development and testing faster.
In conclusion, we are building up a suite of tools that help us test the individual components of EVE in an automated fashion. We have started to automate some common tasks so that they can be run on dedicated machines, freeing up our testers to focus on other testing work. Next we will be looking into the Load Client (as mentioned above) and starting to string together the individual tests so that they are triggered automatically upon completion of a build. All in all, some exciting developments that will help us continue to make EVE a better product.