All you wanted to know about Defects but were afraid to ask
I was recently thinking about how defects could be categorized, since there are several different problems all classed under the same heading of 'defects'. I concluded there were 3 broad categories that could be defined (though even that isn't wholly accurate).
1. Code errors
Coding errors come in 2 flavors: the horribly obvious ones, such as "OMG, why doesn't the server start?", and the horribly convoluted ones, best described as "WTF? He managed to do what?! HTH did he manage that?".
a) The Horribly Obvious:
The obvious ones are, by nature, very easy to find. We do have some problems that only come to light when we try to start the server on the Tranquility cluster, and these are one of our biggest headaches on patch days. The end result is the occasional extension to the DT when we are deploying expansions, which you guys see, and a lot of stress and headless-chicken reactions within the office, which you don't see.
We try to minimize the occurrence of these problems with code reviews when the programmers check in new changes, and through experience: knowing how it broke last time makes it easier to avoid the next time. Beyond that, however, there isn't much more that can be done.
b) The Horribly Convoluted:
This type of error is the biggest headache we can have. The easiest way to define this type of problem is by example: duping bugs and other exploits.
These occur when a piece of code doesn't work as intended, or two separate pieces of code 'interfere' with each other, producing an unexpected result. Code reviews catch some of these; more are picked up when we are testing. Some manage to leak out, however, and hunting them down once we hear about their existence involves a lot of (guess)work. The QA job here involves finding what it is that's causing the problem, finding a way to reproduce the problem that works most/every time, and then hunting down those people that have been abusing the exploit and handing their names to the GMs for 'attitude adjustment'.
c) Performance Issues:
Performance issues are, obviously, those things that affect the performance of Eve on the live servers. This can be due to the client, the server, both, or even neither.
Client performance issues normally affect frame rates. For example, leading up to the Revelations release, vsync was accidentally switched on, capping everyone's frame rate. We were aware of this almost immediately; however, finding where vsync had been changed took several days, simply due to the amount of code that had to be checked through.

Server performance issues do not (normally) directly impact the player experience, and are of more concern to those of us who care about hardware performance. The most noticeable example was the jump queues that occurred for a while after Revelations was released: jump queues were added to prevent the nodes being put under excess load, but the code as released was lacking optimization in some areas. We work to resolve these types of issues as quickly as we can. Generally, performance issues are only seen once the code is released to TQ, as the metrics we can monitor on Sisi are less than ideal due to the inconsistent load and logins there.
Following on from performance issues is the collection of detailed metrics from the Tranquility cluster. This allows us to identify potential problems based on players' choices, and to adjust things to reduce or eliminate those problems. A good example is Jita. Technically there are no code errors or other performance issues affecting the system; the lag is all down to the sheer number of players using it. While Jita is an extreme case we are still working on, other systems have had similar problems. By collecting data on where the server load is and analyzing the cause, we can make adjustments to reduce the load (relocating agents, for example).
2. Game Design Problems
Game Design or Content errors are a mixed bag of problems for QA to investigate. This area covers everything from individual agent mission designs to deadspace complexes to exploration to new ship designs and balancing. QA hates this type of testing most, as it is so repetitive. It's fine the first few times, but after you've done the same mission/complex/whatever test 20 times, you start wanting something different. This is one of the main reasons most of QA are PvPers on TQ: we've done all the other content so many times in testing that we don't want to see it again. (For added fun, ask Tanis about Exploration sometime.)
a) Game Mechanics
Game mechanic testing generally involves a fair bit of repetitive testing, a little guesswork and a lot of demanding explanations from the Content/Programming team regarding how they intended a feature to work. QA advise and provide feedback on new and existing features, but on a certain level both we and Content are stuck with guessing how something is going to pan out. The looting system is one example. There are currently 21602 combinations of entity (NPC) and loot group (a loot group would be 'small projectile turrets'). Clearly, checking every single combination manually is out of the question, so we write scripts to automate our testing. Unfortunately, scripts can be fooled, so a few problems slip through, such as NPC battleships dropping 1MN Afterburners. Other mechanics are simply not directly testable, but can only be gauged statistically. The new exploration system introduced in Revelations has a percentage chance of returning a result; the best we can do is run the test X times and see whether we get around the correct percentage. The same problem occurs for system scanning and many other areas where the result is based on chance in some respect.
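The "run the test X times and check the percentage" approach can be sketched in a few lines. This is a minimal illustration, not CCP's actual tooling: the 15% drop chance and the simulated trial loop are assumptions purely for demonstration, and a real harness would drive the game server rather than `random`. The statistical idea is the same, though: with enough trials, the observed success count should land within a few standard deviations of the designed probability.

```python
import math
import random

def within_expected_rate(successes, trials, expected_p, z=3.0):
    """Check whether an observed success count is statistically consistent
    with the designed probability, using the normal approximation to the
    binomial distribution (mean n*p, stddev sqrt(n*p*(1-p)))."""
    mean = trials * expected_p
    stddev = math.sqrt(trials * expected_p * (1 - expected_p))
    return abs(successes - mean) <= z * stddev

# Hypothetical stand-in for running an exploration scan 10,000 times;
# the 15% chance is an invented design value for illustration only.
random.seed(1)
trials = 10_000
expected_p = 0.15
successes = sum(random.random() < expected_p for _ in range(trials))
print(within_expected_rate(successes, trials, expected_p))
```

The catch, of course, is that a chance-based check can only say "probably fine" or "probably broken": a 3-sigma band means roughly one honest run in 370 will still be flagged, which is why these tests get re-run rather than trusted on a single pass.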
b) Game play
Game play is a very hard thing to test, as much of it relies on the tester's perception of what is fun or easy to use. This is where knowledge of the game and its players is most important, as we have to gauge how a new feature will be received by the community. One aspect of this is reading the feedback from players using the test servers. We cannot rely solely on forum feedback though, since many people don't post, and more to the point, those that do generally say what's wrong, not what's right. Still, game play is an important part of testing, even if most of the results are not quantifiable.
As an example of this, let’s take Fighters being abandoned at some strange location since they followed the carrier through an auto warp. Arguably, there is no defect here, since fighters are just drones on steroids and thus act the same way. However, it is hugely inconvenient and not good game play, so we classify it as a defect.
3. UI Errors
Technically these are the same as code errors, but I choose to group them in a separate category. These cover such things as the online/offline icons not working, weapon icons not flashing, and similar items. These are all obvious problems that QA should be able to pick up, and most of them we do. Some, however, are picked up late enough in the patch development cycle that they are deemed not important enough to delay the patch, e.g. the character sheet icon still flashing as 'no skill training' that we had for a while. Others we do miss.

After a while, we develop a certain 'blindness' to the UI. I can, for example, navigate the entire S&I interface without thinking about it, even if my client is in Chinese language mode (which confused our Chinese-speaking QA, who for a while thought I could read Chinese after seeing me do this). As such, we can often use parts of the UI without actually 'seeing' them. How many of you actually check that the icons are flashing green when you fire your guns? Most likely, you know you pushed F1-F6 (or whatever) and therefore assume your guns are firing.
What this means is that UI errors are often considered low priority when it comes to fixing defects (a situation I have been working to change for some time now). In theory, all defects should be considered equal, but without infinite manpower, priorities have to be set; an unfortunate fact of QA life.
Usability 'errors' are not errors in the strictest sense, like the Fighters situation I mentioned above. One of the reasons the integrated voice chat was delayed from inclusion in the initial Revelations release was usability. Like game play, usability can be very hard to gauge. We use many parts of the game on a daily basis, and as such we 'get used' to things being the way they are. The worst case is when we are testing a new feature for release: since QA can spend weeks or more using a crappy prototype interface, getting the snazzy new improved one can lead to overlooking how this new interface could itself be improved. One area I would love to see cleaned up (even though I never use it outside of testing) is the access standings settings in POSes. It's just so aggravating to use (to me, at least).
How good looking is Eve? Yes, we all love the screenshots, but if you look closely at some things, they really don't stand up to scrutiny (yes, I know I just committed blasphemy by implying Eve is anything other than sexy looking). Among the myriad of other things to watch for, we in QA have to keep an eye on the graphical displays themselves: Does that explosion look right? Did the jump gate animation play correctly? Why does the 4th turret on the Absolution display in 2D? (Fixed now, I believe.) Inherent in this is another area of testing: whether a special effect (such as an armor repairer's green glow) is causing any lag. Some graphics simply arrive unoptimized and can have a drastic effect on the frame rate of the client. For example, we have two different versions of the 'spatial rift' LCO. One is fine, and causes no perceptible change in FPS, even with 3 or 4 on screen. The other causes a drop of 4-5 frames with just one on screen. We are currently building a series of test environments to allow for the checking of various graphics and models, but it's all manual so far and takes a lot of time.
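That kind of "4-5 frame drop" comparison boils down to very simple arithmetic, which could be automated in a test environment. The sketch below is hypothetical and engine-agnostic: the function names and the 2-FPS threshold are my own inventions, and the frame times are invented numbers standing in for timings captured with and without an effect on screen.

```python
def average_fps(frame_times):
    """Average frames per second from a list of per-frame durations in seconds."""
    return len(frame_times) / sum(frame_times)

def fps_regression(baseline_times, effect_times, max_drop=2.0):
    """Flag the effect if it costs more than `max_drop` FPS on average,
    comparing a capture without the effect against one with it on screen."""
    return average_fps(baseline_times) - average_fps(effect_times) > max_drop

# Illustrative numbers only: a steady 60 FPS baseline vs ~55 FPS
# with the effect on screen, 600 frames (10 seconds) each.
baseline = [1 / 60.0] * 600
with_effect = [1 / 55.0] * 600
print(fps_regression(baseline, with_effect))  # the ~5 FPS drop exceeds the threshold
```

In practice real frame times are noisy, so a harness like this would average over longer captures (or compare percentiles rather than means) before declaring an effect unoptimized.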
Localization is the translation of the Eve client into various languages. We started with Chinese for the launch of Serenity last year, and expanded to include German last summer. In the near future we are looking to include several more languages. For this area of expansion, more than any other, we will be relying on 3rd party companies to do the translation work for us. The problem for QA is determining how accurate those translations are. For a while, the 'Stasis Webifier I' in the German client was labelled a 'Domination Stasis Webifier' due to an error in the way translations are matched to items. Still, we have a lot of multilingual people in the office, and we get the translations reviewed before release, so we catch most of these errors.
To cover all of these areas, QA has what we call our "Sanity Tests". We use these as a guide to testing as much of the game as possible in the shortest time possible (normally 1-2 days), and we run them the week before the release of a patch. This is why we occasionally delay patches: we find errors. We also have a set of checklists which check components of Eve in massive detail; these lists can take a week or more to run through fully. We are getting a lot of help from the Bug Hunter team in going through these lists, for which we are most grateful.
Ok, enough from me. I need to go watch the snow falling outside.