Keeping EVE running smooth and fun to play - How you can help!
Wow, this has turned into quite a long blog, but it's chock full of much info-goodness which hopefully addresses many of the questions and concerns people have had about public testing and how we approach large-scale performance and feature testing for EVE.
What's in our anti-lag arsenal?
As EVE grows, more features are added, more players are online at once, and the whole system becomes more complex, and we find that weeding out sources of lag becomes increasingly difficult. Throughout EVE's development we've used various methods to identify and fix sources of lag: server performance monitoring, special task forces to 'seek and destroy' lag, monitoring forums and petitions, public test servers, and intensive regression tests of all new code, for example. Though these are great tools, we don't feel that they're enough. Over the years, while creating all of our expansions and the new features and changes that come with them, it became apparent that we had a problem: we had no good way to simulate high-load situations such as fleet fights, factional warfare, Jita, etc. This is a particularly difficult situation for us. In any high-load situation there is always a very wide range of actions being performed simultaneously, with many different hardware configurations on the client machines, and many small factors quickly add up. But these factors are all relatively unpredictable and almost random in practice.
Consider Jita. Think of how many people are there; now think of all the different market transactions, combat, chatting, industry, and other activities that are going on simultaneously in that system - you can see that testing this scenario is not as simple as "load up 1000 clients and let them run." We needed a way to get as many different transactions and machine configurations as possible in order to confidently say that we've gotten a valid test for high-load performance.
Over a year ago, in order to address this issue, we began exploring several different options. We looked at everything, from specially built load-testing clients (bots), to hiring many more testers, to community-supported testing, to developing specialized test suites and scripts, and everything in between. After some prototyping, tests, and wracking our brains, we finally came up with a multi-prong approach that includes community-supported testing (i.e. Mass-Testing - more on this shortly), massive test-server upgrades and changes, load-test clients (still in the early stages of development), special debugging code, and more data logging than you can shake a stick at.
Mass-Testing
The cornerstone of our approach is an initiative dubbed "Mass-Testing." Simply put, we figured that the best way to "simulate hundreds of players all doing many different actions at the same time" was to, well, get a bunch of real players together and run through proper test scenarios. This is vital because it is the only way we can accurately gauge how new code behaves under such extreme conditions as fleet fights with several hundred pilots. Some factors, such as diminishing returns and code scalability, only become apparent in such highly loaded situations.
At its core, Mass-Testing is a very basic framework which allows us to invite hundreds, or even thousands, of players onto the test server to hammer away at various things. This framework provides us with an excellent venue to present new features or changes to people and not only see how well they perform, but also to get critical feedback about the new changes directly from the players who've taken the time to try them out on the test servers. By running these events regularly, we also gain the ability to track empirically the performance of the client, server, and network layer over time. This all comes together to give us a much clearer picture of where we stand now, where we came from, and where we should focus our efforts moving into the future.
Can't we do this another way, like testing live on TQ?
Unfortunately, testing is never as straightforward as it seems on the surface. Even in fairly simple programs, the number of possible combinations of inputs, raw data, transactions, and hardware can add up quickly to the point where 'testing everything' would literally take several human lifetimes. Because EVE is constantly being updated and changed, this adds another level of complexity to the issue overall. We choose to address this for EVE by prioritizing the areas that need more attention, based on previous history, the complexity of the system or the changes made to it, risk to the cluster/game, player feedback, and several other factors.
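To put rough numbers on the 'several human lifetimes' claim, here is a small back-of-the-envelope sketch. The figures are illustrative assumptions only (ten independent factors with ten possible values each, one second per test case), not measurements from EVE, and the helper name `years_to_test_all` is made up for the example.

```python
import math

SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def years_to_test_all(values_per_factor, seconds_per_case=1.0):
    """Return (number of combinations, years needed to run every one of them).

    values_per_factor: how many distinct values each independent factor can take.
    seconds_per_case: assumed time to run a single combination.
    """
    combinations = math.prod(values_per_factor)
    years = combinations * seconds_per_case / SECONDS_PER_YEAR
    return combinations, years


# Assumption for illustration: 10 factors (inputs, data, hardware, ...),
# each with 10 possible values, tested at one combination per second.
combinations, years = years_to_test_all([10] * 10)
print(f"{combinations:,} combinations, roughly {years:.0f} years of testing")
```

Even with these modest made-up numbers the total runs to centuries, which is why prioritizing by risk is the only practical option.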
When dealing with testing for EVE, we must always keep one thought in the back of our heads: "how will this affect the game, the cluster, and the players?" Sure, we could test directly on TQ, add all sorts of debugging code, join in every fleet fight, etc., but at the end of the day, TQ is for players, not for testing. Adding debugging code would kill the server's performance and make laggy battles much worse. Sending testers into fleet fights could very well lead to cries of favoritism or 'DEVH4x'. Neither of those results is desirable. We in QA feel, very strongly, that anything that would negatively impact the performance of the live servers or people's enjoyment of EVE should be avoided like the plague.
What good will all of this really do for EVE?
We have been running a pilot of this program for roughly the last year, and it's certainly done its job in helping us identify certain performance issues on the server, client, database backend, and also in our networking layers. This program also provides us with a direct line of communication between CCP and the EVE player base to find out how well the expansions perform for you, and also to get some critical feedback about new features and changes in those expansions. This feedback has helped developers tweak and tune their changes in an attempt to make sure that we're providing you with a quality user experience. Some examples of changes that have come about as a result of your feedback are:
- Overview tweaks, such as the ability to toggle all brackets on/off easily
- Improvements to fleet-finder
- Fixes to improve lag in factional warfare
- Cleaning up POS code
- Fixes for EVE Voice
In addition to the feedback we've been able to gather, we've also isolated and killed several problems well before they ever made it to TQ. Some examples of these resolved problems are:
- Server-side memory leaks
- Runaway DB procs
- Client FPS instabilities
- Client memory leaks
- Services being called too often (kills server and client performance)
I am the great and powerful OZ!
Just as it was with Dorothy and the great Oz, an unfortunate fact about Mass-Testing is that most of what's being done happens behind the scenes, which can lead to many folks getting the wrong impression about what's actually going on. But unlike with the great Oz, there is a lot of actual work going on behind the scenes here. We have staff from our Software, Quality Assurance, Operations, and Game Design departments, plus our awesome Bug Hunters, all working in concert to collect data and feedback, analyze the information, and fix what's wrong. I'll give you a brief summary of what we look at during these tests:
DB traces
We collect full traces from our databases. This information tells us what transactions are being made, what procedures are being called and how often, and how long they take to execute. In addition, we also track the database's overall load, health, and performance.
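As a purely illustrative sketch of the kind of summary such a trace allows (this is not our actual tooling), the snippet below counts how often each procedure is called and how long it takes. The record layout of `(procedure_name, duration_ms)` pairs, the function name `summarize_trace`, and the sample procedure names are all assumptions made up for the example.

```python
from collections import defaultdict


def summarize_trace(records):
    """Aggregate (procedure_name, duration_ms) pairs into per-procedure stats."""
    stats = defaultdict(lambda: {"calls": 0, "total_ms": 0.0, "max_ms": 0.0})
    for name, duration_ms in records:
        entry = stats[name]
        entry["calls"] += 1
        entry["total_ms"] += duration_ms
        entry["max_ms"] = max(entry["max_ms"], duration_ms)
    # Average duration is derived once all records have been counted.
    for entry in stats.values():
        entry["avg_ms"] = entry["total_ms"] / entry["calls"]
    return dict(stats)


# Hypothetical sample data; real traces would hold far more records.
sample_trace = [
    ("proc_market_lookup", 12.0),
    ("proc_market_lookup", 18.0),
    ("proc_ship_update", 3.0),
]
summary = summarize_trace(sample_trace)

# List the procedures that consume the most total database time first.
for name, entry in sorted(summary.items(), key=lambda kv: -kv[1]["total_ms"]):
    print(name, entry["calls"], entry["avg_ms"], entry["max_ms"])
```

Sorting by total time rather than by call count is a deliberate choice in this sketch: a procedure called rarely but running for a long time can hurt as much as a cheap one called constantly.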
Server-side metrics
We collect detailed metrics on the servers' performance, especially detailed statistics on CPU and memory usage for discrete layers of the server code. In addition, we collect data from every piece of the game code during execution to see how well they're playing together, where bottlenecks occur, and if there are any runaway tasklets in our code.
Client-side metrics
During these tests we also run several developer clients that record the client's CPU and memory usage, as well as things like FPS, response time, desync, etc. This data allows us to pinpoint pieces of the client code that need to be fixed or streamlined in order to improve the client's performance.
Network-layer metrics
In a game like EVE, which uses a single server with many thousands of simultaneous users, network traffic, bandwidth, routing, and load-balancing are very important. This is why we also gather full network traces for connected clients and collect data on how well the load balancers are coping with the new expansions.
All of this data is collected and kept from one test to the next so that we can compare, analyze and find performance trends, which in turn helps us identify those areas we need to spend more time cleaning up in order to ensure that EVE runs as smoothly as possible.
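The comparison between one test and the next can be pictured with a minimal sketch like the one below. It is an illustration under stated assumptions rather than our real analysis pipeline: the metric names, the 10% threshold, the `find_regressions` helper, and the convention that a higher value is always worse are all hypothetical.

```python
def find_regressions(previous, current, threshold=0.10):
    """Return metrics whose value rose by more than `threshold` (a fraction).

    Assumes every metric is "higher is worse" (e.g. CPU %, milliseconds).
    Metrics missing from `current`, or with a non-positive baseline, are skipped.
    """
    regressions = {}
    for name, old_value in previous.items():
        new_value = current.get(name)
        if new_value is None or old_value <= 0:
            continue
        change = (new_value - old_value) / old_value
        if change > threshold:
            regressions[name] = change
    return regressions


# Hypothetical results from two consecutive mass tests.
last_test = {"server_cpu_pct": 50.0, "db_avg_ms": 10.0}
this_test = {"server_cpu_pct": 52.0, "db_avg_ms": 15.0}

for name, change in find_regressions(last_test, this_test).items():
    print(f"{name} got worse by {change:.0%}")
```

A metric where higher is better, such as FPS, would need its sign flipped before being fed into a check like this.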
Growing up a bit
Over the past year, this program has really just been in its infancy, and we're proud of our baby, but it's time for it to become a toddler and start walking on its own. We need this program to get bigger, better, and provide more value to EVE as a whole. In that spirit, we are moving into the next phase of the Mass-Testing program, where we will be making improvements across the board, but especially in scheduling, backend support, and, later, reporting and information.
Calling all alliances!
In the past, we've had problems getting enough players to show up for these tests, and we will be addressing this issue directly. Ideally, these tests should have at least 400 participants, up to 1500 pilots per test; historically, we've managed to get around 100-200 pilots per test. In an effort to improve numbers, we will do three things. First, we will try to hold these tests on the weekends to avoid work/school/real life conflicts. Second, we have set up a moderated, public mailing list, "Mass Testing Info", which anyone can join, and which we will use to send out dates and information about testing. Finally, we will also invite all alliance leaders to bring their members onto the test servers to participate in these tests. The idea is that, by working with alliance leaders, we can hopefully achieve the numbers we're looking for in these tests.
How will this work?
Before each test, we will put up a sign-up thread on the forums with details and an open invitation to any alliance with more than 150 members. We will also send an in-game EVE-mail to the "Mass Testing Info" mailing list, plus direct mails to the leaders of all alliances with 500 members or more. Alliance leaders can then sign up and indicate roughly how many pilots they are confident that they can bring; we will take the first 1800 pilots, on a first-post, first-served basis. If we don't get enough pilots signing up, we will cancel that test run.
Important caveats:
- Players not in alliances can still join in
- You just need to show up at the correct time and follow directions
- We will accept all alliances who sign up, until we reach a maximum of 1800 pilots
- Do not sign up if you're not confident you can bring people!
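The sign-up rules above can be sketched in a few lines of Python. This is only an illustration of the first-post, first-served idea, not the process we will actually run by hand on the forums. Two details are assumptions: the cancellation threshold is taken to be the 400-participant minimum mentioned earlier, and an alliance that would push the total past 1800 is assumed to be given only the remaining slots. The name `allocate_slots` and the sample alliances are hypothetical.

```python
MAX_PILOTS = 1800  # cap stated in the sign-up rules
MIN_PILOTS = 400   # assumption: cancel below the stated ideal minimum


def allocate_slots(signups, max_pilots=MAX_PILOTS, min_pilots=MIN_PILOTS):
    """Accept sign-ups in posting order until the pilot cap is reached.

    signups: list of (alliance_name, pledged_pilots) in the order posted.
    Returns the accepted list, the total pilot count, and whether the
    test would go ahead.
    """
    accepted = []
    remaining = max_pilots
    for alliance, pledged in signups:
        if remaining == 0:
            break
        # Assumption: the alliance that crosses the cap gets a partial allocation.
        granted = min(pledged, remaining)
        if granted > 0:
            accepted.append((alliance, granted))
            remaining -= granted
    total = max_pilots - remaining
    return {"accepted": accepted, "total": total, "run_test": total >= min_pilots}


# Hypothetical sign-up thread, in posting order.
thread = [("Alliance A", 1000), ("Alliance B", 700), ("Alliance C", 300)]
print(allocate_slots(thread))
```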
First of a new breed
You folks have been saying that you want test events that are held at more opportune times for you, and we've listened! These tests do indeed require as many players as we can get, but holding events on weekends or late into the evenings, on a regular basis, is not as easy to coordinate as one might think. Even game developers and testers have real lives, families, and friends that they enjoy spending time with. But we aren't afraid of change, so we've worked out staffing and, from now on, we will be holding test events on weekends, close to peak hours. There will be exceptions to this rule, if critical issues come up that require special testing, but we hope that this new schedule makes participating in these events much easier for you all.
The first event in this new series will be held on Saturday, February 20 at 20:00 GMT.
This first test will be a return to our standard "fleet fight and POS siege" format, with some tweaks and slight changes. Look for an official test announcement thread in the General Discussion forums in the coming days. We will post full details and goals for the test there.
Information and transparency
One thing that we feel has been a shortcoming in our mass-testing exercises thus far is the lack of reports and information being relayed to you, the players of EVE. To remedy this, we will begin publishing the results of each test. Over time, we will add to what is being reported and, hopefully, even get a sub-site on eveonline.com to host the results from all these tests so that anyone can view results from recent tests, as well as historical results.
And I'm spent...
Testing for EVE is an ever-evolving process, one which we strive to constantly improve. One important measure of quality, at least for us, is determining how good a gaming experience we have delivered to the players of EVE. Though this can be difficult to quantify, or even be a bitter pill to swallow, we still find your input and feedback to be vital to making EVE better.
To that end, I'd like to invite you all to give us your input about the overall quality and performance of EVE. Tell us what areas of EVE you think would benefit most from performance tweaks, what new things you think we could add to give you better control over your user experience, what changes or additions we have made that helped you, etc.
Mass-testing is an ongoing process and is intimately tied into the community of EVE; let's use this as another way to improve EVE together.