Introducing: The EVE Performance Group | EVE Online

2008-10-16 - CCP Tanis

Greetings from The EVE Performance Group

If you've been reading recent dev blogs, then you'll already know about EVE64 and StacklessIO. You may also already know about the current hardware upgrades in progress for TQ. What you may not know is that we've got a dedicated group of people whose entire purpose is making EVE run better, faster, and smarter. Working under the "Need for Speed" effort that's been ongoing since late 2006, we strive to improve the performance of the EVE cluster and clients as much as we can.

Where to begin?

A big part of what we do involves research and investigation. Quite often, in order to find the best way to fix a problem, you need to fully understand the problem and all the factors that contribute to it. This takes time, patience, and usually a lot of code reading and specialized testing.

We started with data from previous investigations and then built upon it with more monitoring, profiling, and debugging tools to help us identify the points where we could get the greatest performance gain or reduce load the most.

Some of the results of these investigations were surprising. One example is the effect that putting 0.0 nodes and empire systems on separate machines has on the cluster's CPU usage trends, along with some ways we might be able to optimize that to decrease lag. Another is how we can change some of our database procedures to make the database more stable and even more efficient than it already is.

One of our biggest investigations, which is still under way, aims to finally determine the best changes, both in game mechanics and in code, that we can make to relieve the congestion or load of busy systems and constellations such as Motsu and Saila. This basically builds on what we've already figured out and done for Jita, making it even better and then applying it to more areas.

Keep in mind that "fixing lag" is not as easy as a single change, and it is important to us to do it right rather than rush out changes that we aren't sure will work yet. That being said, we are very serious about taking care of these kinds of problems, which is why we formed a group to look just at these types of things for EVE.

All your (data)base are belong to us!

While our operations team is busy getting TQ upgraded and tuned up, we're also working on optimizations for the EVE database systems. On an average day the TQ database performs about 3000 calls per second, or about 250 million transactions per day. Add to those calls all the processing time it takes to update entries, query tables, and return the results to the proper client or service, and it's easy to see that database efficiency can affect the performance of both the server and the clients connected to it.
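As a quick sanity check on those figures, here is a back-of-the-envelope sketch. It assumes the 3000 calls per second average holds steadily over a full 24 hours and treats each call as one transaction; the constant and function names are made up for illustration.

```python
# Back-of-the-envelope check of the daily call volume quoted above.
# Assumption: the ~3000 calls/second average holds for a full 24 hours.
CALLS_PER_SECOND = 3000
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def calls_per_day(calls_per_second: int) -> int:
    """Return the number of database calls in one day at a steady rate."""
    return calls_per_second * SECONDS_PER_DAY


total = calls_per_day(CALLS_PER_SECOND)
print(f"{total:,} calls per day")  # 259,200,000 calls per day
```

That works out to roughly 259 million, which is in line with the "about 250 million" figure.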

What we've done as part of the performance group is to review many of the most resource-intensive or time-consuming queries and procedures to determine what changes could be made to increase performance or decrease wait times. We've looked at everything from the procedures handling reprocessing logs, contracts, and corporation roles to the database structure itself.

One in particular that we've worked on is a procedure handling corporation faction standings. We found that the procedure was taking up far more time than we were comfortable with, and that prompted not just a refactoring of the code but a change in the way that corporation faction standings are calculated, to make it much more efficient. The system used to update standings slowly, over time, based on the standings of all of a corp's members plus a large number of other factors. What we've done is to simplify that greatly. The new system will calculate standings based only on the average of all members who have been a part of the corporation for 7 days or more, without factoring in anything else. Not only is the new way much easier for the server to handle, but it is also much easier to verify that it's working correctly.
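To make the new rule concrete, here is a minimal sketch of "average of all members with 7 or more days in the corp". It is an illustration only, not the actual database procedure: the `Member` class, its field names, and the choice to return `None` when nobody qualifies are all assumptions, and it looks at a single faction for simplicity.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, Optional

MIN_TENURE = timedelta(days=7)  # "7 days or more", as described above


@dataclass
class Member:
    """Hypothetical record for one corporation member."""
    name: str
    joined: datetime         # when the character joined the corporation
    faction_standing: float  # the member's own standing toward the faction


def corp_faction_standing(members: Iterable[Member], now: datetime) -> Optional[float]:
    """Average the standings of members with at least MIN_TENURE in the corp.

    Returns None when no member qualifies; how the real system handles
    that case is not stated, so this is a placeholder choice.
    """
    eligible = [m.faction_standing for m in members if now - m.joined >= MIN_TENURE]
    if not eligible:
        return None
    return sum(eligible) / len(eligible)


now = datetime(2008, 10, 16)
members = [
    Member("veteran_a", now - timedelta(days=30), 4.0),
    Member("veteran_b", now - timedelta(days=7), 2.0),
    Member("new_hire", now - timedelta(days=2), -8.0),  # too new, ignored
]
print(corp_faction_standing(members, now))  # 3.0
```

Because the result depends only on the current member list, it can be recomputed and checked by hand, which is the "easier to verify" property mentioned above.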

Refactoring these procedures was only part of the overall database performance effort. We have also been coming up with other ways to improve the efficiency and stability of our database, for example by increasing fault tolerance or by allowing the procedures and scripts to run faster or execute more at once. We've also been consulting with DB experts at Microsoft to find new and creative ways to make our database itself run smoother and better. All this was done with the sole purpose of improving your EVE experience.

IM IN UR GAME CHANGIN UR CODE

We didn't stop at the database, though. We wanted more, more, more! After looking over the results from our investigations, we found several areas where we could potentially get some really nice performance boosts by adding to or changing some game logic. We're not just talking about server-side stuff; our commitment to improving performance applies just as much to the EVE client.

One such change that we've decided to implement is a new feature we're calling "Weapon Grouping". I won't go into any real detail here, as it will be discussed at length in a following blog. I will tell you, however, that aside from improving performance quite a bit, it's also really friggin cool, which is always a plus. It has a nice side effect of giving players some very cool new options for how they use their weapons.

Along with Weapon Grouping we've also been looking at the inventory system. This is by far the most used service in EVE. Currently we are looking at altering the way that moving items is handled. We want to make it handle things in batches rather than as single items. Right now, if you move items from one place to another, trade items, etc., the system moves these objects one at a time, making a call to the database and then back to the client for each item. It is, of course, more efficient to move those items in batches so that the number of calls between the server and client is reduced. Considering that the inventory system handles all items and objects in the game, it's no shock that it's on our hit list for improvement, as even a small gain in each case will amount to huge savings in server and client resources in the end. These changes are still in the works, though, so no ETA on that yet.
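The difference between per-item and batched moves can be sketched roughly as follows. This is not the EVE inventory code: `FakeDatabase` and both `move_items_*` functions are hypothetical stand-ins that only count round trips, to show why batching cuts the number of calls.

```python
from typing import Dict, List, Tuple


class FakeDatabase:
    """Hypothetical stand-in for the database; it only counts round trips."""

    def __init__(self) -> None:
        self.calls = 0
        self.locations: Dict[int, str] = {}

    def move_one(self, item_id: int, destination: str) -> None:
        self.calls += 1  # one round trip per item
        self.locations[item_id] = destination

    def move_many(self, moves: List[Tuple[int, str]]) -> None:
        self.calls += 1  # one round trip for the whole batch
        for item_id, destination in moves:
            self.locations[item_id] = destination


def move_items_individually(db: FakeDatabase, item_ids: List[int], destination: str) -> None:
    """Per-item behaviour: every item costs its own call."""
    for item_id in item_ids:
        db.move_one(item_id, destination)


def move_items_batched(db: FakeDatabase, item_ids: List[int], destination: str) -> None:
    """Batched behaviour: all items travel in a single call."""
    db.move_many([(item_id, destination) for item_id in item_ids])


items = list(range(100))
per_item_db, batched_db = FakeDatabase(), FakeDatabase()
move_items_individually(per_item_db, items, "hangar")
move_items_batched(batched_db, items, "hangar")
print(per_item_db.calls, batched_db.calls)  # 100 1
```

Both approaches leave the items in the same place; only the number of round trips differs, and the saving grows with the size of the stack being moved.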

Another thing we found to be costing a lot of resources was NPCs, in particular NPCs that are attacking something. What we've done is to change the rate of fire and damage modifier for most NPCs so that they fire less often but do more damage per hit, effectively keeping their DPS (damage per second) the same while significantly reducing the number of calls and updates required for the server and the client. I should mention that not all NPCs have been altered. We are being very careful to ensure that NPCs will not have an alpha strike that is too high, meaning that a properly fitted ship should survive without worries. It is important to us to maintain proper balance while we're making these performance tweaks.
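The trade-off can be illustrated with a small sketch. The numbers and the `NpcWeapon` and `slow_down` names are invented for the example and do not reflect real NPC stats; the point is only that scaling damage per hit and time between shots by the same factor leaves DPS unchanged while cutting the number of shots the server has to process.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NpcWeapon:
    """Hypothetical NPC weapon; values are illustrative, not real game data."""
    damage_per_hit: float
    seconds_per_shot: float  # time between shots

    @property
    def dps(self) -> float:
        return self.damage_per_hit / self.seconds_per_shot

    def shots_in(self, seconds: float) -> int:
        """Number of shots (and hence damage updates) fired in a time window."""
        return int(seconds // self.seconds_per_shot)


def slow_down(weapon: NpcWeapon, factor: float) -> NpcWeapon:
    """Fire `factor` times less often but hit `factor` times harder."""
    return NpcWeapon(weapon.damage_per_hit * factor, weapon.seconds_per_shot * factor)


before = NpcWeapon(damage_per_hit=50.0, seconds_per_shot=2.0)
after = slow_down(before, 2.0)

print(before.dps, after.dps)                    # 25.0 25.0
print(before.shots_in(60), after.shots_in(60))  # 30 15
```

Note that damage per hit doubles in this example, which is exactly why the alpha strike has to be watched when making this kind of change.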

Behind the scenes

Of course, we also turned our attention towards the low-level and back-end systems. Sure, we could probably have done it without having to teach the server hamsters new drinking games, but where would the fun have been in that? Seriously, though, we wanted to tackle the really nasty and huge pieces of code that affect the most systems, so that we can be sure that any optimization we make will show a positive effect for as many players as possible.

It should also be noted that with each of these changes we are taking extra time to do more research and investigate more options. Because each of these systems is so low-level and far-reaching, ensuring we get the best possible results from our changes becomes even more important. As a result, we have broken things down a bit more for these parts of the game code and will be doing our optimizations in smaller chunks, with more testing, to ensure we get the result we're looking for.

First up on this list was the aggression manager. This lovely system handles any aggressive act in game: can theft, pirating, shooting NPCs, etc. Thus far we've ended up doing a bit of trimming and refactoring, which has shown an increase in performance across the board. More importantly, we still have many more changes we will be making in the future to increase performance even more.

Next we went on to look at the system that handles all the attributes and the effects of those attributes (like skills, item attributes, etc.). This system is even more far-reaching than the aggression manager because it comes into play with every ship, character, and usable item in game. Unfortunately, because this is such a low-level system, it is very tricky to make any big changes to it, so we're starting out by doing much more research into it.

The first initiative to come out of our research thus far is a change to how your skills are applied to ships. Essentially, the current system reloads your skills too often and takes too long to do it, so we've refactored that code to make it more streamlined. The change resulted in this system performing 30% faster than on TQ today. It is important to note here that changes like this tend to scale up with the number of people involved (e.g. fleet fights), meaning you will usually see more of a performance boost in situations with large numbers of people than you would by yourself. "Weapon Grouping" is another big change to come out of the research we've done thus far, and we're still doing more investigating as we go.

Because StacklessIO is much more efficient than its predecessor, it allows more information to flow around the cluster. This had a side effect of increasing the CPU and memory usage of EVE. We then set out to determine if this increase was just because the cluster was finally getting more data to crunch at once or if it was using resources poorly. What we found was a little bit of both. Because StacklessIO is so much more efficient, it does indeed allow more information to get to the processors at once, which is actually very good. What we also found, though, were a bunch of nasty bits of code that were not managing system resources well. Fixes for those have already been deployed to TQ for the most part, but more are coming.

Final words

I really want to express how excited I am by the progress we've made, even just in the past couple of months, and there are even more promising changes on the horizon.

EVE is a beastly and challenging mistress to work with. You can, however, rest assured that we are totally committed to improving your overall in-game experience and we will never stop our fight against "lag".