Tama Dev Event - The Tech Side
Greetings Tranquility Residents,
A few weeks ago we ran a Live Event in Tama where us devs got out of our Polaris frigates and picked up arms against all comers. It was terrific fun for us, and from the feedback received, it looks to have been a hit with you folks as well. Today I want to talk about some of the more technical aspects of that event and what it shows about our progress against the “Lag” problem, as well as pointing at the road to the future.
So first up, let’s look at how well the server performed under the load of us and several hundred of our ~~ammunition exchange buddies~~ valued customers:
Now this is the kind of CPU graph I like to see for a large fight. The hardware got a good workout, but we spent very little time overloaded – pretty much just when the last big fleet jumped in for about 10 seconds. To have 800-some folks duking it out with the server keeping up perfectly fine makes me a very happy ‘frogrammer; it wasn’t very long ago that this would have been unthinkable. Doubly so when you consider that this fight took place in low security space, where the server is extra busy tracking all the aggressions going on in addition to all of the actual fighting.
Two major things converged to bring you this result: a fair bit of effort to optimize the server code and some crazy good hardware we picked up. Tama was running on just such a piece of hardware - our super-special reinforcement node. It has been in use for a few months as the server of choice for fights in which we receive warning via the Fleet Fight Notification Tool. CCP Red Button is here to fill you in on the details of that beast of a rig:
As CCP Veritas explained above, we’ve been focusing a lot on fleet fight performance over the last 12-18 months. Those of you that attended Fanfest this year probably saw CCP Yokai and CCP Atlas spill the beans on some of the measures we have been taking towards that end. On the hardware side we have been somewhat hampered by the fact that the EVE server is largely a single-threaded application and thus limited performance-wise by the clockrate of the CPU. In the past we’ve been lucky in that we have been able to rely on clockrates increasing year over year. Sadly that party is mostly over for now and the emphasis is on horizontal scaling with multiple cores. Bad news for fleet fights and ol’ single-threaded EVE.
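To put a rough number on why extra cores do so little for a mostly single-threaded server, here is a minimal sketch using Amdahl's law. The parallel fractions are made-up illustrative values, not measurements of the EVE server, and the function name is just something chosen for this example.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Ideal speedup predicted by Amdahl's law.

    parallel_fraction: share of the work that can be spread across cores (0..1).
    cores: number of cores available.
    """
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# Hypothetical parallel fractions, for illustration only.
for p in (0.0, 0.1, 0.5):
    for cores in (1, 4, 8):
        print(f"parallel={p:.0%} cores={cores}: {amdahl_speedup(p, cores):.2f}x")
```

With a parallel fraction near zero the speedup stays pinned at 1x no matter how many cores are added, which is why clockrate and per-clock efficiency are the levers that matter here.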
Fortunately for us though there have been improvements to motherboard design, memory latency and the CPU architecture lately that have squeezed an extra bit of ooomph out of the EVE server. Dramatically so in the last couple of years in fact. We have for example seen up to 40 percent performance gains under specific EVE server workloads in side by side comparisons of the latest generation of hardware (Westmere-EP@3.2GHz) versus the older generation (Wolfdale-DP@3.3GHz) that most of TQ uses today. Btw, TQ’s proxy layer has recently undergone an overhaul and is now running solely on the latest generation dual proc, 4 core 3.6GHz Westmere-EP blades. When coupled with the server side software optimizations it’s a tremendous change that paves the way for a significant increase in scale. The SOL layer is also due for an upgrade soon and.... now I’m off on a tangent as this is material for another dev blog :) We’re here to talk about fleet fights!
In order to help accommodate ever larger fleet fights we have, for the reasons stated above, focused squarely on the clockrate. For that purpose we set off on a mission to build the crazy liquid cooled hybrid Frankenstein of a server, expected to run at between 4.6 and 4.8 GHz, which CCP Yokai demoed onstage at the last Fanfest.
Could it be cooled by beer? Or Quafe?? Could it?!
Sadly (or fortunately perhaps) that particular piece of technological monstrosity never saw any real live action outside of our lab, because once it was just about fully built we were presented with a proper purpose-built server from one of our primary vendors. It was much more datacenter friendly (read: didn’t leak as much fluid), able to achieve similar clockrates, and able to perform reliably using conventional air cooling (although I’d actually hardly call it conventional). I’ll say though that a benefit gained from creating the liquid cooled monster is that the team derived great pleasure from building it, and I’d often arrive at work in the early hours of the morning and find one of them lovingly stroking it in the back room.
To make a long story short we emptied our piggy bank and got one of the air-cooled technological marvels for our datacenter. It came with some pretty stringent requirements for air cooling (inlet temperatures of 10 degrees Celsius etc.), so we had to jump through a couple of hoops to meet those by rearranging airflow and improving cooling in the datacenter, but at long last it’s up and running and has been serving fleet fights for the past few months with great results.
Sadly I’m not allowed to go into specifics on make, model or performance characteristics beyond saying that it runs @4.4GHz, is Xeon based and is pretty freaking awesome. This server is not officially available and neither is the processor except to OEMs for specific extreme applications. It’s got performance characteristics to die for... which incidentally is what we use it for... so people can die on it... a lot...
So, combining the added efficiency of the latest hardware architecture improvements and the impressive tickrate of the supernode, we have observed up to 80% performance gains over conventional TQ hardware, which hopefully translates directly into your playing pleasure.
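As a back-of-envelope sanity check on that figure, the sketch below multiplies the up-to-40% architectural gain quoted earlier (Westmere-EP at 3.2GHz versus the older generation) by the clock ratio of the 4.4GHz supernode. Two loud assumptions that are not from the measurements: that the supernode performs per-clock like the Westmere-EP blades, and that performance scales linearly with clockrate, which real workloads rarely manage. Treat the result as a rough upper bound, not a benchmark.

```python
# Figures quoted in this post
ARCH_GAIN = 1.40            # up to 40% gain at roughly equal clocks
BASE_CLOCK_GHZ = 3.2        # Westmere-EP clock in the side by side comparison
SUPERNODE_CLOCK_GHZ = 4.4   # supernode clock


def estimated_speedup(arch_gain: float, base_clock: float, new_clock: float) -> float:
    """Naive estimate: architectural gain times clock ratio.

    ASSUMPTION: linear scaling with clockrate and identical per-clock
    performance; both are simplifications for illustration only.
    """
    return arch_gain * (new_clock / base_clock)


upper_bound = estimated_speedup(ARCH_GAIN, BASE_CLOCK_GHZ, SUPERNODE_CLOCK_GHZ)
print(f"Naive upper bound: {upper_bound:.2f}x over conventional TQ hardware")
```

The naive estimate lands a little above 1.9x, so an observed gain of up to 80% sits plausibly below that ceiling.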
So having said that I’m passing the ball back to CCP Veritas:
The server isn’t the whole story of course; there’s more to Eve than just the servers running in cozy racks with comfy air conditioning and ample supplies of current. We’d have nothing without that little bit of software everyone runs on their own well-cared-for machine – the Eve client. And as probably everyone at that fight can tell you, the client could use a helping hand. From my point of view as a Logistics pilot the game was quite playable, but once we had the full 800-some folks fighting together, it was not a smooth experience. Functional, but not fluid. We’ve analyzed some of the major causes of this performance decrease on the client and have begun picking off the easy targets in the Crucible expansion.
Those who followed Team Gridlock through its lifetime will remember that we were focused exclusively on the server. The above result is pleasing for us – we’ve gotten to the point where for goodly large piles of players, the server is no longer the first bottleneck for smooth gameplay. There are still some server-side projects left on the table with good value which need to be properly prioritized, but we can legitimately say today that putting more focus on making the client more performant would be a good idea. Expect to see some more pushes towards a smooth and responsive client in the next year.
Just to show you how much difference a year of lag fighting can make for the server side, I went back into the archives and found a couple similarly-sized fleet fights from early November, 2010:
**R-6KYM on 5 November, 2010:**
Now this is the kind of CPU graph I don’t like to see. For a solid hour and a half, the poor thing is overloaded and there’s just no hope in sight until the fight is done. According to player reports from that time, logistics had great trouble getting information about fleet mates in need of armor and locking targets, causing the repair chain to be very weak.
That’s exactly the kind of game mechanics breaking performance degradation that we set out to fix. Of course, in the Tama event, this was not an issue and our repair chain was strong, like bull. Until, you know, we died in a fire.
Also you’ll notice the graph terminates prematurely. This is because there were lingering problems bad enough to cause us to take the node out back and do it up Ol’ Yeller style. He didn’t even see it coming.
**CYB-BZ on 7 November, 2010:**
Oh man, this is just not cool. Four solid hours of the poor hamsters going full speed until they died from a heart attack. Someone should really call an animal abuse hotline. There’s just no way this was a clean experience, and indeed, the player reports from the time confirmed it, stating that the DRF fleet was separated and unable to rewarp to optimal due to the lag monster holding it in place.
So, while no two fleet fights are ever the same, I do hope this comparison demonstrates what magnitude of benefit our investments in bleeding edge processing hardware and software optimizations have brought to Eve. Having a single-sharded large scale universe is something we’re extremely proud of, and massive conflicts are a natural consequence of that. I hope you’ve all enjoyed the past year of performance improvements as much as we’ve enjoyed delivering them.