My node was equipped with the following...
I thought I would add to the dev blogs we have had over the last couple of weeks and talk about what has been going on with the Tranquility cluster itself: how it relates to the StacklessIO and 64-bit EVE enhancements, and where we are heading in the future.
1 x EVE Server Cluster
The EVE Cluster is divided into three distinct layers: the proxy blades, the SOL blades and the database cluster. Some of the terminology that gets thrown around from time to time (including later in this blog) can be explained quite simply here.
- Proxy Blades - These are the public-facing segment of the EVE Cluster. They are responsible for accepting player connections and establishing player communication with the rest of the cluster.
- SOL Blades - These are the workhorses of Tranquility and are the primary focus of our ongoing work. The cluster is spread across 90 to 100 SOL blades, each running two nodes.
- Node - a single EVE server process. This is the lowest level of granularity within the cluster.
- Dedicated SOL blade - These are SOL blades that we dedicate to one system only. Systems such as Jita, Motsu and Saila reside on these. They run two nodes like any other SOL blade; however, the second node is idle and does not load any solar systems.
- Database Cluster - This is the persistence layer of EVE Online. The running nodes interact heavily with the Database, and of course pretty much everything to do with the game lives here. Thanks to our RamSans, our database is able to keep up with the enormous I/O load that Tranquility generates.
- At peak hours, our database is processing over 2,000 transactions per second, which generates around 38,000 IOPS (input/output operations per second).
- To keep up with this load, we currently have two RamSans.
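The terminology above can be pinned down with a small Python model. This is a hypothetical sketch, not CCP's code: the class names, blade names and the rule that a shared blade puts a new system on its less-loaded node are all assumptions. It does check two figures from the list: 90 to 100 blades at two nodes each means roughly 180 to 200 nodes, and 38,000 IOPS over 2,000 transactions per second averages out to 19 I/O operations per transaction.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One EVE server process (hypothetical model, not CCP's code)."""
    name: str
    solar_systems: list[str] = field(default_factory=list)


@dataclass
class SolBlade:
    """A SOL blade that runs two nodes."""
    name: str
    dedicated: bool = False
    nodes: list[Node] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Every SOL blade, dedicated or not, runs two nodes.
        if not self.nodes:
            self.nodes = [Node(f"{self.name}-n1"), Node(f"{self.name}-n2")]

    def assign(self, system: str) -> Node:
        if self.dedicated:
            # A dedicated blade hosts exactly one system on its first node;
            # the second node stays idle and loads nothing.
            if any(node.solar_systems for node in self.nodes):
                raise ValueError(f"{self.name} is dedicated and already hosts a system")
            target = self.nodes[0]
        else:
            # Assumption: a shared blade puts a system on its less-loaded node.
            target = min(self.nodes, key=lambda node: len(node.solar_systems))
        target.solar_systems.append(system)
        return target


def total_nodes(blade_count: int, nodes_per_blade: int = 2) -> int:
    return blade_count * nodes_per_blade


def io_ops_per_transaction(iops: float, tps: float) -> float:
    """Average database I/O operations generated by one transaction."""
    return iops / tps


jita = SolBlade("sol-jita", dedicated=True)
jita.assign("Jita")
print(jita.nodes[1].solar_systems)            # [] (the idle second node)
print(total_nodes(90), total_nodes(100))      # 180 200
print(io_ops_per_transaction(38_000, 2_000))  # 19.0
```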
1 x SOL Blade
The EVE Server application itself (also known as a node) is primarily a CPU-intensive process. Due to the nature of the Stackless Python programming methodology chosen for EVE Online, the Python component of each node is a single thread, which means it can only ever utilize one CPU core at a time.
Our SOLs are IBM blades, and until quite recently almost all of them were running dual-core AMD Opteron 2.8 GHz processors with 4 GB of DDR1 RAM. Over the last 6 months or so, we have been investigating options for replacing these Opterons with something more powerful. We selected some dual-socket, dual-core Intel Xeon 3.0 GHz Woodcrest blades for testing purposes and have been using them as an integral part of our StacklessIO testing (as blogged about here by CCP Explorer). Now that StacklessIO has been released, we are able to use these blades to their full potential, and as a first step we looked at ways to put these test blades to use on Tranquility.
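Why a single-threaded node can only keep one core busy is easiest to see with cooperative scheduling. Stackless Python has its own tasklets and scheduler; to keep this sketch runnable on stock CPython, it swaps in plain generators as a stand-in, and the tasklet names are hypothetical. Many pieces of work are interleaved, but every step runs on the same OS thread.

```python
import threading
from collections import deque


def tasklet(name, steps, log):
    """Hypothetical unit of game work that yields control after each step."""
    for step in range(steps):
        # Record which OS thread executed this step.
        log.append((name, step, threading.get_ident()))
        yield  # cooperatively hand control back to the scheduler


def run_round_robin(tasklets):
    """Run generator-based tasklets round-robin until all of them finish."""
    queue = deque(tasklets)
    while queue:
        current = queue.popleft()
        try:
            next(current)
        except StopIteration:
            continue  # this tasklet is done; drop it from the queue
        queue.append(current)


log = []
run_round_robin([tasklet("combat", 3, log), tasklet("market", 2, log)])
print([name for name, _, _ in log])
# ['combat', 'market', 'combat', 'market', 'combat']
print(len({thread_id for _, _, thread_id in log}))  # 1
```

Because every step shares one thread, adding cores to a blade does not speed up an individual node; a faster core does, which is why per-core clock speed matters so much in the hardware discussion that follows.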
1 x Rapid Deployment
When we hit 1400 players in Jita and then had the unfortunate incident where the SOL blade powering Jita ran out of memory, we looked to our Intel test blades for help. We shuffled some RAM around and were able to get 5 new Intel SOL blades ready for use, each with 16 GB of DDR2 RAM. We did a staggered test deployment of these to Tranquility last week. On Friday, confident of their stability and anticipating a performance increase, we set them up as dedicated SOL blades. That evening, Jita, Saila and Motsu were performing better than ever, and there was much rejoicing. Over the following weekend, the GMs did not receive a single "Stuck Character" petition from Jita!
3 x Epic Fleet Fights
That Saturday, out of the blue, we saw one of the nodes supporting 0.0 go to Critical status, and shortly afterwards it shut down. This happened a few more times in quick succession, and it became apparent that there was a new issue where extremely loaded nodes were simply not able to keep up with their heartbeat. This issue in itself is fixable, and we are working hard to get it resolved.
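The blog does not describe how Tranquility's heartbeat works internally, so the following is only a generic heartbeat-monitor sketch; the class, the node names and the 10-second timeout are all hypothetical. It illustrates how a node can be declared dead without crashing: if its single thread is tied up in one long stretch of work, it simply cannot report in before the deadline.

```python
from dataclasses import dataclass, field


@dataclass
class HeartbeatMonitor:
    """Hypothetical monitor: a node that is silent for longer than
    `timeout` seconds is treated as dead."""
    timeout: float
    last_seen: dict[str, float] = field(default_factory=dict)

    def beat(self, node: str, now: float) -> None:
        # Called whenever a node manages to report in.
        self.last_seen[node] = now

    def missed(self, now: float) -> list[str]:
        # Nodes whose last heartbeat is older than the timeout.
        return sorted(
            node for node, seen in self.last_seen.items() if now - seen > self.timeout
        )


monitor = HeartbeatMonitor(timeout=10.0)
monitor.beat("quiet-node", 0.0)
monitor.beat("fleet-fight-node", 0.0)
# The quiet node finishes its work quickly and reports again...
monitor.beat("quiet-node", 11.0)
# ...while the fleet-fight node's single thread is still busy with one long
# stretch of combat processing, so it cannot send its next heartbeat in time.
print(monitor.missed(12.0))  # ['fleet-fight-node']
```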
At this point, it was clear that with 700+ players trying to "pew pew", the AMD node they were on was not going to do anything other than keep crashing. We re-mapped the system in question to one of our dedicated Intel blades, just to see what it was capable of; Jita had performed so well the night before that we thought these nodes would handle a fleet fight quite nicely. The system held, and the rest, as they say, is history.
On Sunday night, the M-OEE8 system was the hotspot, and it had been placed on an Intel 64-bit dedicated SOL blade in anticipation. It held fine with a peak of around 450 players.
On Monday night, over 1000 players tried to start a fight in this system. As on Sunday, we had anticipated fighting there, so the system had been placed on a dedicated node. Unfortunately, the issue that had caused node crashes at 700 players on our AMD blades caused our Intel blade to miss its heartbeat once it went a little over 1200 players. Interestingly enough, despite the missed heartbeat, many players have reported that the performance of this blade with 1000 players was very good in the 10 to 15 minutes prior to its shutdown.
I would like to stress that we at CCP are very excited by this, and we are very hopeful that once the issue causing these node deaths is solved, we will start to see this impressive performance much more often. A lot of people have put a lot of hard work into these new technologies, and it is starting to pay off for you, the players.
So where do we go from here?
We are by no means finished with these upgrades, and there is still a lot of work to be done. Over the last two weeks, we have proved the readiness of some of our new technology, and we now need to work out the best way to ensure everyone can benefit...
Newer, Faster Blades
Our 3.0 GHz Intel Woodcrest blades are nice, but that processor architecture has been succeeded by Wolfdale, which is even more powerful. So we have put in a fast-track order for some Intel Xeon 3.3 GHz Wolfdale blades. We expect to have these in the cluster very soon, and we anticipate that they will give us an even bigger performance boost than we have seen so far, paving the way for a new Tranquility Cluster. It is worth noting that the hardware we are beginning to purchase now is the hardware that will see us all the way into the HPC era. There will be a detailed presentation about the status of this project at Fanfest in November.
Help us to help you
It's nice that we have this new hardware, but there is going to be an interim period while we work to upgrade the existing hardware to this HPC-ready specification. During this period, we will proactively place fleet fight systems onto dedicated nodes at downtime. We often can't predict where our players are planning to unleash hell, so we need to know which systems are going to see fleet fights! We are working on a way for players to contact the Virtual World Operations team directly with this information. In the meantime, corporation directors are invited to petition any planned operations (use the Stuck Character category at least 24 hours in advance, and please include estimated attack / defense numbers), and we will take note of these when we assign systems to dedicated SOL blades during downtime.
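The blog does not say how systems are actually chosen for dedicated blades at downtime. As a purely hypothetical sketch of how petition data like this could feed that decision, the code below ranks petitioned systems by estimated attackers plus defenders and hands the busiest ones to whichever dedicated blades are free. The blade names, "SYSTEM-X"/"SYSTEM-Y" and all petition numbers are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FleetPetition:
    """One petitioned operation (fields mirror what the blog asks for)."""
    system: str
    attackers: int  # estimated attack numbers
    defenders: int  # estimated defense numbers

    @property
    def expected_pilots(self) -> int:
        return self.attackers + self.defenders


def plan_downtime_remap(petitions, free_dedicated_blades):
    """Map the busiest petitioned systems onto free dedicated blades.

    Returns (mapping, overflow): mapping is blade -> system, and overflow
    lists systems that stay on shared nodes because no blade was left.
    """
    # Both sides may petition the same system; keep the largest estimate
    # rather than double-counting the same pilots.
    demand = {}
    for p in petitions:
        demand[p.system] = max(demand.get(p.system, 0), p.expected_pilots)
    ranked = sorted(demand, key=demand.get, reverse=True)
    mapping = dict(zip(free_dedicated_blades, ranked))
    overflow = ranked[len(free_dedicated_blades):]
    return mapping, overflow


petitions = [
    FleetPetition("M-OEE8", attackers=600, defenders=500),
    FleetPetition("SYSTEM-X", attackers=200, defenders=150),
    FleetPetition("M-OEE8", attackers=500, defenders=400),
    FleetPetition("SYSTEM-Y", attackers=300, defenders=250),
]
mapping, overflow = plan_downtime_remap(petitions, ["intel-01", "intel-02"])
print(mapping)   # {'intel-01': 'M-OEE8', 'intel-02': 'SYSTEM-Y'}
print(overflow)  # ['SYSTEM-X']
```

Taking the maximum of duplicate petitions rather than their sum is a judgment call: attacker and defender corporations petitioning the same fight will usually be estimating the same pilots.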
With that, I will leave you with my final - subtle - personal thoughts on the matter.
This is EVE Online!
Impossible is nothing!