ME SQUASH BUG: Improving the EVE experience | EVE Online

2011-10-17 - CCP Redundancy

There were many things that became apparent to us while we were going through the "experience" of releasing the first Captain's Quarters that we really needed to learn from: one of them was that we have not kept up with the advancing state of some key technologies that allow companies to deliver better products.

We released a very large amount of new C++ code with the expansion, and we were all expecting that there were going to be a number of issues that we’d missed. As I hid underneath my desk, occasionally checking on the forums and the bug reporting website for crash dumps for the inevitable issues, I realized that we weren’t getting them because, quite simply, the process of asking you to turn on an option to gather them, find them in an obscure directory and send them to us is pretty unreasonable.

Firefox doesn’t do that, Chrome doesn’t do that and Steam doesn’t do that.

What those products have in common, though, is that they use an automatic uploading system for crashes, and they have a back-end that produces all sorts of pretty graphs telling the engineers that if they fix a given silly bug and patch it out in the next two days, a certain number of people a day will no longer be unceremoniously dumped to their desktop in the middle of doing something important or fun.

So over the summer holidays, we started working on a system that does just that. After summer ended, we deployed the first versions rather tentatively inside CCP, and we’ve been progressively expanding the coverage and infrastructure ever since. We’ve reached a point where we’ve actually improved the stability of our tools and fixed a large number of internal crashes that even developers never bothered to report, but there are still up to an estimated 1.5k crashes per day happening on Tranquility.

Therefore, we’ve bumped deployment of this new reporting system up in the schedule, and are releasing it on Tuesday… and then we’re going to HTFU and try and fix as many of the issues that are reported as soon as possible.

It’s what we do…

With this new system we have already identified the top two crashes that have been happening on Singularity, and we’ve fixed them in this patch too. We’re going through the others in order of how frequently they are causing crashes for the players on SiSi.
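To give a feel for what "in order of how frequently" means in practice, here is a minimal sketch in Python. It is not our actual back-end: the signature strings and the `rank_crashes` helper are made up for illustration, and it assumes each uploaded report has already been reduced to a single signature string.

```python
from collections import Counter


def rank_crashes(signatures):
    """Return (signature, count) pairs, most frequent first."""
    return Counter(signatures).most_common()


# Hypothetical sample: one signature per uploaded crash report.
reports = [
    "render_thread_null",
    "audio_init",
    "render_thread_null",
    "ui_overlay",
    "render_thread_null",
    "audio_init",
]

ranked = rank_crashes(reports)
for signature, count in ranked:
    print(f"{count:3d}  {signature}")
```

Working down a list like this from the top is what lets a small number of fixes remove a large share of the crashes.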

What about my privacy?

The mini-dump files that we upload only contain information about the EVE Online process relevant to the crash. We generate them using the same system that Windows does, and they mainly contain a few of the values of variables that are referenced by the stack of functions that eventually caused the crash (although often not). Personal information is not collected in these reports, and they don’t look at anything else you have installed or are running on your computer (unless you have external binaries that have injected themselves into the running EVE process).

We’re trying to be transparent and up-front that we’re starting to do this, but you’re probably already using programs that do this without realizing it, like for instance Steam, Chrome, and Firefox.

So I won’t crash anymore?

We can’t promise this, because there are frequently things that we don’t see internally or in testing, or that come from code that we don’t write (things like your graphics driver, or programs that overlay voice chat UI and try to sneak into our code).

What I think we can promise is that we’ve significantly improved our ability to identify crashes, and make sure that we catch more of them during testing, both internally and on Singularity.

Not enough pictures.

This graph shows the number of crashes per day from our main code line at CCP, which most of the developers (400 people) are generally working on. We occasionally get a new crash that hits a load of people (or an automated system), but we also tend to get them cleared up much more quickly now:

This was the history of the crash that caused one of those spikes.

We can see when the first recorded occurrence of a particular crash is, which makes it much easier to try and work back and figure out what change caused it.
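The idea of working back from a first occurrence can be sketched roughly like this. Again, this is an illustration rather than our real tooling: the dates, signatures, and the `first_seen` and `last_change_before` helpers are all hypothetical, and it assumes we have a sorted list of the days on which changes went in.

```python
from bisect import bisect_right
from datetime import date


def first_seen(reports):
    """Map each crash signature to the earliest day it was reported."""
    earliest = {}
    for signature, day in reports:
        if signature not in earliest or day < earliest[signature]:
            earliest[signature] = day
    return earliest


def last_change_before(change_days, day):
    """Return the latest change on or before `day`, or None if there is none.

    `change_days` must be sorted in ascending order.
    """
    index = bisect_right(change_days, day)
    return change_days[index - 1] if index else None


# Hypothetical sample data: (signature, day reported) pairs and change days.
reports = [
    ("render_thread_null", date(2011, 9, 14)),
    ("render_thread_null", date(2011, 9, 12)),
    ("audio_init", date(2011, 9, 20)),
]
changes = [date(2011, 9, 1), date(2011, 9, 10), date(2011, 9, 18)]

seen = first_seen(reports)
suspect = last_change_before(changes, seen["render_thread_null"])
print(seen["render_thread_null"], suspect)
```

The change found this way is only a starting point for the investigation, since a crash can sit unnoticed for a while before anyone first hits it.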

In Summary

We don’t like releasing bad code to you or leaving it broken for a long time, and we hope this is a big step forward in fixing that. I hope that in the next few weeks we can put a measurable dent in the number of crashes that happen in EVE.
