July 18th 2014
Last week, we posted about our process for solving the lag and rubber banding issues currently impacting Arena Commander’s multiplayer mode. As you know, we increased the player base eightfold when we released patch 12.4. Doing so exposed a variety of emergent issues within the game client, game servers, and backend infrastructure that only became apparent under the increased player load. We’ve made a lot of progress in addressing these issues and have seen great improvement on all three fronts, but the patch is not ready for release today. This kind of testing is exactly the reason why we chose to pursue the open development model. Discovering these kinds of issues this early in the process is going to greatly improve the player experience in the long run and increase our efficiency by getting it right from the beginning.
Patch 12.5 will add the jukebox, pictured here, to the Hangars of current subscribers so they can liven up their Hangars with their favorite music. Your jukeboxes have been attributed to your accounts, but will not appear in your Hangars until the patch goes live. Additionally, on this week’s Around the Verse, we talked about a mysterious new Vanduul threat destroying UEE ships. This threat will be added with 12.5, at which point Citizens are encouraged to go hunting for it!
While there will not be a patch today, we would like to share what we’ve been working on so far and how we’ve been approaching the problem. The QA Team has been hard at work investigating the lag that is being experienced on the public servers. We know that this has been a huge frustration, and QA has thrown its full weight against isolating and reproducing this problem so that engineers can solve these issues.
In Arena Commander multiplayer, your client receives updates from remote clients via the server and, in the case of movement, your local client’s IFCS actually simulates the physics of each remote client that you see based on these updates. Your client then reports back to the server all the positional and orientation data for each of the remote clients that you are simulating. The server authoritatively checks this against its own calculations and those being reported by the remote clients themselves. If there is divergence in the reported numbers, the server tells your client how far off its remote client simulations are, and that correction is injected into the IFCS physics calculations to nudge the remote clients you see back to their proper positions as they fly. If the divergence becomes too large for IFCS to gently nudge the remote clients, they are warped to their proper positions and the simulation resumes from there. Our investigation has revolved around discovering why, for some players some of the time, the simulations of remote clients diverge so far from the server and from the remote clients themselves that the server forces a warp.
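To make the nudge-versus-warp behaviour described above concrete, here is a minimal sketch. It is not the actual IFCS code: the function name `reconcile`, the threshold, and the gain are all hypothetical, and it corrects a position directly rather than feeding the error into a physics simulation as the real system does.

```python
import math

def reconcile(simulated, authoritative, warp_threshold=50.0, nudge_gain=0.1):
    """Correct a locally simulated position toward the server's position.

    Returns (new_position, warped). All parameter values are illustrative
    assumptions, not numbers taken from the game.
    """
    # Per-axis error between the server's position and our simulation.
    error = tuple(a - s for s, a in zip(simulated, authoritative))
    distance = math.sqrt(sum(e * e for e in error))

    if distance > warp_threshold:
        # Too far off to correct smoothly: snap to the server's position.
        return tuple(float(a) for a in authoritative), True

    # Small divergence: move a fraction of the way back each update.
    nudged = tuple(s + nudge_gain * e for s, e in zip(simulated, error))
    return nudged, False
```

With these assumed values, a ship that is 10 units off is moved 1 unit closer per update, while a ship that is 100 units off is warped. The rubber banding players see corresponds to the second branch being taken.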
We first investigated potential issues with the game servers by performing some optimizations to load balancing and by reducing the number of servers on a physical machine from 8 to 2, which dramatically decreased network traffic from each machine and reduced CPU pegging but did not reduce the lag players were seeing. It did, however, increase stability and minimize server CPU frame spikes, which will generally improve performance for players across the board.
We then approached the situation from the client side, creating a controlled environment in which no one shot any weapons or used any boost, and then began increasing the number of players over time. Noticing that this yielded no lag, we then started to introduce more variables into the match.
Shooting weapons without hitting any players caused no issue, but once ships began to fire on each other, lag began to crop up with players skipping around. We were able to reproduce that lag, but we needed a more specific cause, so we attempted to narrow it down.
When we spoke with the engineering team, they speculated that the power drain caused by firing weapons and absorbing shield impacts could in turn be reducing the available power to thrusters. It was possible that, due to lag or CPU spikes, the amount of power available for thrusters wasn’t being properly propagated from the remote client to the server to your machine for your physics simulation to account for. This would throw thruster synchronization across players off, causing positional jumping during erratic maneuvers. Engineering then armed QA with a version of the game that allowed us to have greater control over the thrusters so that we could pinpoint the problem.
Once we disabled all energy fluctuations to the thrusters and enabled maximum power to each, we were still able to get the problem to occur, although less frequently. This testing exposed an issue in the way that the shields consume power in giant spikes, and it resulted in some balance improvements and code hardening.
Our networking team investigated packet size and bandwidth issues. Their efforts drastically reduced packet size, and with it network traffic, and seem to have improved the player experience; however, they did not entirely resolve the issue.
Concurrently, we introduced some additional changes to the way that servers log and store information. We first discovered that certain non-critical errors were spamming the server logs and causing performance drops, which can lead to simulation discrepancies. We’ve adjusted the logging system’s settings to eliminate the spam, and we are also going to move the logging function off the main thread entirely so that its impact is lessened. Again, this seems to have improved the issues but not eliminated them entirely. It has had the knock-on effect of improving server stability and performance.
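For readers curious what "eliminating spam and moving logging off the main thread" can look like in practice, here is a rough sketch using Python's standard `logging` module. It is only an illustration of the general idea: the game servers are not written this way, and the names `RepeatLimitFilter` and `build_async_logger` are made up for this example.

```python
import logging
import logging.handlers
import queue

class RepeatLimitFilter(logging.Filter):
    """Drop a message once it has already been logged max_repeats times."""

    def __init__(self, max_repeats=3):
        super().__init__()
        self.max_repeats = max_repeats
        self.counts = {}

    def filter(self, record):
        key = (record.levelno, record.getMessage())
        self.counts[key] = self.counts.get(key, 0) + 1
        return self.counts[key] <= self.max_repeats

def build_async_logger(name, sink_handler, max_repeats=3):
    """Return (logger, listener); sink_handler runs on a background thread."""
    log_queue = queue.Queue()

    # The calling thread only filters and enqueues records (cheap).
    queue_handler = logging.handlers.QueueHandler(log_queue)
    queue_handler.addFilter(RepeatLimitFilter(max_repeats))

    # The listener thread does the slow part (file or network output).
    listener = logging.handlers.QueueListener(log_queue, sink_handler)

    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    logger.addHandler(queue_handler)
    listener.start()
    return logger, listener
```

A caller would pass in an ordinary handler such as `logging.FileHandler("server.log")` and call `listener.stop()` at shutdown to flush the queue. The design choice is that the simulation thread never waits on disk, and a repeating non-critical error costs at most a dictionary lookup after its first few occurrences.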
We are generating a build tonight that introduces changes to the way that we synchronize physics calculations. These changes keep all clients and the server in closer sync on physics time steps. They should greatly reduce the incidence of divergence between client, server, and remote client physics simulations, and give us better tools to catch and correct for it without warping.
Now, as I am writing this, we have just uncovered a new potential issue, related to improperly handled client disconnections, that could contribute to poor multiplayer synchronization. We’ll have to investigate this over the weekend.
The great news is that even when we’ve pursued an avenue that hasn’t directly fixed the rubber banding, we’ve still ended up improving the game. The work that’s going into 12.5 isn’t just going to fix this specific issue; it’s going to enhance the Arena Commander experience all around. We’ve also worked directly on some other fixes for 12.5 that should bring more stability to the multiplayer experience and correct issues backers have been seeing. They are as follows:
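We have not described the exact mechanism here, so the following is only a sketch of one common way to keep physics time steps consistent between machines: a fixed-step accumulator, where every machine advances physics in identical increments regardless of its frame rate. The function name `advance` and the default values are assumptions for illustration, not details of the actual build.

```python
def advance(accumulator, frame_time, step=1 / 60, max_steps=8):
    """Consume frame_time in fixed physics steps.

    Returns (steps_to_run, leftover_time). The step size and the cap are
    hypothetical values chosen for this example.
    """
    accumulator += frame_time
    steps = 0
    # Run whole steps only; leftover time carries over to the next frame.
    # max_steps caps the work done after a long stall (e.g. a CPU spike).
    while accumulator >= step and steps < max_steps:
        accumulator -= step
        steps += 1
    return steps, accumulator
```

Because every participant steps the simulation by the same `step`, a slow frame on one machine produces several catch-up steps rather than one large, differently sized step, which is one source of divergence between simulations.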
Joining a Game: Currently when joining a server, a large spike of data is transmitted, which can cause lag and some teleporting. Our network engineers have been working on ways to compress this data, which will reduce the size of these connection packets by 40%. You can see the current bandwidth usage in the included data graph.
Attempts to join full servers: Servers currently have a bit of a delay in marking that the max player count has been hit. This means that a server can be almost full, and any number of players can try to join, with only one of them getting in and the rest getting kicked back to the hangar. As players leave and the server becomes almost full again, this issue can recur throughout the match and, coupled with the high bandwidth cost of a player connection, cause some serious lag. Our server engineers are working on a fix for this right now.
Kicked back to hangar: Our as-yet-unimplemented VOIP system was connecting to all players, but not disconnecting when a player dropped unexpectedly, causing the majority of the kick-back-to-hangar issues on the public servers. A fix for this has been created.
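The full-server problem above is a classic race between checking the player count and admitting a player. As a hedged illustration only (the real fix is still being worked on and may look nothing like this), the sketch below reserves a slot atomically before any connection data is sent, and frees it when a player leaves or drops. The class `ServerSlots` and its methods are hypothetical names.

```python
import threading

class ServerSlots:
    """Track player slots so the check and the reservation happen together."""

    def __init__(self, max_players):
        self.max_players = max_players
        self._players = set()
        self._lock = threading.Lock()

    def try_reserve(self, player_id):
        """Reserve a slot; return False immediately if the server is full."""
        with self._lock:
            if player_id in self._players:
                return True
            if len(self._players) >= self.max_players:
                return False
            self._players.add(player_id)
            return True

    def release(self, player_id):
        """Free a slot on a clean leave or an unexpected disconnect."""
        with self._lock:
            self._players.discard(player_id)

    def count(self):
        with self._lock:
            return len(self._players)
```

The point of the design is that a rejected player is turned away before the expensive join data is transmitted, so a burst of simultaneous join attempts on a nearly full server costs almost no bandwidth.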
Thank you for engaging in the development and testing process with us. Your efforts exposing and cataloging these types of issues have been immensely helpful, and we wouldn’t be able to find them all without your participation! We are fixing these multiplayer issues as quickly as possible and look forward to further expanding the testing of Arena Commander to all Citizens as soon as we can. As you can see, there has been an awful lot of work going into improving the multiplayer across the board, from the client to the server to the backend infrastructure, and we feel that the 12.5 patch will greatly improve the player experience once it is ready. We’ll let you know how we’re doing next week, and will issue patch 12.5 as soon as we’re confident that it offers a broad improvement to the Arena Commander multiplayer experience.