
    Tech Updates #2


    Rhododendron

    The past few days have been a nightmare for the website server. I will explain exactly what happened and what changes were made.

     

    First, our MySQL server is a monster. Over 20 game servers are constantly connected to it, reading and writing. Since I set up the server around 3 years ago, it went through some noobish configurations by yours truly before I learned how to run one efficiently. My server knowledge is significantly better than it was 3 years ago, but some things simply cannot be fixed easily, such as the 'mysql' database, which stores user information and the configuration for how other servers connect to it. For the past year, it was spamming the error log with hundreds of errors that simply couldn't be fixed manually, and I was forced to deal with constant crashes, time-syncing issues, and mild diarrhea. So I began researching how to configure our server to work best with our available hardware.

     

    Now, don't think I purposefully corrupted the 'mysql' database. It went through 3 significant software versions: MySQL 5.4, then MariaDB 5.5, and finally MariaDB 10.0.7, which we are on now. While I did the conversions correctly, a vanilla (fresh) installation would have been better, but I put it off for some time.

    So what exactly happened earlier this week that took the MySQL server down? There was an extremely cryptic error, and the only decipherable part was that a table in the 'mysql' database had magically disappeared. The error log was then spammed uncontrollably until it reached 300 GB in size. This caused a time-syncing issue with InnoDB, and everything went downhill from there. Since I back up the entire MySQL server every 3 hours, restoring a backup is very easy, so I shut down the server and restored the backups, but the errors kept spamming and the issues persisted. So I completely uninstalled the MariaDB server and reinstalled it from scratch. The only things carried over from the old server were the login information for all users and the whitelisted IPs.

    I also decided to completely revamp the server's layout after reading Yoshinori Matsunobu's blog post "Tables on SSD, Redo/Binlog/SYSTEM-tablespace on HDD", which covers how to split MySQL's files between SSDs and HDDs. After this, I restored all the backups and brought up the server. The difference was tremendous. Zero errors have been recorded since the reinstallation, and pages now load in under 100 ms on average, which is insane for a site like ours with tons of images and custom content. Overall the changes were incredibly significant, and everything should go smoothly from here with our MySQL server.
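
The SSD/HDD split from Matsunobu's post can be expressed in `my.cnf` with standard InnoDB/MariaDB options: table files under `datadir` on the SSD, and the sequentially written files (redo log, binary log, shared system tablespace) on the HDD. The sketch below generates such a `[mysqld]` section with Python's `configparser`. The mount points `/ssd/mysql` and `/hdd/mysql`, the subdirectory names and the helper `build_layout` are hypothetical; this is not the actual configuration used on our server.

```python
import configparser
import io

# Hypothetical mount points: replace with the real SSD and HDD paths.
SSD = "/ssd/mysql"
HDD = "/hdd/mysql"


def build_layout(ssd: str, hdd: str) -> str:
    """Return a [mysqld] section that keeps table files on the SSD and
    the sequentially written files on the HDD (hypothetical helper)."""
    cfg = configparser.ConfigParser()
    cfg.optionxform = str  # keep option names exactly as written
    cfg["mysqld"] = {
        # With file-per-table enabled, each table's .ibd file lives under datadir (SSD).
        "datadir": f"{ssd}/data",
        "innodb_file_per_table": "1",
        # Shared system tablespace (ibdata1) goes to the HDD.
        "innodb_data_home_dir": f"{hdd}/ibdata",
        # InnoDB redo logs (ib_logfile*) go to the HDD.
        "innodb_log_group_home_dir": f"{hdd}/redo",
        # Binary logs (base name mysql-bin) go to the HDD.
        "log_bin": f"{hdd}/binlog/mysql-bin",
    }
    out = io.StringIO()
    cfg.write(out)
    return out.getvalue()


if __name__ == "__main__":
    print(build_layout(SSD, HDD))
```

The directories have to exist and be owned by the MySQL user before the server starts; the generated section would then be merged into the existing `my.cnf` rather than replacing it.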

     

    Finally, what happened to the website and why are some people unable to load it?

     

    Well, our website was using Incapsula as a CDN and DNS management service, mainly for a cheap wildcard SSL certificate (buying one costs ~$300 a year, which is out of our price range) to allow secure browsing on the website. Last month our traffic was insane, roughly 2.7 TB (the attacks against NFO might have caused this, but I have yet to find the exact cause). The Incapsula bill came earlier this month, and it was $450.20. I immediately contacted support, since that seemed like some strange glitch (we are on the $19 per month plan), but was told that the 'Personal' plan has a bandwidth cap of 500 GB per month, as well as a fair-usage rate of 5 Mbps. This confused me, since when I signed up, their plans page mentioned none of this whatsoever. When I asked about the missing information on the bandwidth cap, they told me that they had recently added these bandwidth limitations to their plans (without changing the plans page), and that legacy plans would have to be updated manually. A member of their sales team then contacted me, credited our account with the money, and told us that we would have to upgrade to the Enterprise plan if our bandwidth continued to be ~2.7 TB a month. The Enterprise plan is $1,100 a month, and since Xeno Gamers isn't a commercial business, I respectfully decided to switch back to CloudFlare, since they have no bandwidth cap (their SSL plan is more limited, but it wouldn't affect our current setup). This required pointing our nameservers at CloudFlare's, which is a major DNS change. It normally takes 24-72 hours to finish propagating (it's the internet; nobody controls when it updates), which is why there might have been some server crashes and website unavailability. Again, if you were unable to access the website, it wasn't my fault; it's a waiting game from here on out.
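
To put the 2.7 TB figure next to the 'Personal' plan limits, here is a rough back-of-the-envelope calculation. It assumes decimal units (1 TB = 1000 GB = 10^12 bytes), a 30-day month and perfectly even traffic; the post does not say how Incapsula actually measures its 5 Mbps fair-usage rate, so this is an illustration only, and the function names are hypothetical.

```python
SECONDS_PER_DAY = 86_400


def average_mbps(bytes_per_month: float, days: int = 30) -> float:
    """Average throughput in megabits per second (decimal units),
    assuming the traffic is spread evenly over the month."""
    bits = bytes_per_month * 8
    return bits / (days * SECONDS_PER_DAY) / 1e6


def overage_gb(used_gb: float, cap_gb: float) -> float:
    """Gigabytes transferred beyond a monthly cap (0 if under the cap)."""
    return max(0.0, used_gb - cap_gb)


if __name__ == "__main__":
    used_tb = 2.7  # monthly traffic figure from the post
    rate = average_mbps(used_tb * 1e12)
    print(f"average rate: {rate:.2f} Mbps vs 5 Mbps fair usage")
    print(f"overage: {overage_gb(used_tb * 1000, 500):.0f} GB over the 500 GB cap")
```

With these assumptions, 2.7 TB works out to roughly 8.3 Mbps on average, already above 5 Mbps before counting any traffic spikes, and about 2,200 GB beyond the 500 GB cap.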

     

    Other updates:

    - CentOS 6.5 released, updated website server.

    - NginX 1.5.8 released, updated webserver.

    - ngx_pagespeed v1.7.30.2 beta released, updated webserver.

    - OpenSSL v1.0.1e package officially released with CentOS 6.5; compiled a custom version with 'Elliptic Curve' support enabled.

    - NginX SSL configuration changed to use 'Elliptic Curve' cryptography; pages load significantly faster over SSL.

    - Erlang and Elixir updated, allowing the Teamspeak chat bot application to function again.

    - XenForo updated to 1.2.4, various addons updated as well.

    - Various server updates provided by Nomulous, including a new Europe location (managed by Stickz).

    - Revamped user groups on the forum: moderation powers are now assigned using the 'Moderator' feature, making promotions far easier to manage.

    - Re-signed our self-signed SSL certificate to better integrate with CloudFlare.

    - Abandoned the 'Axivo' repo in favor of self-compiled applications.

    - Heavily revamped 'my.cnf' for better performance, including doubling the max connection count (unrelated to the issue described above).

    - Fixed template glitches with 'Xeno Gamers v7'. Will revamp color scheme soon since it's currently unsatisfactory.

    - Fixed permission issues for staff members on the website. Some might still remain so please report them!

    - Teamspeak server updated to 3.0.10.3
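
For the curious, "using Elliptic Curve" most likely refers to preferring ECDHE (elliptic-curve Diffie-Hellman) key exchange in the TLS cipher suites. In NginX this is controlled by the `ssl_ciphers` and `ssl_ecdh_curve` directives; the sketch below shows the same idea with Python's standard `ssl` module instead. The cipher string `ECDHE+AESGCM`, the curve `prime256v1` and the helper name `make_ecdhe_context` are assumptions for illustration, not the settings our webserver actually uses.

```python
import ssl


def make_ecdhe_context(curve: str = "prime256v1") -> ssl.SSLContext:
    """Server-side TLS context limited to ECDHE key exchange with AES-GCM
    (hypothetical helper; cipher string and curve are assumptions)."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    # Restrict TLS 1.2-and-below suites to ephemeral elliptic-curve
    # Diffie-Hellman with AES-GCM encryption.
    ctx.set_ciphers("ECDHE+AESGCM")
    # Curve used for the ECDHE exchange (comparable to NginX's ssl_ecdh_curve).
    ctx.set_ecdh_curve(curve)
    return ctx


if __name__ == "__main__":
    ctx = make_ecdhe_context()
    for cipher in ctx.get_ciphers():
        print(cipher["protocol"], cipher["name"])
```

Listing `get_ciphers()` shows which suites the context would offer. On newer OpenSSL builds, TLS 1.3 suites may also appear, because OpenSSL configures those separately from `set_ciphers()`.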

     

    That's all for now, and I apologize for all the downtime!



    User Feedback

    Recommended Comments

    Translate pls

    Webserver be download too many shits for it's plan. Hosts be so angry they overcharge us saying it's out fault. But silence be like "fek u, u no writ on website plan change", so he wont and we switched to different host. We need cheap ssl so your passwords don't get hacked by a 1337 h4x0r watching your computer. Teh reason y site no work 4 u is becuz updates this big take a while for u to update your pc to werk. Other shitz be like windows update, but 4 linux, dun worri aboot it.

     

    For the more literate:

    Our website was downloading/uploading too many things at once, which in turn pushed our bandwidth usage beyond the plan we were paying for. Unlike buying internet at home, website hosting is more expensive, and unlimited bandwidth is quite expensive (in this case, 2.7 TB of bandwidth was used). Since the change in hosting plans hadn't been posted, Silence won the argument and did not have to pay the hefty bill that was imposed on him. On top of that, an SSL certificate (which encrypts your passwords and forum SEND requests so that third-party sniffers can't find your passwords) is quite expensive. Because of these unforeseen consequences, we switched to a cheaper service that provides unlimited bandwidth and an SSL certificate at an affordable price. As for the rest, think of it as Windows Update for the server.
