March Database Failure

  • Thursday, 28th March, 2019
  • 01:54am

Good Evening,

This is not the kind of announcement we are proud to release. As everyone is well aware, beginning on March 19, 2019, USHost247 experienced a major service disruption that affected all accounts on our network.

On March 19th, 2019, we encountered a database failure. The account databases were repaired, and the server was analysed by our server administration team together with server technicians from a third party, who assisted us in locating the problem. The relevant part of the database error log follows:

2019-03-19 15:06:27 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=235445622070
2019-03-19 15:06:27 0 [Note] InnoDB: Starting final batch to recover 2417 pages from redo log.
2019-03-19 15:06:28 0 [Note] InnoDB: Removed temporary tablespace data file: "ibtmp1"
2019-03-19 15:06:28 0 [Note] InnoDB: Creating shared tablespace for temporary tables
2019-03-19 15:06:28 0 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the file full; Please wait ...
2019-03-19 15:06:28 0 [Note] InnoDB: File './ibtmp1' size is now 12 MB.
2019-03-19 15:06:28 0 [Note] InnoDB: Waiting for purge to start
2019-03-19 15:06:28 2 [ERROR] [FATAL] InnoDB: Rec offset 99, cur1 offset 10043, cur2 offset 16192
190319 15:06:28 [ERROR] mysqld got signal 6 ;

After mysqld crashed with signal 6, we attempted an InnoDB forced recovery of the databases and were able to recover all affected accounts. Our investigation is continuing, and we believe that MariaDB hit a bug that caused the databases to fail and eventually crash.
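
For readers who want to check their own database error logs for the same crash signature, the following is a minimal Python sketch. The two patterns are taken only from the log excerpt above; the function name find_crash_markers is hypothetical, and other crash signatures may look different.

```python
import re

# Patterns derived from the log excerpt in this announcement (an assumption:
# other MariaDB/MySQL versions may word these messages differently).
CRASH_PATTERNS = [
    re.compile(r"\[ERROR\] \[FATAL\] InnoDB:"),
    re.compile(r"\[ERROR\] mysqld got signal \d+"),
]


def find_crash_markers(lines):
    """Return (line_number, text) pairs for lines matching a crash pattern."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if any(pattern.search(line) for pattern in CRASH_PATTERNS):
            hits.append((number, line.rstrip("\n")))
    return hits
```

The function accepts any iterable of lines, so an open log file can be passed to it directly. It only reports matching lines and does not attempt any repair.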

On March 25th, 2019, our databases went down again, with the same outcome as on March 19th: right after we received notifications of connection loss, MariaDB crashed. We attempted to recover accounts from the most recent backup, taken on the morning of March 24th, but most of those backups failed to restore. Our support team continued to try progressively older backups until reaching backups from February, a restore point that is catastrophic for websites running forums and other software whose data changes frequently. As this was unacceptable, we attempted to restore the March 24th backups once more. This succeeded for many accounts, while some other accounts were restored to a usable state but with some data corruption.
While checking the server, we found no indication of a kernel panic and no shortage of disk space.

The database issue appears to be due to a bug in the database server (MariaDB). We could see this in the logs at the point where mysqld died:
stack_bottom = 0x7464d4ca2c30 thread_stack 0x49000
*** buffer overflow detected ***: /usr/sbin/mysqld terminated

While our intention is to provide the best customer service experience, there have been delays in releasing information and replying to tickets, because our staff have been working urgently to restore service to all clients.

We encourage all clients to take regular backups and keep copies on their local machines, as these can help us restore your account in the event that our copy fails.
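
As one possible way to take such a backup, here is a minimal Python sketch that writes a timestamped SQL dump with mysqldump. It assumes that mysqldump is installed locally and that your account permits database connections from your machine, which may not be the case on every plan. The function names, the file naming scheme, and the example database name are all hypothetical.

```python
import subprocess
from datetime import datetime


def backup_filename(db_name, when):
    """Build a timestamped dump file name, e.g. forum-20190324-060000.sql."""
    return f"{db_name}-{when:%Y%m%d-%H%M%S}.sql"


def build_dump_command(db_name, user, host="localhost"):
    """Return the mysqldump argument list for a single database."""
    return [
        "mysqldump",
        f"--host={host}",
        f"--user={user}",
        "--password",            # no value given, so mysqldump prompts for it
        "--single-transaction",  # consistent snapshot for InnoDB tables
        "--routines",            # include stored procedures and functions
        db_name,
    ]


def run_backup(db_name, user, host="localhost"):
    """Run mysqldump and write its output to a timestamped local file."""
    target = backup_filename(db_name, datetime.now())
    with open(target, "wb") as out:
        # check=True raises CalledProcessError if mysqldump exits with an error;
        # a partial file may remain on disk in that case.
        subprocess.run(build_dump_command(db_name, user, host), stdout=out, check=True)
    return target
```

Calling run_backup("forum", "alice") would prompt for the password and write a file such as forum-20190324-060000.sql to the current directory. A dump like this covers only the database, so website files would still need to be downloaded separately.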

Our investigation is not yet complete, and we will continue to release information as it becomes available.
