We have been with KH for a few years. We have experienced short downtimes here and there, but support and recovery were usually excellent... until today. Our VPS went down due to a RAID controller failure. Within an hour KH restored the VPS (excellent); however, the database (InnoDB) had crashed, cPanel did not work, and there were other issues. Everything was sorted out fairly quickly, except for the DB, which is essential for our app. We did have backups. And this is where the disaster begins...
02:18 PM - we receive a message that senior staff are looking at the problem, as MySQL fails to start correctly.
03:27 PM - more than an hour later, KH offers to restore from the nightly backup. We say that we would rather not, but that we do have a backup taken 20 minutes before the crash (we back up once an hour), which could serve as a last resort for restoring our DB.
03:56 PM - we receive confirmation from KH that our backup is the best way to proceed.
04:14 PM - we have uploaded the dump and give KH a clear green light to restore our DB from our backup.
06:12 PM - a full two hours later, KH announces that the "restore" is successful and that we should re-check. We re-checked and the DB was empty. KH apparently did not proceed as planned and stubbornly persisted in attempting to recover the corrupted DB instead of restoring from the dump we provided.
It took us around 20 minutes to create a new DB from our dump and relaunch the service. We could have relaunched much earlier had we known that the recovery done by KH would fail, or that KH would not stick to the agreed-upon plan to use our DB dump. Basically, KH wasted multiple precious hours in an unsuccessful attempt to recover 20 minutes' worth of data that we would have preferred to keep but were OK to lose.
So here are the questions to KH:
1) Is it normal for InnoDB databases to crash so badly that the whole database is gone?
2) Is it normal practice to spend ~4 hours only to discover that the DB data is unrecoverable?
3) Why was the proposed and confirmed plan to restore from our dump not followed?
4) Why are the responses so slow? At times we had to wait over an hour for the next response.
I do understand that hardware failures happen and that data gets lost even on RAID arrays, and this is why we have hourly backups. But, come on, why does it take more than 4 hours to negotiate a simple MySQL DB recovery?
I am not going to judge KH on this one event, but I do believe that there is plenty of room for improvement in communication at least... both by KH and by us.
P.S. Many hours after the recovery, our VPS is way, way slower than it was before (the CPU is idling; the slowness comes from disk reads). I'm told that this is because of recovery procedures being run by other tenants. Tomorrow we shall see if it gets any better...