Disaster Recovery: Fear of Landing
In the early hours of Wednesday the 10th of March, a fire broke out at the OVH data centre in Strasbourg.
The fire started in a room in the building known as SBG2 just before 01:00 local time.
All services in all four buildings (SBG1, SBG2, SBG3 and SBG4) were halted while SBG2 blazed with an uncontrolled fire.
Forty-three fire trucks with over a hundred fire fighters were reported as being on site to fight the fire, using a pump boat on the Rhine to supply water. At 03:00, they isolated the site and closed the perimeter. By 04:00, SBG2 was completely destroyed and SBG1 was on fire, with SBG3 under threat.
By the time I woke up that morning, the fire fighters had gained control of the blaze. It took six hours to put out the fire.
— Antoine Bonin (@abonin_DNA) March 10, 2021
For the first time in over twenty years, I had no personal presence on the Internet. All of my websites were gone.
Later that evening, OVH posted an update on Twitter.
Update 5:20pm. Everybody is safe.
Fire destroyed SBG2. A part of SBG1 is destroyed. Firefighters are protecting SBG3. no impact SBG4.
— Octave Klaba (@olesovhcom) March 10, 2021
Fear of Landing is (well, was) hosted in SBG3.
“We recommend to activate your Disaster Recovery Plan” are not words that anyone wants to hear. Especially as Fear of Landing’s disaster recovery plan was based on the off-site backup service supplied by OVH.
No back-ups have been forthcoming. Cliff, my partner and my site admin, had additional back-ups; however they were only of the data, so he spent the day rebuilding servers so he could roll out his back-ups and restore service.
He and two other admins, Mark and Rob, worked tirelessly to bring me back online. Yesterday, Cliff got up shortly after 6am, just missing Mark, who had gone to bed at 4am his time, just minutes earlier. Between the three of them, they restored the websites and the mail servers.
I’m still twitching a bit but mostly ok.
SBG2 was completely destroyed, with no data recovery (where are those back-ups?). Four halls (out of twelve) in SBG1 were destroyed and SBG3’s uninterruptible power supply (UPS) was, well, interrupted. Based on reports, Fear of Landing’s server still exists but is not accessible. No word on the off-site back-ups.
The Register points out that three years ago the site suffered a power outage which highlighted design flaws in the location.
The fire comes three years after the group embarked on a €4m-€5m investment plan in the wake of a major outage that left three of the Strasbourg data centres – SBG1, SBG2 and SBG4 – without power for 3.5 hours in November 2017.
[OVH Founder and Chairman] Klaba himself said at the time of the 2017 outage that it was partly because “SBG’s power grid inherited all the design flaws that were the result of the small ambitions initially expected for that location.”
At the time of the 2017 outage, “SBG2’s power grid” was built atop “SBG1’s power grid instead of making them independent of each other”.
I wasn’t the only one affected, of course; OVH is the largest hosting provider in Europe and has many high-profile customers, with the French government’s vaccination data among the data it hosts. Netcraft reported that the fire took out 3.6 million websites across 464,000 distinct domains.
Websites that went offline during the fire included online banks, webmail services, news sites, online shops selling PPE to protect against coronavirus, and several countries’ government websites.
Examples of the latter included websites used by the Polish Financial Ombudsman; the Ivorian DGE; the French Plate-forme des achats de l’Etat; the Welsh Government’s Export Hub; and the UK Government’s Vehicle Certification Agency website.
Closer to home, the European Space Agency and Strasbourg Airport also had servers on the site.
The team behind Rust, a popular video game, announced to their users that all EU servers were lost and that no data would be restored.
In the midst of the chaos, I was somewhat bitterly amused by another customer’s response to the situation:
— ACCEIS (@acceis) March 10, 2021
It’s a good thing Cliff didn’t wait for the service to be restored; he and the others worked straight through with the result that here we are, back to normal!
Meanwhile, OVH are rebuilding a new network room in SBG5, with working teams to clean up the site and restore electricity and network services. The latest announcement was that power will be restored to SBG3 on Friday the 19th.
Their website says that if this timeframe is not soon enough, they recommend deploying my infrastructure in another data centre.
There is more information in the OVH statement with a full status report but I can’t get over the idea that, if it were left up to me, Fear of Landing would not have existed for almost two weeks.
OVH Chairman Octave Klaba said in a video that the alarms went off at 00:47 but the first responders were not able to investigate because the smoke was too thick to safely remain in the data centre. Further, the fire department’s thermal camera images showed two UPS systems, UPS7 and UPS8, on fire.
We had maintenance on UPS7 in the morning. The supplier came and changed a lot of pieces inside of UPS7 and restarted UPS7 in the afternoon. It seems it was working but then in the morning we had the fire.
The data centre cameras apparently have further video footage yet to be analysed and it is hoped that this will give further information about what happened.
There are a lot of questions, of course, starting with how a building made of fire-resistant materials, with built-in fire suppression, could end up in an uncontrolled fire. However, as we know, battery fires can be very hard to extinguish, so it seems possible that a faulty UPS in SBG2 was at the heart of the blaze.
On a personal note, the outage and rushed restoration may have some side effects. If you have any issues interacting with the site or when emailing me, please forward any error messages or bounces to me so that I can troubleshoot the problem.
And thank you, again, to Cliff, Mark and Rob for their tireless efforts in getting Fear of Landing (and everything else!) back onto the Internet.