disaster-recovery – Row Coding

What’s the first thing you check when an untouched unix server starts going berserk?

November 21, 2022 by Tarik

First Order: Is it responsive? If you can’t log in, there are bigger problems afoot. This generally comes in two flavors: hardware failure and software failure. Both are potentially catastrophic. To prevent DFA errors, check the general hardware health first; a simple glance-over will usually suffice. Second Order: Are the system’s underlying structures in good …
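
One way to answer the first-order question from another machine, before even attempting a login, is to probe the SSH port. The Python sketch below assumes the host runs sshd on port 22, and its three verdict labels are hypothetical names chosen for this illustration. It separates a host that never completes the TCP handshake from one whose kernel accepts the connection but whose SSH daemon never sends its banner, which may point to a badly overloaded system rather than a dead one.

```python
import socket


def first_order_check(host: str, port: int = 22, timeout: float = 5.0) -> str:
    """Classify a host as 'responsive', 'degraded' or 'unreachable' (hypothetical labels)."""
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        # No TCP handshake at all: host down, network trouble, or dead hardware.
        return "unreachable"
    with sock:
        try:
            # An SSH server speaks first, sending a line that starts with "SSH-".
            banner = sock.recv(255)
        except OSError:
            # The kernel accepted the connection, but sshd never answered in time.
            return "degraded"
    return "responsive" if banner.startswith(b"SSH-") else "degraded"
```

Run it from a different machine, e.g. `first_order_check("web01.example.com")`, where the hostname is a placeholder.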

How do I back up my Trac installations?

November 18, 2022 by Tarik

To fully recover a Trac environment you need the following: a database backup; backups of the configuration files; backups of the wiki files (HTML and attachments); backups of the password files if you’re using htpasswd auth; and, optionally, the plugins (even though these are available for download, I’d back them up for quicker recovery). In the case of the standard setup (with SQLite as the DB backend), this …
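
A minimal Python sketch of that list, assuming a Trac environment with the standard SQLite backend at `db/trac.db` and the remaining state under `conf/`, `htdocs/`, `files/` and `plugins/` (directory names vary between Trac versions, so treat them as assumptions; `backup_trac_env` is a hypothetical helper). It copies the database with SQLite's online backup API rather than a plain file copy, so a database Trac is writing to is still captured consistently. Trac also ships its own `trac-admin <env> hotcopy <dest>` command, which is usually the simpler choice.

```python
import shutil
import sqlite3
from pathlib import Path
from typing import List, Optional

# Assumed layout of a Trac environment (varies by version; check yours).
DB_FILE = Path("db") / "trac.db"                       # SQLite database
DIRS_TO_COPY = ["conf", "htdocs", "files", "plugins"]  # config, wiki files/attachments, plugins


def backup_sqlite(src: Path, dest: Path) -> None:
    """Copy a possibly in-use SQLite database with the online backup API."""
    dest.parent.mkdir(parents=True, exist_ok=True)
    src_conn = sqlite3.connect(str(src))
    dst_conn = sqlite3.connect(str(dest))
    try:
        src_conn.backup(dst_conn)
    finally:
        dst_conn.close()
        src_conn.close()


def backup_trac_env(env: Path, dest: Path, htpasswd: Optional[Path] = None) -> List[Path]:
    """Back up DB, configuration, wiki files, plugins and an optional htpasswd file."""
    dest.mkdir(parents=True, exist_ok=False)  # refuse to overwrite an old backup
    copied: List[Path] = []
    if (env / DB_FILE).is_file():
        backup_sqlite(env / DB_FILE, dest / DB_FILE)
        copied.append(DB_FILE)
    for name in DIRS_TO_COPY:
        if (env / name).is_dir():
            shutil.copytree(env / name, dest / name)
            copied.append(Path(name))
    if htpasswd is not None and htpasswd.is_file():
        # htpasswd files often live outside the environment, so copy them separately.
        shutil.copy2(htpasswd, dest / htpasswd.name)
        copied.append(Path(htpasswd.name))
    return copied
```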

How to actually use a MySQL slave as soon as the master fails or gets burnt

November 16, 2022 by Tarik

For a DR solution you most likely want a semi-manual process. That is, you need to decide that the disaster justifies a full DR failover, and that it’s not just a small network blip that leaves you stuck with days of fail-back work. To switch a MySQL slave to a master you just issue a few commands in …
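
The semi-manual process might be sketched in Python as below. This is not the command list from the full post; it assumes a classic asynchronous replica, uses the pre-8.0.22 `SHOW SLAVE STATUS` column names to confirm the replica has applied everything it fetched, and requires an explicit human go-ahead before returning the promotion statements. The statement spellings are standard MySQL (`STOP REPLICA`/`RESET REPLICA ALL` from 8.0.22, `STOP SLAVE`/`RESET SLAVE ALL` before that); the function names are hypothetical, and actually sending the SQL to the server is left out.

```python
from typing import Dict, List


def replica_caught_up(status: Dict[str, object]) -> bool:
    """True if the SQL thread has applied everything the I/O thread fetched.

    `status` is one row of SHOW SLAVE STATUS (pre-8.0.22 column names).
    """
    same_file = status["Relay_Master_Log_File"] == status["Master_Log_File"]
    same_pos = int(status["Exec_Master_Log_Pos"]) == int(status["Read_Master_Log_Pos"])
    return same_file and same_pos


def promotion_statements(status: Dict[str, object], operator_confirmed: bool,
                         replica_syntax: bool = False) -> List[str]:
    """Return the SQL to promote this replica, refusing unless it looks safe."""
    if not operator_confirmed:
        # Semi-manual: a human decides this is a real disaster, not a network blip.
        raise RuntimeError("failover not confirmed by an operator")
    if not replica_caught_up(status):
        raise RuntimeError("replica has unapplied relay-log events; wait first")
    noun = "REPLICA" if replica_syntax else "SLAVE"
    return [
        f"STOP {noun}",                # stop replicating from the dead master
        f"RESET {noun} ALL",           # discard the old master's connection settings
        "SET GLOBAL read_only = OFF",  # start accepting application writes
    ]
```

"Caught up" here only means caught up with what reached the replica: with asynchronous replication, transactions the old master committed but never shipped are gone.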

Battery-Backed Write Cache

November 16, 2022 by Tarik

What exactly does it do? The excerpt from this Compaq document explains it well: Power interruptions, even for brief moments, result in the loss of data which was being written to or read from storage… Power interruptions can have terminal effects on data which is in the process of being written and is temporarily residing …
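
To make the failure mode concrete, here is a toy Python model (hypothetical, and not how any real controller firmware is written) of a write-back cache. The host is told a write is done as soon as it reaches the controller's cache, so a power cut before the data is destaged to disk loses it, unless a battery keeps the cache contents alive until power returns.

```python
from typing import Dict


class WriteBackCache:
    """Toy model of a RAID controller write-back cache (for illustration only)."""

    def __init__(self, battery_backed: bool) -> None:
        self.battery_backed = battery_backed
        self.cache: Dict[int, bytes] = {}  # blocks acknowledged but not yet on disk
        self.disk: Dict[int, bytes] = {}   # blocks on stable storage

    def write(self, block: int, data: bytes) -> None:
        # The host gets its acknowledgement as soon as data lands in cache.
        self.cache[block] = data

    def flush(self) -> None:
        # Destage cached blocks to disk (normally done lazily in the background).
        self.disk.update(self.cache)
        self.cache.clear()

    def power_loss(self) -> None:
        # Without a battery, the volatile cache contents simply vanish.
        if not self.battery_backed:
            self.cache.clear()

    def power_restored(self) -> None:
        # Whatever survived in cache is written out on restart.
        self.flush()
```

With `battery_backed=False`, a block written just before `power_loss()` never reaches `disk`; with `battery_backed=True`, it does after `power_restored()`.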

IT lead does not have a backup or DR plan in writing

October 20, 2022 by Tarik

First, it’s YOUR business, and the first step is for YOU to determine what your business continuity and disaster recovery needs and objectives are. Have you defined and documented those? If not, do so. A BC/DR plan is NOT just about the technology and the data. Once you’ve done that, you can present them to this …

Setting up a new backup scheme

October 18, 2022 by Tarik

I would highly recommend the book “Backup & Recovery” (O’Reilly) by W. Curtis Preston (http://oreilly.com/catalog/9780596102463/). Asking how to set up your backup plan is kinda like asking 10 grandmothers how to make the best chicken noodle soup. You’ll get 10 different answers, but all of them will agree on the basic ingredients. In my opinion, …

How to recover from a drive failure in a RAID 5 configuration?

October 18, 2022 by Tarik

The system is running very slowly because it has to reconstruct the missing data, which involves additional CPU and I/O. With a disk missing from a RAID-5 array you have no redundancy left, and so no recovery strategy: if another disk goes down, you will lose your data. Run, don’t walk, to the nearest vendor from which you …
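
Both the slowdown and the risk follow from how RAID-5 parity works. In the simplified Python model below (it ignores parity rotation across disks and real block sizes; the helper names are hypothetical), a stripe's parity block is the byte-wise XOR of its data blocks, so any single missing member can be recomputed by XORing all the survivors. A degraded array has to do that for every read that touches the dead disk, and a second missing member in a stripe cannot be recovered.

```python
from functools import reduce
from typing import List, Optional


def xor_blocks(blocks: List[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda acc, blk: [a ^ b for a, b in zip(acc, blk)],
                        blocks, [0] * len(blocks[0])))


def make_stripe(data_blocks: List[bytes]) -> List[bytes]:
    """Append a parity block so that the XOR of all members is zero."""
    return data_blocks + [xor_blocks(data_blocks)]


def reconstruct(stripe: List[Optional[bytes]]) -> List[Optional[bytes]]:
    """Rebuild at most one missing member (None) of a RAID-5 stripe."""
    missing = [i for i, blk in enumerate(stripe) if blk is None]
    if len(missing) > 1:
        raise ValueError("two members lost: RAID-5 cannot recover this stripe")
    rebuilt = list(stripe)
    if missing:
        # The XOR of every surviving member (data and parity) equals the lost one.
        rebuilt[missing[0]] = xor_blocks([blk for blk in stripe if blk is not None])
    return rebuilt
```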

Documentation As-A-Manual vs. Documentation As-A-Checklist

October 10, 2022 by Tarik

When writing mine I’ve always devolved into writing two or three sets. The get-er-done checklist, with a MUCH LONGER appendix about the architecture of the system, including why things are done the way they are, probable sticking points when coming online, and abstract design assumptions, followed by a list of probable problems and their resolutions, followed …

Architecture for highly available MySQL with automatic failover in physically diverse locations

October 8, 2022 by Tarik

You will face the “CAP” theorem problem: you cannot have consistency, availability, and partition tolerance at the same time. DRBD / MySQL HA relies on synchronous replication at the block-device level. This is fine while both nodes are available, or if one suffers a temporary fault (is rebooted, etc.) and then comes back. The problems start …
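
The trade-off can be illustrated with a toy two-node model in Python (hypothetical; it is not DRBD's actual protocol). While the link is up, every write lands on both nodes before it is acknowledged, as with synchronous replication. Once the nodes are partitioned, the model has to pick: refuse writes to stay consistent, or accept them on whichever node the client reaches and let the copies diverge, which is the split-brain situation that automatic failover between sites risks.

```python
from typing import Dict


class Node:
    def __init__(self, name: str) -> None:
        self.name = name
        self.data: Dict[str, str] = {}


class SyncPair:
    """Toy two-node synchronous replication; not DRBD's real protocol."""

    def __init__(self, prefer_consistency: bool) -> None:
        self.a, self.b = Node("a"), Node("b")
        self.partitioned = False
        self.prefer_consistency = prefer_consistency

    def write(self, node: Node, key: str, value: str) -> bool:
        """Write via `node`; return True if the write was acknowledged."""
        peer = self.b if node is self.a else self.a
        if not self.partitioned:
            node.data[key] = value
            peer.data[key] = value  # both copies updated before the ack
            return True
        if self.prefer_consistency:
            return False  # CP: give up availability and refuse the write
        node.data[key] = value  # AP: stay available; the peer never hears of it
        return True

    def consistent(self) -> bool:
        return self.a.data == self.b.data
```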

Retrieving an RSA key from a running instance of Apache?

September 26, 2022 by Tarik

SUCCESS! I was able to retrieve the private key, but it wasn’t easy. Here’s what you need to do: Make sure you do not restart the server or Apache; the game is over at that point. That also means making sure that no monitoring services restart Apache. Grab this file: source code for a …
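
A common approach to this kind of recovery, and not necessarily what the file linked from the full post does, is to dump the process memory first (for example with gdb's `gcore`, which does not restart Apache) and then scan the dump for bytes that look like a DER-encoded PKCS#1 `RSAPrivateKey`: a SEQUENCE with a two-byte length, then the version INTEGER 0, then the modulus INTEGER. The Python sketch below assumes keys of 2048 bits or more (so the modulus also uses a two-byte DER length); `find_der_rsa_keys` is a hypothetical name, and every hit is only a candidate.

```python
import re
from typing import List, Tuple

# Heuristic DER RSAPrivateKey prefix for keys of 2048 bits and up (an assumption):
#   30 82 LL LL  SEQUENCE, two-byte length
#   02 01 00     INTEGER version = 0
#   02 82        INTEGER modulus, two-byte length
_KEY_PREFIX = re.compile(rb"\x30\x82(..)\x02\x01\x00\x02\x82", re.DOTALL)


def find_der_rsa_keys(dump: bytes) -> List[Tuple[int, bytes]]:
    """Return (offset, blob) for each complete candidate key in a memory dump."""
    hits = []
    for match in _KEY_PREFIX.finditer(dump):
        start = match.start()
        total = 4 + int.from_bytes(match.group(1), "big")  # header + body length
        if start + total <= len(dump):  # skip candidates cut off by the dump's end
            hits.append((start, dump[start:start + total]))
    return hits
```

Each blob can be saved to a file and checked with `openssl rsa -inform DER -in candidate.der -check -noout`; only candidates that pass are worth keeping.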