Recovered from my own mistakes
Yesterday I noticed a problem while upgrading some of the modules that run the site. It should have been an easy fix. The version numbers reported for one or two of the modules did not match the versions that had been installed, and this was making another module complain, because at least one of the old versions still being reported had a security problem.
I did some poking around and realized that I must have put an earlier version of at least one module in the wrong place. The system was still seeing the old version there, not the new version I had just tried to install.
So I set about fixing this. I deleted what I thought were the wrong versions and made sure the new versions were where they should have been.
And that's when I made my mistake. I worked without a backup.
Some of the modules I use rely on other modules. The layout of some of the pages, including, it seems, the admin pages that only I see, falls into this category. And the relationships between modules, where files are stored, and how things go together are all stored in the database.
Mistake number one: not taking a backup of the filesystem and database before deleting or moving things around that had bits stored in the database. Normally, caching for the win. But this time, buckets full of FAIL.
Mistake number two: not remembering all of the tables that participated in content storage for the site.
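For what it's worth, the step I skipped in mistake number one can be very small. Here is a minimal sketch in Python of taking a timestamped copy of a directory before touching anything in it. The function name and the paths are hypothetical, and it only covers the filesystem half; the database still needs its own dump, taken with whatever tool your database provides.

```python
import tarfile
import time
from pathlib import Path


def snapshot_directory(source_dir, backup_dir):
    """Write a timestamped .tar.gz copy of source_dir into backup_dir.

    Returns the path of the archive that was written.
    """
    source = Path(source_dir).resolve()
    target_dir = Path(backup_dir)
    target_dir.mkdir(parents=True, exist_ok=True)

    # Timestamp in the file name so repeated snapshots never overwrite each other.
    stamp = time.strftime("%Y%m%d-%H%M%S")
    archive_path = target_dir / f"{source.name}-{stamp}.tar.gz"

    with tarfile.open(str(archive_path), "w:gz") as archive:
        # arcname keeps paths inside the archive relative to the directory name.
        archive.add(str(source), arcname=source.name)
    return archive_path
```

Something like `snapshot_directory("sites/all/modules", "backups")` (both paths are placeholders) before deleting anything would have given me a way back on the file side.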
Did you notice that all the posts from February on are missing?
When I fell back to an earlier backup, trying to make sure I kept the data for the content, I missed one of the tables that I needed to remove from the restore. That means that while I had the metadata for 17 weeks' worth of posts, the content itself was missing.
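In hindsight, a quick look at which tables the restore was about to overwrite would have caught this. Below is a rough sketch in Python, assuming the backup is a plain SQL dump containing `CREATE TABLE` statements. The function names and the table names in the usage note are made up for illustration; they are not the site's real schema.

```python
import re

# Matches the table name in statements such as: CREATE TABLE `posts_body` (
CREATE_TABLE = re.compile(
    r'^\s*CREATE TABLE\s+(?:IF NOT EXISTS\s+)?[`"]?(\w+)[`"]?',
    re.IGNORECASE | re.MULTILINE,
)


def tables_in_dump(sql_text):
    """Return the set of table names created by the dump."""
    return set(CREATE_TABLE.findall(sql_text))


def restore_plan(sql_text, keep_live):
    """Split the dump's tables into those a restore would overwrite
    and those that should be left out so the live data survives."""
    dumped = tables_in_dump(sql_text)
    keep = set(keep_live)
    return {
        "overwrite": sorted(dumped - keep),
        "leave_out": sorted(dumped & keep),
        # Tables I meant to protect that the dump does not mention at all.
        "not_in_dump": sorted(keep - dumped),
    }
```

Calling `restore_plan(dump_text, ["posts_meta", "posts_body"])` and reading the `overwrite` list before running the restore makes a forgotten content table much easier to spot than working from memory.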
Fortunately, I haven't posted a lot in the last 17 weeks, but I liked the content that I did post.
I am able to recreate some of the content. I can even recreate some of the metadata from other sources (ping.fm, Twitter, Facebook, Flickr). But overall, the words are gone. I know what I wrote, but I don't remember how I wrote it.
So now I've reverted the site to a previous backup, and I've started working on making sure that I get backups of the database more often.
The site works again, or at least this post making it to the front page suggests the site is working again. I will not say that the issue is fixed, since I ended up recreating the original errors in where the modules live while trying to repair that part of the database.
The worst part of the whole thing is that I know better.
Back up, and then back up again. Work, and then back it up.
And this is what happens when one forgets to act on their habits.
This was not a Drupal problem. This outage was a Butch problem.