I don't know about you guys, but I've become increasingly anxious about not being able to back up my reviews here on BL.  Every time the site goes down, I get an itty-bitty panic moment.


But here's a fun fact I learned today: you can get an entire copy of your website using wget! [1]

Here's how (at least on Linux[2]):


  • Open a terminal.
  • Type (all one line):

wget --recursive --page-requisites --no-clobber --domains [yourusername].booklikes.com --directory-prefix=[location you want to save to, or omit this flag to save in the current directory] http://[yourusername].booklikes.com

In my case, this was

wget --recursive --page-requisites --no-clobber --domains pagefault.booklikes.com http://pagefault.booklikes.com

  • --recursive means wget follows links and keeps trying to find more pages.
  • --page-requisites means it grabs everything needed to display each page (images, stylesheets, and so on) rather than just the linked HTML.
  • --no-clobber means it won't overwrite files it already created.  This is useful if you decide to rerun wget every so often to update things.
  • --domains [your site] means it only pulls pages from your site, rather than wandering off across the whole web.
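If you'd rather not retype all that, the command wraps neatly in a few lines of Python. This is just a sketch of the same invocation: the username and destination are placeholders you'd fill in yourself.

```python
import subprocess

def build_backup_cmd(username, dest="."):
    """Assemble the wget invocation described above."""
    site = f"{username}.booklikes.com"
    return [
        "wget",
        "--recursive",         # keep following links to more pages
        "--page-requisites",   # also fetch images, stylesheets, etc.
        "--no-clobber",        # skip files already saved on earlier runs
        "--domains", site,     # stay on your own subdomain
        "--directory-prefix", dest,
        f"http://{site}",
    ]

def run_backup(username, dest="."):
    """Actually run the backup (requires wget on your PATH)."""
    subprocess.run(build_backup_cmd(username, dest), check=True)
```

Calling run_backup("pagefault") reproduces the command I ran by hand.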


I'm absolutely thrilled to have found such an easy solution to the export problem.  I've been feeling more and more anxious about this, but kept waiting for BL to give us one. At this point, enough is enough: given that there's no transparency about storage mechanisms, who knows what problems might develop from the new database?  If they lose my reviews, they're lost: I pulled my reviews from GR, leaving only blurbs and links, and have posted everything since then only on BL.  If BL had a server crash, assuming they aren't running backups, I would be officially screwed. But not anymore: now I have a wget copy.

Yeah, I know. The resulting directory is seriously ugly. But if the worst happens, we can easily create scripts to pull the reviews out of the wget muddle.
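If it ever comes to that, the extraction script doesn't have to be fancy. Here's a rough Python sketch of the idea -- the review <div> marker is purely a guess on my part; you'd peek at the saved HTML and adjust the pattern to BL's actual markup.

```python
import os
import re

def extract_reviews(root):
    """Walk a wget mirror directory and pull out review bodies.

    The <div class="review"> pattern is a placeholder -- inspect the
    saved HTML and adjust it to BookLikes' real markup.
    """
    reviews = []
    for dirpath, _, files in os.walk(root):
        for name in files:
            # BL doesn't always use an .html suffix, so only skip
            # files that clearly have some other extension.
            if "." in name and not name.endswith((".html", ".htm")):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="ignore") as f:
                text = f.read()
            reviews.extend(
                re.findall(r'<div class="review">(.*?)</div>', text, re.S)
            )
    return reviews
```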

I think it makes for a nice compromise between minimal effort and maximum precaution.


A few caveats: You can restrict the download to HTML pages only, but BL doesn't seem to guarantee using the proper suffix, so I wouldn't really advise it. I also don't know how to exclude the shelves stuff.  I'm not sure why, but mine bombs out every 5-6 pages of reviews, and I had to restart it to go back farther, using pagefault.booklikes.com?p=5, ?p=11, and ?p=17 as the new starting places.  The first time you run it, it may take a while--close to an hour for me. It's not a perfect solution, but it's pretty easy.
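Those restarts could themselves be scripted. A hedged sketch, with the ?p= offsets hard-coded from my own runs -- yours may stall at different pages:

```python
import subprocess

def restart_urls(username, start_pages=(5, 11, 17)):
    """Front page plus the deeper ?p= pages to restart from.
    The default offsets are just where my runs happened to stall."""
    site = f"{username}.booklikes.com"
    return [f"http://{site}"] + [f"http://{site}?p={p}" for p in start_pages]

def backup_with_restarts(username):
    """Run wget from each starting point in turn; --no-clobber
    means later runs skip whatever earlier ones already saved."""
    site = f"{username}.booklikes.com"
    for url in restart_urls(username):
        subprocess.run(["wget", "--recursive", "--page-requisites",
                        "--no-clobber", "--domains", site, url])
```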


[1] Yeah, I know. I'm obviously not all that familiar with wget, if this came as a surprise to me.

[2] And possibly Mac.  I have no idea about Windows--you could probably use something like http://gnuwin32.sourceforge.net/install.html .