A quick project to create an ebook from html
08-04-2010 3:27 AM
Today I got distracted by a small project and thought I'd throw it up here for others. I was going through my backlog of online reading material and found this: http://mitpress.mit.edu/sicp/full-text/book/book-Z-H-4.html. It's a free, Creative Commons licensed online version of the book Structure and Interpretation of Computer Programs.
It's one of those books that keeps popping up when people give you advice on how to program better. And here it is for free! But due to my fear of a future that includes bad eyesight, I'm not going to try to read this online. What I really wanted was an ebook version for my handy dandy B&N Nook.
The first obstacle was finding some utility that would convert the html version to a format that was compatible with my nook. After some searching I settled on Calibre. It's actually a full-featured eBook manager and does a bunch of other stuff, but I'm really just interested in the conversion tools, which are pretty extensive.
Obstacle #2: Calibre won't just take the URL and pull in all of the pages. It wants a downloaded local version of the html files. Such a bother. But this is also easily remedied. I've used pavuk on several occasions in the past. It's an awesome little web crawling utility with a ton of configuration options. I just pointed it at the table of contents page and told it to download all of the linked html pages.
pavuk -dont_leave_site -dont_leave_dir -asfx / "html" http://mitpress.mit.edu/sicp/full-text/book/book.html
The only issue here is that pavuk is a command-line utility. This is a tech blog, so if that scares you, I officially don't care. If you're on a Mac you can get it from MacPorts, and I recommend Porticus. But you can substitute whatever little app you can find that will download these files for you in place of pavuk. If you're on Windows you'll probably end up with a virus or malware of some kind. You've been warned.
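For anyone who would rather script the download than install pavuk, here is a rough Python sketch of the same idea. It is not a port of the pavuk command above, just an approximation using only the standard library: it reads the table of contents page, keeps links to .html pages in the same directory, and saves them to a local folder. The function names (`same_dir_links`, `download_book`) are made up for this example, and it only follows links one level deep and skips images and stylesheets, so treat it as a starting point.

```python
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def same_dir_links(toc_html, toc_url):
    """Return absolute URLs of .html pages in the same directory as the TOC."""
    parser = LinkCollector()
    parser.feed(toc_html)
    base_dir = urljoin(toc_url, ".")  # directory that contains the TOC page
    seen = {toc_url}                  # the TOC itself is handled separately
    result = []
    for href in parser.hrefs:
        # Resolve relative links and drop any #fragment part.
        url, _fragment = urldefrag(urljoin(toc_url, href))
        rest = url[len(base_dir):]
        in_same_dir = url.startswith(base_dir) and "/" not in rest
        if in_same_dir and rest.endswith(".html") and url not in seen:
            seen.add(url)
            result.append(url)
    return result


def download_book(toc_url, out_dir):
    """Save the TOC page and every page it links to into out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    toc_bytes = urlopen(toc_url).read()
    (out / Path(urlparse(toc_url).path).name).write_bytes(toc_bytes)
    toc_html = toc_bytes.decode("utf-8", errors="replace")
    for url in same_dir_links(toc_html, toc_url):
        name = Path(urlparse(url).path).name
        (out / name).write_bytes(urlopen(url).read())
```

Calling `download_book("http://mitpress.mit.edu/sicp/full-text/book/book.html", "sicp_book")` should leave a folder of html files similar to what pavuk produces, minus any images.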
Now I have a nice folder called sicp_book that has all the html files in it. It's easy to add this to Calibre as an eBook by selecting the table of contents file (the same one in the above link). Calibre is smart enough to pull in all of the other files and create a cohesive ebook entry. Why it couldn't just do this with a link to the online table of contents I can't say. If all software did what I wanted, I'd probably be an accountant or something.
Right now, in my Calibre Library folder, I have a zip file with the html in it that represents my ebook. Calibre can read this just fine and I think the nook could too. But what I really want is the open ePub format. That's why I needed the converter. So using the conversion tools in Calibre, I select the Structure ebook, select html to epub conversion and hit Go. No fuss, no muss.
I end up with an ePub file right next to the original zip file. I drop this on my nook over USB and I'm in business!
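The conversion step can also be scripted. Calibre ships a command-line tool called ebook-convert that takes an input file and an output file, so a small Python wrapper along these lines is one possible way to skip the GUI. This is only a sketch: it assumes ebook-convert is installed and on your PATH, and the function names and the `sicp.epub` output name are invented for the example.

```python
import subprocess


def build_convert_command(toc_path, epub_path, title=None):
    """Build the argument list for Calibre's ebook-convert tool."""
    cmd = ["ebook-convert", str(toc_path), str(epub_path)]
    if title:
        # --title sets the title stored in the output file's metadata.
        cmd += ["--title", title]
    return cmd


def convert_to_epub(toc_path, epub_path, title=None):
    """Run the conversion; raises CalledProcessError if ebook-convert fails."""
    subprocess.run(build_convert_command(toc_path, epub_path, title), check=True)
```

Something like `convert_to_epub("sicp_book/book.html", "sicp.epub")` would then do the html to epub conversion in one call, starting from the table of contents file.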
Total time: 20 minutes. Total cost: $5, which I donated to the Calibre guys. I try to pay for useful software. It's hard work, and one day I'd like to ask people to pay me for some useful software I write. It's all about Karma. I would pay $10-$15 for the ebook too, but I wasn't given that option.
Anyway, I'm not clear on the Fair Use rules of the license or I would just post a link to the file. But seriously, this took 20 minutes. If you're bothering to read this blog, you've probably got enough ingenuity to get this going. I'll definitely be using it more often. Need to find a way to automate things a little more first.