Monday, November 17, 2008
Sunday, November 2, 2008
Google recently added the ability to read scanned PDF documents. This is impressive because it's one thing to show a picture of text and quite another for a computer to understand which letters and words are represented there. As mentioned in this article, this provides a poor man's way to get high-end OCR (optical character recognition) done on a scanned PDF of your own. So this is my test of how effective that technique is on a document I would love to have in text form.
I'll update this post if/when the Google bot gets here and indexes the PDF. Thanks Google!
11/28/2008 EDIT: It took a couple of weeks, but Google came through with its Optical Character Recognition flag waving high. Here's a copy of the Google Cache, which OCRed this scan of Bluspels and Flalansferes (1939) by C. S. Lewis. But oddly, the OCR/Cache ends mysteriously just a paragraph (in each column) into page 14. What happened to the rest? Is the Google bot still in the process of recognizing it as I type? Time might tell…