Tuesday, March 04, 2008

Extracting Text from PDF Files

Ever had the need to extract the text from a PDF file?

At the simplest end, you can often extract text and graphics just by using the standard copy and paste functions in Adobe Reader, which may also let you save the file as text. If you want to do something more, though, you have a couple of other options…

Adobe Online Conversion Tools

This conversion service will convert Adobe PDF files that are in English and most West European languages to text. If the PDF files you need converted are on your local disk, you can simply mail them to Adobe as attachments and the converted files will be sent back to you. For plain text, mail the attached PDF to pdf2txt@adobe.com and for HTML, mail the attached PDF to pdf2html@adobe.com. Alternatively, if the PDF document is accessible via a URL, use the form on the main Online Conversion Tools page.
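If you have a lot of files to send, the mail-in step can be scripted. The following is only a rough sketch in Python using the standard library: the two addresses are the ones given above, but the function name, subject line and body text are my own assumptions, and nothing here says the service expects any particular subject or body. It builds the message without sending it.

```python
from email.message import EmailMessage

# Addresses quoted in the post above.
ADOBE_TEXT_ADDRESS = "pdf2txt@adobe.com"
ADOBE_HTML_ADDRESS = "pdf2html@adobe.com"


def build_conversion_request(pdf_bytes, filename, sender, want_html=False):
    """Build (but do not send) an email with a PDF attached.

    Hypothetical helper: the subject and body below are placeholders,
    not something the conversion service is known to require.
    """
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = ADOBE_HTML_ADDRESS if want_html else ADOBE_TEXT_ADDRESS
    msg["Subject"] = "PDF conversion request"  # assumed, not required
    msg.set_content("PDF attached for conversion.")
    # Attach the PDF as application/pdf under its original file name.
    msg.add_attachment(
        pdf_bytes,
        maintype="application",
        subtype="pdf",
        filename=filename,
    )
    return msg
```

To actually send the message you would hand it to your own mail server, for example with smtplib.SMTP(...).send_message(msg), using whatever host and login your mail provider gives you.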

NB: The conversion technology was developed to allow blind and visually impaired users to read Adobe PDF documents with speech synthesis software. For this reason, graphic elements are stripped from the file and text is reformatted during conversion.

PDFTextOnline

PDFTextOnline is another online conversion service. You upload a PDF document and it is passed to their PDFTextStream service, which extracts the text of the first 10 pages and displays it in your browser. If your PDF is longer than 10 pages, you can browse to the next 10 pages for conversion, and so on.
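To get a feel for how many rounds a long document would take at 10 pages a time, here is a tiny Python sketch. It only does the page arithmetic; the function name is made up for illustration and it does not talk to the PDFTextOnline service in any way.

```python
def page_batches(total_pages, batch_size=10):
    """Return (first_page, last_page) pairs covering the document.

    Illustrative helper only: pages are numbered from 1, and the
    last batch may be shorter than batch_size.
    """
    return [
        (start, min(start + batch_size - 1, total_pages))
        for start in range(1, total_pages + 1, batch_size)
    ]


print(page_batches(25))  # [(1, 10), (11, 20), (21, 25)]
```

So a 25-page PDF would need three rounds, with the last one covering only pages 21 to 25.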

Their selling point, even though the online service is free, is the claim that PDFTextStream is the only PDF text extraction API that uses its own OCR-like process to order extracted text properly. According to them, this makes PDFTextStream's text extracts the most accurate available today. Of course, if you need to do serious or long-term PDF conversion, you can buy PDFTextStream.

Other free online conversion services include PDFConverter, which can convert PDF documents to Word, Excel or Rich Text format, and Zamzar and Media-Convert, which can convert PDFs to text and lots of other formats too.


2 comments:

Jim Green said...

The conversion of pdf to text is similar to OCR technology, and it is very practical.

Charles said...

Thank you so much for this nice information. I hope many people become aware of this and find it useful as well.
