Extracting text from a .PDF scanned book

Mar 15, 2026, 5146 views

Problem

I have scanned a book into PDF format, but the quality is rather poor. (The language is Romanian and it's a medical physiology book, in case you were wondering.) I want to extract the text from the book (1500 pages) but keep the images the way they are. I really don't think I have any chance to find a s…

1 Fix

Fix for: Extracting text from a .PDF scanned book

I have earlier posted an answer detailing how to use Cuneiform (open source software) to do OCR on PDF files and how to create a PDF file with the recognized text in a hidden text layer "behind" the original image. As far as I know, Cuneiform actual…
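
The earlier answer mentioned above is not reproduced here, so the following is only a rough sketch of what such a pipeline might look like, not the author's exact commands. It assumes three command-line tools are installed: pdfimages (from Poppler) to pull the page scans out of the PDF, cuneiform to write hOCR output, and hocr2pdf (from ExactImage) to place the recognized text in a hidden layer behind each page image. The helper names, the file names, and the language code "rum" are assumptions for illustration; check which language codes and image formats your Cuneiform build actually accepts.

```python
import subprocess
from pathlib import Path


def extract_cmd(pdf, out_prefix):
    # pdfimages writes one image file per embedded scan: <out_prefix>-000.png, ...
    return ["pdfimages", "-png", str(pdf), str(out_prefix)]


def ocr_cmd(image, hocr, language):
    # -l selects the recognition language, -f the output format, -o the output file.
    return ["cuneiform", "-l", language, "-f", "hocr", "-o", str(hocr), str(image)]


def overlay_cmd(image, page_pdf):
    # hocr2pdf reads the hOCR document on stdin and writes a one-page PDF
    # with the text layer hidden behind the original image.
    return ["hocr2pdf", "-i", str(image), "-o", str(page_pdf)]


def ocr_page(image, language):
    """Run OCR on one page image and return the path of the resulting PDF."""
    image = Path(image)
    hocr = image.with_suffix(".hocr")
    page_pdf = image.with_suffix(".pdf")
    subprocess.run(ocr_cmd(image, hocr, language), check=True)
    with open(hocr, "rb") as hocr_file:
        subprocess.run(overlay_cmd(image, page_pdf), stdin=hocr_file, check=True)
    return page_pdf


def ocr_book(pdf, workdir, language):
    """Hypothetical driver: extract every page scan, then OCR each one."""
    workdir = Path(workdir)
    workdir.mkdir(parents=True, exist_ok=True)
    subprocess.run(extract_cmd(pdf, workdir / "page"), check=True)
    return [ocr_page(img, language) for img in sorted(workdir.glob("page-*.png"))]
```

A call such as `ocr_book("book.pdf", "pages", "rum")` would leave one searchable PDF per page in the working directory; merging them back into a single file is a separate step not shown here. For a 1500-page book it is worth trying a handful of pages first to judge the recognition quality on poor scans.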
