Scanned PDF to Text OCR

Extract text from scanned PDFs or convert them into searchable documents. Recognize text in a wide variety of layouts and styles, and accurately detect the structure of text and tables. Preserve the original images in the background so no content is lost. Aspose.OCR: your PDF text extraction solution for .NET.

Aspose.OCR Scanned PDF to Text for .NET

Aspose.OCR’s .NET OCR plug-in extracts text from scanned PDFs or converts them into searchable documents while preserving the original images. Advanced algorithms accurately identify text and table structures, making it a go-to solution for PDF text extraction.

How to Use the Scanned PDF to Text Plugin

Install the Aspose.OCR package from NuGet or a locally downloaded file.
Set your license keys.
Load the scanned PDF into an OcrInput object.
Create an instance of the Aspose.OCR recognition engine.
Run recognition to extract text from the PDF.
Output the recognized text or save it to a file.
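The steps above can be sketched in C# roughly as follows. This is a minimal sketch, assuming the Aspose.OCR NuGet package (for example, installed with `dotnet add package Aspose.OCR`) and the `License`, `OcrInput`, `AsposeOcr`, `RecognitionSettings`, and `RecognitionResult` types described in the Aspose.OCR for .NET documentation. The file names `Aspose.OCR.lic`, `scan.pdf`, and `scan.txt` are hypothetical placeholders, and the example uses top-level statements, so it needs C# 9 or later.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Aspose.OCR;

// Step 2: apply your license (a license file here; hypothetical path).
License license = new License();
license.SetLicense("Aspose.OCR.lic");

// Step 3: load the scanned PDF into an OcrInput object.
OcrInput source = new OcrInput(InputType.PDF);
source.Add("scan.pdf");

// Step 4: create the recognition engine.
AsposeOcr api = new AsposeOcr();

// Step 5: run recognition; the result list holds one entry per recognized page.
List<RecognitionResult> results = api.Recognize(source, new RecognitionSettings());

// Step 6: print each page's text and also write it to a plain-text file.
using StreamWriter writer = new StreamWriter("scan.txt");
foreach (RecognitionResult page in results)
{
    Console.WriteLine(page.RecognitionText);
    writer.WriteLine(page.RecognitionText);
}
```

The text file is written with a standard `StreamWriter` rather than a library save method, which keeps this sketch independent of the output-format options.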

Get the Scanned PDF to Text Converter Plugin

Download the assembly files, or install the package from NuGet to add Aspose.OCR directly to your project.

Compatible with Microsoft Windows or any other OS that supports .NET Standard 2.0.
Requires a development environment such as Microsoft Visual Studio.

Additional Features

Support for multi-page PDFs to extract text from each page.
Customizable text recognition settings for improved accuracy.
Integration with other Aspose libraries for advanced document processing.
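Multi-page support can be illustrated with a short sketch that recognizes a range of pages and prints each page separately. It assumes the `OcrInput.Add(path, startPage, pagesNumber)` overload described in the Aspose.OCR for .NET documentation, with a zero-based start page; `report.pdf` and the page range are hypothetical, and the overload may differ between library versions.

```csharp
using System;
using System.Collections.Generic;
using Aspose.OCR;

// Queue only part of a long scan: start at the first page (index 0, assumed
// zero-based) and read three pages. "report.pdf" is a hypothetical file.
OcrInput source = new OcrInput(InputType.PDF);
source.Add("report.pdf", 0, 3);

AsposeOcr api = new AsposeOcr();
List<RecognitionResult> results = api.Recognize(source, new RecognitionSettings());

// Each entry in results holds the text of one page.
for (int i = 0; i < results.Count; i++)
{
    Console.WriteLine($"--- Page {i + 1} ---");
    Console.WriteLine(results[i].RecognitionText);
}
```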

System Requirements

.NET Standard 2.0 or above is required to run the plugin.
Compatible with Microsoft Windows operating systems.
Adequate memory and disk space for optimal performance.

Frequently Asked Questions

Is specifying a language necessary?

By default, Aspose.OCR automatically recognizes a wide range of languages based on the extended Latin alphabet. However, specifying the language explicitly can significantly improve recognition accuracy. Always specify the language when recognizing Cyrillic, Chinese, or Hindi text.
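Selecting the language explicitly might look like the sketch below. It assumes the `RecognitionSettings.Language` property and the `Language` enumeration from the Aspose.OCR for .NET documentation; `Language.Ukr` is used as a Cyrillic example, and the input file name is hypothetical. Enumeration member names can change between versions, so check them against the API reference for your release.

```csharp
using System;
using System.Collections.Generic;
using Aspose.OCR;

// "ukrainian-scan.pdf" is a hypothetical Cyrillic-script document.
OcrInput source = new OcrInput(InputType.PDF);
source.Add("ukrainian-scan.pdf");

// Select the recognition language instead of relying on the default
// extended-Latin detection. Chinese or Hindi text would use the
// corresponding Language members (assumed here: Language.Chi, Language.Hin).
RecognitionSettings settings = new RecognitionSettings();
settings.Language = Language.Ukr;

AsposeOcr api = new AsposeOcr();
List<RecognitionResult> results = api.Recognize(source, settings);
foreach (RecognitionResult page in results)
{
    Console.WriteLine(page.RecognitionText);
}
```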

What file formats are supported?

Aspose.OCR supports the formats commonly produced by scanners and cameras, including PDF, JPEG, PNG, and TIFF. Recognition results can be returned as plain text, HTML, Microsoft Word, PDF, JSON, or XML.
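Saving results in different output formats could be sketched like this. It assumes the static `AsposeOcr.SaveMultipageDocument(fileName, format, results)` method, the `RecognitionResult.Save(fileName, format)` method, and the `SaveFormat` members `Pdf`, `Docx`, and `Text` as described in the Aspose.OCR for .NET documentation; the file names are hypothetical. Other formats such as HTML, JSON, and XML have their own `SaveFormat` members, whose exact names should be checked in the API reference.

```csharp
using System.Collections.Generic;
using Aspose.OCR;

OcrInput source = new OcrInput(InputType.PDF);
source.Add("scan.pdf"); // hypothetical input file

AsposeOcr api = new AsposeOcr();
List<RecognitionResult> results = api.Recognize(source, new RecognitionSettings());

// Combine every recognized page into one searchable PDF and one Word document.
AsposeOcr.SaveMultipageDocument("scan-searchable.pdf", SaveFormat.Pdf, results);
AsposeOcr.SaveMultipageDocument("scan.docx", SaveFormat.Docx, results);

// Alternatively, save a single page (here the first) as plain text.
results[0].Save("page1.txt", SaveFormat.Text);
```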

How do I achieve the best results?

Good image quality is crucial for accurate OCR, so use a scanner or a high-resolution camera. The library also includes advanced filters that automatically improve image quality before recognition.
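Applying preprocessing filters before recognition might look like the following sketch. It assumes the `PreprocessingFilter` collection in the `Aspose.OCR.Models.PreprocessingFilters` namespace, its `AutoSkew()` and `ContrastCorrectionFilter()` factory methods, and the `OcrInput(InputType, PreprocessingFilter)` constructor, as described in the Aspose.OCR for .NET documentation. The input file is hypothetical, and the available filters vary between library versions.

```csharp
using System;
using System.Collections.Generic;
using Aspose.OCR;
using Aspose.OCR.Models.PreprocessingFilters;

// Build a preprocessing pipeline that runs before recognition.
PreprocessingFilter filters = new PreprocessingFilter();
filters.Add(PreprocessingFilter.AutoSkew());                 // straighten tilted scans
filters.Add(PreprocessingFilter.ContrastCorrectionFilter()); // strengthen faint text

// Attach the filters to the input so every page is preprocessed.
OcrInput source = new OcrInput(InputType.PDF, filters);
source.Add("low-quality-scan.pdf"); // hypothetical input file

AsposeOcr api = new AsposeOcr();
List<RecognitionResult> results = api.Recognize(source, new RecognitionSettings());
Console.WriteLine(results[0].RecognitionText);
```

The filters are attached to the input rather than to the recognition call, so the same engine instance can process other inputs with different preprocessing.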

Where can I find more information and examples?

Explore our online documentation or visit the Aspose.OCR for .NET repository for code samples and showcase projects.