PDF Text Extractor for .NET

Extract text from PDF documents in Pure, Raw, or Plain mode with the Aspose.PDF .NET plugin

Text Extractor for .NET

Simplify text extraction from PDF documents with the Aspose.PDF Text Extractor for .NET plugin. This versatile tool offers three operation modes: pure, raw, and plain, providing flexibility and convenience for text extraction tasks in .NET applications.

How to Extract Text from PDF via .NET

  • Reference Aspose.PDF in your project
  • Set your license keys
  • Create instances of TextExtractor and TextExtractorOptions
  • Add input PDF documents using TextExtractorOptions.AddDataSource
  • Call TextExtractor.Process with the TextExtractorOptions instance and assign the result to a ResultContainer
  • Access the extracted text using ResultContainer.ResultCollection
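
The steps above can be sketched in C# roughly as follows. This is a minimal sketch, not a definitive implementation: it assumes the plugin types live in the Aspose.Pdf.Plugins namespace, that Process is called on a TextExtractor instance (as the multi-file steps on this page state), and that a FileDataSource type wraps a file path. The input path is hypothetical, and license setup is left out. AddDataSource is the method name used on this page; some releases may expose it as AddInput, so check the API of the version you install.

```csharp
using System;
using Aspose.Pdf.Plugins; // assumed namespace for the plugin types

internal static class ExtractTextExample
{
    private static void Main()
    {
        // Hypothetical input path; replace with a real PDF file.
        const string inputPath = "sample.pdf";

        // License setup is omitted in this sketch.
        using (var extractor = new TextExtractor())
        {
            // Default options; the page states that Raw is the default mode.
            var options = new TextExtractorOptions();

            // Method name as given on this page; may be AddInput in some releases.
            options.AddDataSource(new FileDataSource(inputPath));

            // Run the extraction and keep the returned container.
            ResultContainer results = extractor.Process(options);

            // Read the first result as a string and print it.
            string text = results.ResultCollection[0].ToString();
            Console.WriteLine(text);
        }
    }
}
```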

Getting Started with PDF Text Extractor

Get the assembly files from the downloads or fetch the package from NuGet to add Aspose.PDF directly to your workspace.

  • Supported operating systems include Windows 7-11, Windows Server 2003-2022, macOS (10.12+), and Linux
  • Supported .NET versions range from .NET Framework 4.0 to .NET 7.0
  • Compatible with various Microsoft Visual Studio versions


How to Extract Text from Multiple PDFs

  • Reference Aspose.PDF for .NET in your project
  • Set your license keys
  • Create instances of TextExtractor and TextExtractorOptions
  • Add input PDF documents using TextExtractorOptions.AddDataSource
  • Call TextExtractor.Process with an instance of TextExtractorOptions as parameter
  • Get the result into an instance of ResultContainer
  • Access extracted text using ResultContainer.ResultCollection
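
For several input files, the same flow might look like the sketch below. It rests on the same assumptions as any use of this plugin outside the page text: the Aspose.Pdf.Plugins namespace and the FileDataSource type are assumed, the file names are hypothetical, and the code does not assume how results map to inputs; it simply prints every entry in ResultCollection.

```csharp
using System;
using Aspose.Pdf.Plugins; // assumed namespace for the plugin types

internal static class ExtractFromManyExample
{
    private static void Main()
    {
        // Hypothetical input paths; replace with real PDF files.
        string[] inputPaths = { "first.pdf", "second.pdf" };

        using (var extractor = new TextExtractor())
        {
            var options = new TextExtractorOptions();

            // Register every input on the same options object.
            foreach (string path in inputPaths)
            {
                // Method name as given on this page; may be AddInput in some releases.
                options.AddDataSource(new FileDataSource(path));
            }

            // One Process call handles all registered inputs.
            ResultContainer results = extractor.Process(options);

            // Print each result without assuming a one-to-one mapping to inputs.
            foreach (var result in results.ResultCollection)
            {
                Console.WriteLine(result.ToString());
            }
        }
    }
}
```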

Text Extractor's Operation Modes

  • The Pure mode extracts text from the PDF file with formatting applied: it takes the relative positions of text fragments into account and adds extra spaces to align text to the width of the page
  • The Raw mode extracts text from the PDF file without applying any formatting
  • The Plain mode extracts text from the PDF file, taking into account the relative positioning of text fragments, but unlike the Pure mode, it does not add extra spaces
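
One way to compare the three modes is to run the same file through each of them. The sketch below assumes the mode is chosen through a TextExtractorOptions constructor argument of the nested enum type TextExtractorOptions.TextFormattingMode with members Pure, Raw, and Plain; those names, the Aspose.Pdf.Plugins namespace, and FileDataSource are assumptions to verify against the installed version. The Extract helper and the input path are hypothetical.

```csharp
using System;
using Aspose.Pdf.Plugins; // assumed namespace for the plugin types

internal static class FormattingModeExample
{
    // Hypothetical helper: extracts text from one file in the given mode.
    private static string Extract(string path, TextExtractorOptions.TextFormattingMode mode)
    {
        using (var extractor = new TextExtractor())
        {
            // Assumed constructor overload that takes the formatting mode.
            var options = new TextExtractorOptions(mode);

            // Method name as given on this page; may be AddInput in some releases.
            options.AddDataSource(new FileDataSource(path));

            return extractor.Process(options).ResultCollection[0].ToString();
        }
    }

    private static void Main()
    {
        // Hypothetical input path; replace with a real PDF file.
        const string inputPath = "sample.pdf";

        Console.WriteLine("--- Pure ---");
        Console.WriteLine(Extract(inputPath, TextExtractorOptions.TextFormattingMode.Pure));

        Console.WriteLine("--- Raw ---");
        Console.WriteLine(Extract(inputPath, TextExtractorOptions.TextFormattingMode.Raw));

        Console.WriteLine("--- Plain ---");
        Console.WriteLine(Extract(inputPath, TextExtractorOptions.TextFormattingMode.Plain));
    }
}
```

Printing the three outputs side by side makes the whitespace differences described above easy to spot on a document with columns or tables.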


Frequently Asked Questions

What does Aspose.PDF Text Extractor for .NET do?

Aspose.PDF Text Extractor for .NET is a plugin for .NET applications that extracts text from PDF documents in one of three modes: Pure, Raw, and Plain. It defaults to Raw mode, supports versatile input and output options, can process multiple PDF files in a single call, and gives developers room for customization.

What is the difference between Aspose.PDF for .NET & Aspose.PDF Text Extractor for .NET?

Aspose.PDF for .NET is a .NET API for a wide range of PDF tasks, including document generation, compression, table creation, and advanced features like importing and exporting PDF data. Aspose.PDF Text Extractor for .NET, by contrast, is a specialized plugin focused solely on extracting text from PDF documents.

Is Aspose.PDF Text Extractor for .NET limited to extracting text from PDF?

Yes, PDF Text Extractor for .NET is designed specifically for extracting text from PDF. For other operations you can use other PDF plugins or the full capabilities of the Aspose.PDF library.

Does Aspose.PDF offer an online tool for PDF Text Extraction?

Yes, Aspose.PDF provides a free online PDF Text Parser tool for basic needs.

Where can I find Aspose.PDF Text Extraction examples in C#?

See our landing pages on extracting text from PDF for .NET.