Document Parser .NET API
Product Page | Docs | Demos | API Reference | Examples | Blog | Search | Free Support | Temporary License
This on-premise text parser API searches and extracts formatted text, as well as raw text, from documents in a variety of supported file formats.
Document Parser Processing Features
- Parse documents by user-defined templates.
- Extract plain and structured text.
- Extract text areas with coordinates, text styles, and other information.
- Search text by a keyword or regular expression; extract text around that word.
- Extract HTML or Markdown (MD) formatted text for a fast preview.
- Increase performance by extracting raw text.
- Extract formatted text, metadata, images, containers, and attachments.
- Extract table of contents for some supported document formats.
- Parse form data from PDF documents.
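As an illustration of the keyword search feature listed above, here is a minimal sketch. It assumes the GroupDocs.Parser package is referenced in the project; the class name `KeywordSearchSketch`, the method `PrintMatches`, and its `filePath` and `keyword` parameters are hypothetical names chosen for this example, not part of the library.

```csharp
using System;
using System.Collections.Generic;
using GroupDocs.Parser;
using GroupDocs.Parser.Data;

public static class KeywordSearchSketch
{
    // Hypothetical helper: prints every occurrence of `keyword` found in the
    // document at `filePath` and returns the number of matches.
    public static int PrintMatches(string filePath, string keyword)
    {
        // create an instance of Parser class for the given document
        using (Parser parser = new Parser(filePath))
        {
            // search the document text for the keyword
            IEnumerable<SearchResult> results = parser.Search(keyword);

            // Search returns null when search isn't supported for the format
            if (results == null)
            {
                Console.WriteLine("Search isn't supported");
                return 0;
            }

            int count = 0;
            foreach (SearchResult result in results)
            {
                // Position is the match offset in the text; Text is the matched text
                Console.WriteLine("At {0}: {1}", result.Position, result.Text);
                count++;
            }
            return count;
        }
    }
}
```

A call such as `KeywordSearchSketch.PrintMatches("sample.docx", "invoice")` would list each hit; the file name and keyword here are placeholders.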
Parse Document by Template
Word Processing: DOC, DOT, DOCX, DOCM, DOTX, DOTM, ODT, OTT, RTF, TXT
Spreadsheet: XLS, XLT, XLSX, XLSM, XLSB, XLTX, XLTM, ODS, OTS, XLA, XLAM, NUMBERS
Presentation: PPS, POT, PPTX, PPTM, POTX, POTM, PPSX, PPSM, ODP, OTP
Portable: PDF
Word Processing: DOC, DOT, DOCX, DOCM, DOTX, DOTM, ODT, OTT, RTF
Spreadsheet: XLS, XLT, XLSX, XLSM, XLSB, XLTX, XLTM, ODS, OTS, CSV, XLA, XLAM, NUMBERS
Presentation: PPS, POT, PPTX, PPTM, POTX, POTM, PPSX, PPSM, ODP, OTP
Email: EML, EMLX, MSG
Markup: XHTML, MHTML, MD, XML
eBook: CHM, EPUB, FB2
Portable: PDF
OneNote: ONE
Databases: Databases are supported via ADO.NET. To work with the corresponding database format install its database provider.
Spreadsheet: XLS, XLT, XLSX, XLSM, XLTX, XLTM, XLA, XLAM
Presentation: PPT, PPS, POT, PPTX, PPTM, POTX, POTM, PPSX, PPSM
Portable: PDF
Extract Structured Text and Formatted Text
Word Processing: DOC, DOT, DOCX, DOCM, DOTX, DOTM, ODT, OTT, RTF
Spreadsheet: XLS, XLT, XLSX, XLSM, XLTX, XLTM, XLA, XLAM
Presentation: PPT, PPS, POT, PPTX, PPTM, POTX, POTM, PPSX, PPSM, ODP, OTP
Email: EML, EMLX, MSG
Markup: MD (formatted text is not supported)
eBook: CHM, EPUB, FB2
Please visit the Supported Document Formats for more details.
GroupDocs.Parser for .NET does not require any external software or third-party tool to be installed. It supports any 32-bit or 64-bit operating system where the .NET or Mono framework is installed. The other details are as follows:
Microsoft Windows: Microsoft Windows Desktop (x86, x64) (XP & up), Microsoft Windows Server (x86, x64) (2000 & up), Windows Azure
Mac OS: Mac OS X
Linux: Linux (Ubuntu, OpenSUSE, CentOS and others)
Development Environments: Microsoft Visual Studio (2010 & up), Xamarin.Android, Xamarin.iOS, Xamarin.Mac, MonoDevelop 2.4 and later.
Supported Frameworks: GroupDocs.Parser for .NET supports .NET and Mono frameworks.
Get Started
Are you ready to give GroupDocs.Parser for .NET a try? Simply execute Install-Package GroupDocs.Parser from the Package Manager Console in Visual Studio to fetch and reference the GroupDocs.Parser assembly in your project. If you already have GroupDocs.Parser for .NET and want to upgrade it, execute Update-Package GroupDocs.Parser to get the latest version.
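Once the package is installed, a first check that everything is wired up could look like the sketch below, which extracts the raw text of a document. The class name `TextExtractionSketch`, the method `ReadAllText`, and the `filePath` parameter are hypothetical names used only for this illustration.

```csharp
using System;
using System.IO;
using GroupDocs.Parser;

public static class TextExtractionSketch
{
    // Hypothetical helper: returns the text of the document at `filePath`,
    // or null when text extraction isn't supported for its format.
    public static string ReadAllText(string filePath)
    {
        // create an instance of Parser class for the given document
        using (Parser parser = new Parser(filePath))
        // GetText returns a TextReader, or null if extraction isn't supported
        using (TextReader reader = parser.GetText())
        {
            return reader == null ? null : reader.ReadToEnd();
        }
    }
}
```

For example, `Console.WriteLine(TextExtractionSketch.ReadAllText("sample.pdf") ?? "Text extraction isn't supported");` prints the document text or a fallback message; the file name is a placeholder.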
Please check the GitHub Repository for other common usage scenarios.
// create an instance of Parser class
using (Parser parser = new Parser(Constants.SampleZip))
{
    // extract images from document
    IEnumerable<PageImageArea> images = parser.GetImages();
    // check if images extraction is supported
    if (images == null)
    {
        Console.WriteLine("Page images extraction isn't supported");
        return;
    }
    // create the options to save images in PNG format
    ImageOptions options = new ImageOptions(ImageFormat.Png);
    int imageNumber = 0;
    // iterate over images
    foreach (PageImageArea image in images)
    {
        // save the image to the png file
        image.Save(imageNumber.ToString() + ".png", options);
        imageNumber++;
    }
}