Product Page | Docs | Demos | API Reference | Examples | Blog | Free Support | Temp License
Search and extract text, images, metadata, and attachments from a [variety of documents](https://docs.groupdocs.com/parser/net/supported-document-formats/).
Document Parser Library for .NET
- Parse documents by user-defined templates.
- Extract plain and structured text.
- Extract text areas with coordinates, text styles, and other information.
- Search text by a keyword or regular expression; extract the text surrounding each match.
- Extract HTML or Markdown (MD) formatted text for a fast preview.
- Increase performance by extracting raw text.
- Extract formatted text, metadata, images, containers, and attachments.
- Extract table of contents for some supported document formats.
- Parse form data from PDF documents.
Parse Document by Template
Word Processing: DOC, DOT, DOCX, DOCM, DOTX, DOTM, ODT, OTT, RTF, TXT
Spreadsheet: XLS, XLT, XLSX, XLSM, XLSB, XLTX, XLTM, ODS, OTS, CSV, XLA, XLAM, NUMBERS
Presentation: PPT, PPS, POT, PPTX, PPTM, POTX, POTM, PPSX, PPSM, ODP, OTP
Portable: PDF
Word Processing: DOC, DOT, DOCX, DOCM, DOTX, DOTM, ODT, OTT, RTF, TXT
Spreadsheet: XLS, XLT, XLSX, XLSM, XLSB, XLTX, XLTM, ODS, OTS, CSV, XLA, XLAM, NUMBERS
Presentation: PPT, PPS, POT, PPTX, PPTM, POTX, POTM, PPSX, PPSM, ODP, OTP
Email: EML, EMLX, MSG
Markup: HTML, XHTML, MHTML, MD, XML
eBooks: CHM, EPUB, FB2
Portable: PDF
Notes: ONE
Databases: Supported via ADO.NET. To work with a particular database format, install its database provider.
Spreadsheet: XLS, XLT, XLSX, XLSM, XLSB, XLTX, XLTM, XLA, XLAM
Presentation: PPT, PPS, POT, PPTX, PPTM, POTX, POTM, PPSX, PPSM
Portable: PDF
Extract Structured Text and Formatted Text
Word Processing: DOC, DOT, DOCX, DOCM, DOTX, DOTM, ODT, OTT, RTF
Spreadsheet: XLS, XLT, XLSX, XLSM, XLTX, XLTM, XLA, XLAM
Presentation: PPT, PPS, POT, PPTX, PPTM, POTX, POTM, PPSX, PPSM, ODP, OTP
Email: EML, EMLX, MSG
Markup: MD (formatted text is not supported)
eBooks: CHM, EPUB, FB2
Word Processing: DOC, DOT, DOCX, DOCM, DOTX, DOTM, ODT, OTT, RTF
Spreadsheet: XLS, XLT, XLSX, XLSM, XLSB, XLTX, XLTM, ODS, OTS, XLA, XLAM, NUMBERS
Presentation: PPT, PPS, POT, PPTX, PPTM, POTX, POTM, PPSX, PPSM, ODP, OTP
Portable: PDF
Word Processing: DOC, DOT, DOCX, DOCM, DOTX, DOTM, ODT, OTT, RTF
Spreadsheet: XLS, XLT, XLSX, XLSM, XLSB, XLTX, XLTM, ODS, OTS, XLA, XLAM
Presentation: PPT, PPS, POT, PPTX, PPTM, POTX, POTM, PPSX, PPSM, ODP, OTP
Email: EML, EMLX, MSG
eBooks: EPUB, FB2
Portable: PDF
Word Processing: DOC, DOT, DOCX, DOCM, DOTX, DOTM, ODT, OTT, RTF
Spreadsheet: XLS, XLT, XLSX, XLSM, XLSB, XLTX, XLTM, ODS, OTS, XLA, XLAM, NUMBERS
Presentation: PPT, PPS, POT, PPTX, PPTM, POTX, POTM, PPSX, PPSM, ODP, OTP
Email: EML, EMLX, MSG
Portable: PDF
eBooks: CHM, EPUB, FB2
Markup: HTML
Archive: ZIP
Extract Containers and Attachments
Email: PST, OST, EML, EMLX, MSG
Portable: PDF
Archive: ZIP
Portable: PDF
Word Processing: DOC, DOT, DOCX, DOCM, DOTX, DOTM, ODT, OTT, RTF
eBooks: CHM, EPUB
Portable: PDF
Databases: Supported via ADO.NET. To work with a particular database format, install its database provider.
GroupDocs.Parser for .NET does not require any external software or third-party tools to be installed. It supports any 32-bit or 64-bit operating system where the .NET or Mono framework is installed. Details are as follows:
Microsoft Windows: Microsoft Windows Desktop (x86, x64) (XP & up), Microsoft Windows Server (x86, x64) (2000 & up), Windows Azure
macOS: Mac OS X
Linux: Linux (Ubuntu, OpenSUSE, CentOS and others)
Development Environments: Microsoft Visual Studio (2010 & up), Xamarin.Android, Xamarin.iOS, Xamarin.Mac, MonoDevelop 2.4 and later
Supported Frameworks: GroupDocs.Parser for .NET supports .NET and Mono frameworks.
using System;
using System.Collections.Generic;
using GroupDocs.Parser;
using GroupDocs.Parser.Data;
using GroupDocs.Parser.Options;

// Create an instance of the Parser class
using (Parser parser = new Parser(Constants.SampleZip))
{
    // Extract images from the document
    IEnumerable<PageImageArea> images = parser.GetImages();
    // GetImages returns null when image extraction isn't supported for the format
    if (images == null)
    {
        Console.WriteLine("Page images extraction isn't supported");
        return;
    }
    // Create the options to save images in PNG format
    ImageOptions options = new ImageOptions(ImageFormat.Png);
    int imageNumber = 0;
    // Iterate over the images
    foreach (PageImageArea image in images)
    {
        // Save each image to a numbered PNG file (0.png, 1.png, ...)
        image.Save(imageNumber.ToString() + ".png", options);
        imageNumber++;
    }
}
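The same Parser pattern can be applied to the text extraction and keyword search features listed above. The following is a minimal sketch rather than a definitive implementation: the file name `sample.pdf` and the keyword `invoice` are hypothetical placeholders, and the code assumes the GroupDocs.Parser package is referenced and that `GetText` and `Search` return `null` for formats where the feature isn't supported.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using GroupDocs.Parser;
using GroupDocs.Parser.Data;

// Hypothetical input path (assumption); replace with a real document
string filePath = "sample.pdf";

using (Parser parser = new Parser(filePath))
{
    // Extract the document text; the reader is null if extraction isn't supported
    using (TextReader reader = parser.GetText())
    {
        Console.WriteLine(reader == null
            ? "Text extraction isn't supported"
            : reader.ReadToEnd());
    }

    // Search for a keyword ("invoice" is a placeholder, not from the source)
    IEnumerable<SearchResult> results = parser.Search("invoice");
    if (results == null)
    {
        Console.WriteLine("Search isn't supported");
        return;
    }

    // Print the position and matched text of each occurrence
    foreach (SearchResult result in results)
    {
        Console.WriteLine($"At {result.Position}: {result.Text}");
    }
}
```

Checking for `null` before reading or iterating mirrors the image example: whether a feature is available depends on the document format, so the check is made per call.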