Nonprofit Organization  Science Accessibility Net

OCR software for mathematical documents: InftyReader

InftyReader Ver.3 series

  • The price has been reduced by half (please see here for more details).
  • "EPUB3" and "PDF with TeX" have been added to the output formats for recognition results.

Main Features

OCR software to recognize scientific documents including mathematical formulas.

Various output formats: XML, LaTeX, MathML, HTML, Word, EPUB3, etc.

Direct conversion of PDF to the formats above, including mathematical expressions.

Recognition of an image on the clipboard, with the result pasted into Word (see here).

InftyReader is OCR software that recognizes scientific documents including mathematical formulae and outputs the recognition results in various file formats: LaTeX, MathML, XHTML, HR-TeX, IML and Microsoft Word document. It is developed in the laboratory of M. Suzuki, Faculty of Mathematics, Kyushu University, in collaboration with several cooperation partners.

*InftyReader Ver. (Jan. 5, 2022)

Personal Use License package: (English Edition, about 199MB) -------- Jan. 5, 2022

What's new (Ver.

  • Improvement of the processing speed of e-born PDF recognition
  • Correction of XHTML (MathML) output for documents including non-ASCII characters

What's new (Ver.

  • Improved recognition of e-born PDF (font-embedded PDF) by using hybrid OCR for special symbols in fonts unknown to the PDF parser.

What's new (Ver.

  • Support for 25 languages for e-born PDF (font-embedded PDF): Czech, Danish, Dutch, English, Finnish, French, German, Hebrew, Hungarian, Indonesian, Italian, Japanese, Korean, Malay, Norwegian, Romanian, Russian, Polish, Portuguese, Slovak, Spanish, Swedish, Thai, Turkish, Vietnamese

What's new (Ver.

  • Renewal of the LaTeX / Text conversion program to support different languages including non-ASCII characters.

For general information about InftyReader, please read "AboutInftyReaderE.txt" here.

Enterprise License package: Edition, about 199MB) ------ Jan. 5, 2022

What's the difference from the personal use edition? Please read: "About InftyReader Enterprise".

License Update. The serial numbers of InftyReader Ver.3.1 and 3.2 are valid for Ver.3.3, so all users of the InftyReader Ver.3.1 and 3.2 series can use the Ver.3.3 series at no additional cost.

Trial Use. To use InftyReader in the Trial Mode, please see: AboutTrialUse.txt.

Remark. Please avoid file names and path names that include non-ASCII characters for InftyReader input files.
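
A pre-flight check for this remark could look like the Python sketch below. It is not part of InftyReader: the function name `non_ascii_parts` and the example paths are hypothetical, and the check only inspects the path string, using Windows path rules because InftyReader runs on Windows.

```python
from pathlib import PureWindowsPath


def non_ascii_parts(path: str) -> list[str]:
    """Return the components of a Windows-style path that contain non-ASCII characters."""
    return [part for part in PureWindowsPath(path).parts if not part.isascii()]


if __name__ == "__main__":
    # Hypothetical example paths; replace them with your own input files.
    for candidate in (r"C:\scans\paper01.pdf", r"C:\scans\論文\page1.tif"):
        bad = non_ascii_parts(candidate)
        status = "OK" if not bad else f"rename first: {bad}"
        print(candidate, "->", status)
```

An empty list means every folder and file name in the path is plain ASCII.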

PDF recognition / parsing.

The recent versions of InftyReader (Ver.3.2 series and 3.3 series) can convert (e-born) PDF to the various formats mentioned above, using a PDF parser instead of OCR to obtain exact character codes. Please see the site of InftyReaderLite. (All the functions of InftyReaderLite are included in the standard version of InftyReader.)

About PDF with JPEG2000 images.

Since the ver., InftyReader has been able to process PDF files that include JPEG2000 images, so it is no longer necessary to pre-process the PDF to change image formats.

Document for blind users.

Below is the Introduction to InftyReader for blind users given by Prof. John Gardner (Oregon State University & ViewPlus Technology) at the ICCHP Summer University 2011.

Introduction to InftyReader by Prof. John Gardner.

* Comments about output formats

  1. IML is the default XML file format of the editor "InftyEditor", an authoring tool for math documents developed by InftyProject. InftyEditor provides a very easy user interface to input and edit math expressions together with ordinary text.
    The English edition of InftyEditor is free software. Please see the sites of InftyEditor.
  2. LaTeX is a markup language widely used by specialists in science to write mathematical documents.
  3. In XHTML format, mathematical expressions are output using MathML notation.
  4. HR-TeX is a simplified LaTeX-like notation that is easier "to read", specially designed for blind users.
  5. Word XML output from InftyReader can be directly imported into Microsoft Word.
  6. In EPUB3 format, mathematical expressions are output using MathML notation.
  7. "PDF with TeX" is a newly proposed accessible PDF format. Its front image is the same as the original PDF, and the text and math information is embedded behind the image, rearranged in the usual reading order. Math expressions are embedded using HR-TeX (Human Readable TeX) notation. To get "PDF with TeX" output, users are recommended to install Ghostscript. Its download site is below:
    If you have Ghostscript installed on your PC, the front image of the output PDF will be a high-quality vector image generated by Ghostscript.

Using InftyEditor, users can correct and edit the recognition results of InftyReader, comparing them with the original images, and convert the results into various formats: LaTeX, PDF, XHTML with MathML, etc.

Please note that InftyReader recognizes only <<Black and White>>, <<Binary>> images carefully scanned at either 600 DPI or 400 DPI. Please be aware that the program fails to run if the input image contains grayscale or color areas, even in part.
Image files have to be prepared in TIFF, PNG, or BMP format. InftyReader also recognizes PDF: it first converts the input PDF to PNG files and then recognizes the converted image files.
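
The requirements above can be checked before running the recognition. The Python sketch below is only an illustration and is not part of InftyReader. It handles PNG files only, reading the bit depth and color type from the IHDR chunk and the resolution from the optional pHYs chunk. The function names are hypothetical. As an assumption, only 1-bit grayscale PNGs are treated as binary; 1-bit palette images are conservatively rejected.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_scan_info(data: bytes):
    """Return (bit_depth, color_type, dpi) read from PNG bytes.

    dpi is None when the file has no pHYs chunk given in pixels per metre.
    """
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    pos = len(PNG_SIGNATURE)
    bit_depth = color_type = dpi = None
    # Each chunk is: 4-byte length, 4-byte type, body, 4-byte CRC.
    while pos + 8 <= len(data):
        length, chunk_type = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        if chunk_type == b"IHDR":
            _width, _height, bit_depth, color_type = struct.unpack(">IIBB", body[:10])
        elif chunk_type == b"pHYs":
            per_unit_x, _per_unit_y, unit = struct.unpack(">IIB", body)
            if unit == 1:  # unit 1 means pixels per metre
                dpi = round(per_unit_x * 0.0254)
        elif chunk_type == b"IDAT":
            break  # image data starts; the header chunks are behind us
        pos += 12 + length
    return bit_depth, color_type, dpi


def looks_suitable(data: bytes) -> bool:
    """True for a 1-bit grayscale PNG tagged as 400 or 600 DPI."""
    bit_depth, color_type, dpi = png_scan_info(data)
    is_binary = bit_depth == 1 and color_type == 0  # assumption: palette images rejected
    return is_binary and dpi in (400, 600)
```

For a file on disk, pass its contents, for example `looks_suitable(open("page1.png", "rb").read())`, where `page1.png` is a hypothetical file name. The check does not verify the CRC fields and says nothing about scan quality.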

* Features

Here are some features of InftyReader since Ver. 2.8:

  1. It uses the OCR engines of Toshiba Corporation, "ExpressReaderPro", and of MediaDrive Corporation, "WinReader", simultaneously to improve the recognition results of characters in ordinary text areas. (As for the characters and math symbols in formulae, it uses Infty's OCR.)
  2. It can recognize tables including math expressions in the cells, provided the ruled lines are not broken.
  3. It can convert PDF files into LaTeX or XHTML (MathML) including mathematical expressions, except for PDF files that include color or gray images. (Note that InftyReader can process only black and white binary images.)
    It recognizes the page images of PDF files, referring to the text information embedded in the PDF.

    Attention: The original PDF should be of high resolution, equivalent to images scanned at 600dpi. Sometimes PDF files found on the web have a low resolution, around 200dpi, in order to reduce their file sizes. In such cases, the recognition results will be of very low quality, almost useless!
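
To judge whether a page image reaches the resolution mentioned in the attention note, divide its width in pixels by the physical page width in inches. The Python sketch below shows the arithmetic. The function names and the example page sizes are hypothetical, and the 600dpi default simply restates the recommendation above.

```python
def effective_dpi(pixel_width: int, page_width_inches: float) -> float:
    """Resolution implied by an image width in pixels and a page width in inches."""
    if page_width_inches <= 0:
        raise ValueError("page width must be positive")
    return pixel_width / page_width_inches


def meets_recommendation(pixel_width: int, page_width_inches: float,
                         minimum_dpi: float = 600.0) -> bool:
    """True when the implied resolution reaches the recommended minimum."""
    return effective_dpi(pixel_width, page_width_inches) >= minimum_dpi


if __name__ == "__main__":
    # A page 8.5 inches wide, rendered at two different pixel widths.
    for width in (5100, 1700):
        dpi = effective_dpi(width, 8.5)
        print(width, "px ->", dpi, "dpi, ok:", meets_recommendation(width, 8.5))
```

A page 8.5 inches wide rendered at 1700 pixels corresponds to 200dpi, the low-resolution case the note warns about.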

* Caution ---- Important!

  1. Source documents have to be clearly printed.
  2. They should be scanned at 600dpi (or 400dpi). Usually, binary images are better for recognition than color images.
  3. InftyReader erases small noise, segments page images into picture areas, table areas and text areas automatically, and then recognizes text/table areas including mathematical expressions.
    However, to get better recognition results, users are <<recommended>> to erase noise and pictures before the recognition.
  4. In scanning, it is important to adjust the binarization threshold of the scanner so that the number of touching or broken characters is less than 1% of the total number of characters in each scanned page image.
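
The 1% criterion in item 4 is a simple ratio. The Python sketch below shows the check for one page. It assumes you have counted the characters by hand or with another tool. The function name `binarization_ok` and the sample counts are hypothetical.

```python
def binarization_ok(total_chars: int, touching: int, broken: int) -> bool:
    """True when touching plus broken characters are under 1% of the page total."""
    if total_chars <= 0:
        raise ValueError("total_chars must be positive")
    # Integer comparison avoids floating-point rounding at exactly 1%.
    return (touching + broken) * 100 < total_chars


if __name__ == "__main__":
    # Hypothetical counts for one scanned page.
    print(binarization_ok(2000, touching=12, broken=5))   # 17 of 2000 is 0.85%
    print(binarization_ok(2000, touching=15, broken=10))  # 25 of 2000 is 1.25%
```

If the check fails, rescan the page with a different binarization threshold and count again.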

* Operating Environment

InftyReader runs on Windows 10, on a PC equipped with at least 2GB free memory available for the application.

* How to use InftyReader?

  1. Select file(s) or folder.
  2. Input/select the output document name.
  3. Press the "Start" button.

Then the recognition results of the selected image files are saved in the file you specified as the "output document name". When you select a folder instead of files, all the image files of the specified file type (TIF/GIF/PNG/BMP/PDF) in the folder are recognized, and the results are output to files named after the folders.

If you check the "Search Sub Folders" item under the "Option" menu, InftyReader recognizes all the image files in the subfolders of the selected folder. For example, if you select the folder "foldertop" with the subfolder structure below,

    foldertop
    |-- subfolder1
    |        |-- a.tif
    |        |-- b.tif
    |-- subfolder2
             |-- c.tif
             |-- d.tif

and if you select "IML" as the output file type, you will get the files "subfolder1.iml" and "subfolder2.iml" in the folder "foldertop". The recognition results of a.tif and b.tif (resp. c.tif and d.tif) are saved in the file subfolder1.iml (resp. subfolder2.iml).

If you select LaTeX as the output file type, you will get "subfolder1.tex" and "subfolder2.tex"; the other file types, HR-TeX and XHTML, work similarly.
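
The naming rule in this example can be modelled in a few lines of Python. The sketch below is not InftyReader code. It only predicts, from a list of image paths, which output files the description above implies. The function name `predict_outputs` is hypothetical, and only the two extensions stated in the text (.iml and .tex) are included. As an assumption, only images directly inside a first-level subfolder are modelled.

```python
from collections import defaultdict
from pathlib import PurePosixPath

# Extensions stated in the text; other output types are deliberately omitted.
EXTENSIONS = {"IML": ".iml", "LaTeX": ".tex"}


def predict_outputs(top: str, image_paths: list[str], output_type: str) -> dict[str, list[str]]:
    """Map each expected output file to the image files whose results it would hold."""
    ext = EXTENSIONS[output_type]
    top_path = PurePosixPath(top)
    grouped = defaultdict(list)
    for image in image_paths:
        rel = PurePosixPath(image).relative_to(top_path)
        if len(rel.parts) != 2:
            # Assumption: deeper nesting and files directly in the top folder are skipped.
            continue
        subfolder, name = rel.parts
        grouped[str(top_path / (subfolder + ext))].append(name)
    return {out: sorted(names) for out, names in sorted(grouped.items())}


if __name__ == "__main__":
    images = [
        "foldertop/subfolder1/a.tif",
        "foldertop/subfolder1/b.tif",
        "foldertop/subfolder2/c.tif",
        "foldertop/subfolder2/d.tif",
    ]
    for out, names in predict_outputs("foldertop", images, "IML").items():
        print(out, "<-", names)
```

Forward slashes are used only to keep the example portable; on Windows the same grouping applies to backslash-separated paths.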

* License

To use InftyReader, please get a license key from sAccessNet -> click here.

As for the trial use, please see: AboutTrialUse.txt

InftyReader is usable under the following license agreement.

(1) You may not modify the software in any manner. You may not reverse engineer, decompile or disassemble the software.
(2) You may not sell the software without making a formal agreement with Science Accessibility Net.
You may distribute the software only free of charge, without modifying the zip-package of the software.
(3) The author shall have no obligation to correct errors and inconveniences of the software.
(4) The author shall not be responsible for any loss or damage caused by the use of the software.
(5) The license is basically limited to personal use, including the case where it is purchased by an institution for a specified user. Shared use by the members of a small group is also allowed. In the default setting, the number of pages recognizable under this license is limited to 10000 pages per month. In case an institution uses the software to serve a number of clients or to digitize huge numbers of volumes, please use the enterprise version, reading the page here. For more details, please contact us.

* Report

Any report about the software will be welcome.

Non Profit Organization
Science Accessibility Net (sAccessNet)
e-mail: support"at" (Please replace "at" by @.)

