InftyReader

InftyReader is OCR software for recognizing scientific documents that include mathematical formulae (STEM documents).

InftyReader converts PDF files and scanned images into various accessible document formats: LaTeX, XHTML (MathML), HR-TeX, IML, Microsoft Word, EPUB3, PDF with TeX, and Chattybook (Audio HTML).
For scanned image files, and for Image PDFs produced from scanned images, InftyReader uses OCR specially trained for STEM documents, which recognizes special math symbols and analyzes the structure of math expressions.
For e-born PDFs (*), InftyReader uses a PDF parser rather than OCR, so character recognition is very accurate, not only for ordinary text but also for math symbols.

(*) An e-born PDF is a PDF produced by an authoring tool such as a LaTeX system, MS Word, or Adobe InDesign. A PDF produced from image files is called an Image PDF.

*InftyReader Ver.3.3.2.5 (June 15, 2024)

Personal Use License package:
InftyReader3325.zip (English Edition, about 206MB) --- June 15, 2024

Enterprise License package:
InftyReader3325_Enterprise.zip (English Edition, about 206MB) --- June 15, 2024

What's new (Ver.3.3.2.5):

Improved detection of figures, tables, and arrays (matrices).
Improved recognition of framed text.

For general information about InftyReader, please read "AboutInftyReaderE.txt" here.

For the differences between the Enterprise edition and the Personal Use edition, please read "About InftyReader Enterprise".

License Update. Serial numbers for InftyReader Ver.3.1 and 3.2 are also valid for Ver.3.3, so all users of the Ver.3.1 and 3.2 series can use the Ver.3.3 series at no additional cost.

Trial Use. To use InftyReader in the Trial Mode, please see AboutTrialUse.txt.

Remark. Please avoid using input files whose file names or path names contain non-ASCII characters.
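Before starting a batch, it can help to find such files first. The following is a minimal sketch in Python, not part of InftyReader: the helper names find_non_ascii_paths and is_ascii_path are hypothetical, and the check simply tests whether the full path string, as Python sees it, is pure ASCII. It only lists files; renaming them is left to you.

```python
from pathlib import Path


def is_ascii_path(path: Path) -> bool:
    """Return True if every character of the full path string is ASCII."""
    return str(path).isascii()


def find_non_ascii_paths(folder: str) -> list[Path]:
    """List files under `folder` (recursively) whose absolute path contains non-ASCII characters.

    Hypothetical pre-flight helper; it does not call or configure InftyReader.
    A non-ASCII folder name causes every file inside it to be reported,
    because the whole path is checked, not just the file name.
    """
    root = Path(folder).resolve()
    return sorted(p for p in root.rglob("*") if p.is_file() and not is_ascii_path(p))
```

For example, running find_non_ascii_paths("C:/scans") would list every file under C:/scans whose path should be renamed before recognition.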

* Purchase with Stripe

InftyReader Standard license: 200 USD
InftyReader One Year license: 40 USD

You receive a serial number immediately after payment on the purchase site above, and you can use it to activate InftyReader on your PC.

* Purchase with PayPal

If you wish to purchase with PayPal, please visit here.

Please note that you will receive the serial number within 2 business days after paying with PayPal.

Document for blind users.

Below is an introduction to InftyReader for blind users, given by Prof. John Gardner (Oregon State University & ViewPlus Technology) at the ICCHP Summer University 2011.

Introduction to InftyReader by Prof. John Gardner.

* Comments about output formats

IML is the default XML file format of "InftyEditor", an authoring tool for math documents developed by InftyProject. InftyEditor provides an easy-to-use interface for entering and editing math expressions together with ordinary text.
The English edition of InftyEditor is free software. Please see the InftyEditor website.
LaTeX is a markup language widely used by scientists for writing mathematical documents.
In XHTML format, mathematical expressions are output using MathML notation.
HR-TeX (Human Readable TeX) is a simplified, LaTeX-like notation specially designed to be easier for blind users to read.
Word XML output from InftyReader can be directly imported into Microsoft Word.
In EPUB3 format, mathematical expressions are output using MathML notation.
"PDF with TeX" is a newly proposed accessible PDF format. Its front image is identical to the original PDF, and the text and math information is embedded behind the image, rearranged into the usual reading order. Math expressions are embedded using HR-TeX (Human Readable TeX) notation. To get "PDF with TeX" output, users are recommended to install Ghostscript, which can be downloaded from:
https://www.ghostscript.com/download/gsdnld.html
If Ghostscript is installed on your PC, the front image of the output PDF will be a high-quality vector image generated by Ghostscript.
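To check whether a Ghostscript command-line executable is reachable on your PATH, a minimal Python sketch like the one below can be used. It assumes the usual Ghostscript console executable names (gswin64c / gswin32c on Windows, gs elsewhere); the function name find_ghostscript is hypothetical, and InftyReader may locate Ghostscript by other means (for example its install folder), so this is only a rough check, not a test of InftyReader's own detection.

```python
import shutil
from typing import Optional

# Usual Ghostscript console executable names: gswin64c/gswin32c on Windows, gs elsewhere.
GS_NAMES = ("gswin64c", "gswin32c", "gs")


def find_ghostscript() -> Optional[str]:
    """Return the full path of the first Ghostscript executable found on PATH, or None."""
    for name in GS_NAMES:
        path = shutil.which(name)
        if path:
            return path
    return None
```

If find_ghostscript() returns None, Ghostscript is either not installed or not on PATH.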

* Caution ---- Important!

To recognize image files:

Source documents must be clearly printed.
They should be scanned at 600 dpi (or 400 dpi) with no distortion. Binary (black-and-white) images usually give better recognition results than color images.
InftyReader automatically removes small noise, segments page images into picture, table, and text areas, and then recognizes the text and table areas, including mathematical expressions.
However, for better recognition results, users are <<recommended>> to erase noise and pictures before recognition.
When scanning, it is important to adjust the scanner's binarization threshold so that touching or broken characters make up less than 1% of all characters in each scanned page image.
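The 1% guideline is simple per-page arithmetic. The Python sketch below assumes you have counted the touching/broken characters and the total characters yourself, for example by proofreading a sample page; InftyReader is not assumed to report these numbers, and the function names are hypothetical.

```python
def defect_rate(defective: int, total: int) -> float:
    """Fraction of characters on a page that are touching or broken."""
    if total <= 0:
        raise ValueError("total must be positive")
    return defective / total


def threshold_ok(defective: int, total: int, limit: float = 0.01) -> bool:
    """True if touching/broken characters are strictly below `limit` (1% by default)."""
    return defect_rate(defective, total) < limit


def pages_to_rescan(counts: list[tuple[int, int]], limit: float = 0.01) -> list[int]:
    """Return 1-based page numbers whose (defective, total) counts fail the guideline."""
    return [i for i, (bad, total) in enumerate(counts, start=1)
            if not threshold_ok(bad, total, limit)]
```

For example, a page with 2,500 characters passes with 24 defective characters (0.96%) but not with 25 (exactly 1%), so pages_to_rescan([(10, 2500), (40, 2500), (24, 2500)]) flags only page 2; that page would be rescanned with an adjusted threshold.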

* Operating Environment

InftyReader runs on Windows 10 and 11, on a PC with at least 2GB of free memory available to the application.

* How to use InftyReader?

Select the file(s) or a folder.
Enter or select the output document name.
Press the "Start" button.

The recognition results for the selected image files are then saved to the file specified as the "output document name". If you select a folder instead of files, all image files of the specified file type (TIFF/GIF/PNG/BMP/PDF) in that folder are recognized, and the results are saved to files named after the folder(s).

If you check the "Search Sub Folders" item in the "Option" menu, InftyReader recognizes all image files in the subfolders of the selected folder. For example, suppose you select the folder "foldertop", which has the subfolder structure below,

foldertop
|-- subfolder1
|        |-- a.tif
|        |-- b.tif
|
|-- subfolder2
         |-- c.tif
         |-- d.tif

and select "IML" as the output file type. Then you will get the files "subfolder1.iml" and "subfolder2.iml" in the folder "foldertop". The recognition results of a.tif and b.tif are saved in subfolder1.iml, and those of c.tif and d.tif in subfolder2.iml.

If you select LaTeX as the output file type, you will get "subfolder1.tex" and "subfolder2.tex"; the other file types, such as HR-TeX and XHTML, work similarly.
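The subfolder-to-output naming rule in the example can be modeled as a small function that predicts which output files a batch will produce. This is a minimal Python sketch, not InftyReader code: plan_outputs is a hypothetical name, it handles only the one-level layout of the example (files directly inside each subfolder), it maps file types to extensions by assumption (TIFF as .tif or .tiff), and it includes only the IML and LaTeX output extensions stated above. File lists are sorted for display; the order in which InftyReader processes pages is not assumed.

```python
from collections import defaultdict
from pathlib import PurePosixPath

# Output extensions stated in the text; other output formats are left out of this sketch.
OUTPUT_EXT = {"IML": ".iml", "LaTeX": ".tex"}

# Assumed file extensions for each selectable input file type.
INPUT_EXTS = {
    "TIFF": {".tif", ".tiff"},
    "GIF": {".gif"},
    "PNG": {".png"},
    "BMP": {".bmp"},
    "PDF": {".pdf"},
}


def plan_outputs(files: list[str], top: str, input_type: str,
                 output_type: str) -> dict[str, list[str]]:
    """Map each predicted output file (placed in `top`) to the input files of one subfolder.

    `files` are relative paths such as "subfolder1/a.tif" under the selected folder `top`.
    Only files directly inside a subfolder and matching `input_type` are counted.
    """
    exts = INPUT_EXTS[input_type]
    groups: dict[str, list[str]] = defaultdict(list)
    for rel in files:
        p = PurePosixPath(rel)
        if p.suffix.lower() in exts and len(p.parts) == 2:
            groups[p.parts[0]].append(p.name)
    ext = OUTPUT_EXT[output_type]
    return {str(PurePosixPath(top) / (sub + ext)): sorted(names)
            for sub, names in sorted(groups.items())}
```

For the tree above, plan_outputs(["subfolder1/a.tif", "subfolder1/b.tif", "subfolder2/c.tif", "subfolder2/d.tif"], "foldertop", "TIFF", "IML") gives {"foldertop/subfolder1.iml": ["a.tif", "b.tif"], "foldertop/subfolder2.iml": ["c.tif", "d.tif"]}, matching the description.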

* License

InftyReader is usable under the following license agreement.

(1) You may not modify the software in any manner. You may not reverse engineer, decompile or disassemble the software.
(2) You may not sell the software without making a formal agreement with Science Accessibility Net.
You may distribute the software only free of charge, without modifying the zip package of the software.
(3) The author shall have no obligation to correct errors and inconveniences of the software.
(4) The author shall not be responsible for any loss and damage caused by the use of the software.
(5) The license is limited to personal use, including purchase by an institution for a specified user. Shared use among the members of a small group is also allowed. By default, this license allows recognition of up to 10000 pages per month. If an institution uses the software to serve several clients or to digitize a large number of volumes, please use the Enterprise version; see the page "About Enterprise License". For more details, please contact us.

* Report

Any reports about the software are welcome.

--------------------------------------
Non-Profit Organization
Science Accessibility Net (sAccessNet)
e-mail: support"at"sciaccess.net (Please replace "at" by @.)
URL: http://www.sciaccess.net/
--------------------------------------