InftyReader Ver.2.7 series
Main Features
OCR software to recognize scientific documents including mathematical formulae
Various output formats: XML, LaTeX, MathML, HTML, HRTeX and MS Word 2007.
- You can save the recognition results as an MS Word 2007 document directly.
InftyReader is OCR software that recognizes scientific documents containing mathematical formulae and outputs the recognition results in various file formats: LaTeX, MathML, XHTML, HRTeX, IML. It is developed in the laboratory of M. Suzuki, Faculty of Mathematics, Kyushu University, in collaboration with several partners.
* InftyReader Ver.2.7.9 (Dec. 25, 2008) --- Product version
A. Fullset Package
Using the full setup package below, you can install InftyReader and the two DicKits A and B at once:
InftyReaderE279_fullset.zip (English Edition, about 50MB) -------- Dec. 25, 2008 
Freely usable for 15 days in total with no page limits.
B. Separate Package
The InftyReader setup package below is divided into three sub-packages, namely InftyReaderDicKitA, InftyReaderDicKitB and InftyReaderE27x, to make downloading over the internet easier.
Please note that, to use the InftyReader Ver.2.7 series, you need to install the two DicKits (A, B) below in the same folder.
InftyReaderDicKitA.zip (about 23MB) ---------------- Oct. 8, 2007
InftyReaderDicKitB.zip (about 22MB) ---------------- Oct. 8, 2007 
InftyReaderE279.zip (English Edition, about 7MB) -------- Dec. 25, 2008
Freely usable for 15 days in total with no page limits.
If you execute the installers
- InftyReaderDicKitA_Setup.exe,
- InftyReaderDicKitB_Setup.exe, and
- InftyReaderE279_Setup.exe,
contained in the three packages above, then InftyReader Ver.2.7.9 will be installed on your PC.
* Comments about output formats
- IML is the default XML file format of the editor "InftyEditor", an authoring tool for math documents developed by InftyProject. InftyEditor provides a very easy user interface for entering and editing math expressions together with ordinary text.
The English edition of InftyEditor is free software. Please see the InftyEditor site.
- In XHTML format, mathematical expressions are output using MathML notation.
- HR-TeX is a simplified LaTeX-like notation that is easier "to read", specially designed for blind users.
Using InftyEditor, users can correct and edit the recognition results of InftyReader while comparing them with the original images, and convert the results into various formats: LaTeX, PDF, XHTML with MathML, etc.
Please note that InftyReader recognizes only <<Black and White>>, <<Binary>> images carefully scanned at either 600 DPI or 400 DPI. Please be aware that the program fails to run if the input image contains gray scale or color image areas, even in part.
Image files have to be prepared in TIFF, GIF, PNG, or BMP format.
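The input requirements above can be checked before a batch is handed to InftyReader. The following is only a minimal sketch and not part of InftyReader: the function names are hypothetical, and treating ".tiff" as well as ".tif" as an accepted extension is an assumption. The PNG check reads the bit depth from the standard PNG header; a 1-bit palette image could still use two colors other than black and white, so passing this check is necessary but not sufficient.

```python
import struct
from pathlib import Path

# Extensions for the formats named in the text (TIFF, GIF, PNG, BMP).
# Accepting both ".tif" and ".tiff" is an assumption of this sketch.
SUPPORTED_EXTENSIONS = {".tif", ".tiff", ".gif", ".png", ".bmp"}

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def has_supported_extension(path):
    """Return True if the file name ends in one of the listed extensions."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS


def png_is_one_bit(header):
    """Return True if a PNG header declares 1 bit per pixel.

    `header` must hold at least the first 26 bytes of the file:
    8-byte signature, 4-byte chunk length, 4-byte chunk type "IHDR",
    4-byte width, 4-byte height, 1-byte bit depth, 1-byte color type.
    """
    if (len(header) < 26
            or header[:8] != PNG_SIGNATURE
            or header[12:16] != b"IHDR"):
        raise ValueError("not a PNG header")
    bit_depth, color_type = struct.unpack(">BB", header[24:26])
    # Color type 0 is grayscale and 3 is palette; both allow bit depth 1.
    return bit_depth == 1 and color_type in (0, 3)


def read_png_header(path):
    """Read the first 26 bytes of a file for use with png_is_one_bit."""
    with open(path, "rb") as f:
        return f.read(26)
```

A typical use would be to call has_supported_extension on every file name first, then png_is_one_bit(read_png_header(path)) on the PNG files, and rescan any page that fails.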
* Features
Here are some features of InftyReader Ver. 2.7 :
- It uses the OCR engines of Toshiba Corporation, "ExpressReaderPro", and of MediaDrive Corporation, "WinReader", simultaneously to improve the recognition results of characters in ordinary text areas. (For the characters and math symbols in formulae, it uses Infty's OCR.)
- It can recognize tables including math expressions in the cells (provided the ruled lines are not broken).
- It can convert PDF files into LaTeX or XHTML (MathML) including mathematical expressions, except for PDF files that include color or gray images. (Note that InftyReader can process only black and white binary images.)
It recognizes the page images of PDF files by referring to the text information embedded in the PDF.
Attention: The original PDF should be of high resolution, equivalent to 600dpi scanned images. Sometimes PDF files found on the web are of low resolution, at the level of 200dpi images, in order to reduce their file sizes. In such cases, the recognition results will be so low in quality as to be almost useless!
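As a rough illustration of the resolution requirement above, the effective resolution of a page image inside a PDF can be estimated from its pixel width and the page width. This is only a sketch: the function names are hypothetical, it relies on the PDF convention of 72 points per inch, and it assumes the image spans the full page width.

```python
POINTS_PER_INCH = 72  # PDF page sizes are expressed in points


def effective_dpi(pixel_width, page_width_points):
    """Estimate the DPI of a page image that spans the full page width."""
    if pixel_width <= 0 or page_width_points <= 0:
        raise ValueError("widths must be positive")
    page_width_inches = page_width_points / POINTS_PER_INCH
    return pixel_width / page_width_inches


def is_high_resolution(pixel_width, page_width_points, minimum_dpi=600):
    """Return True if the estimated DPI reaches the recommended level."""
    return effective_dpi(pixel_width, page_width_points) >= minimum_dpi


# A US Letter page is 612 points (8.5 inches) wide.
web_pdf = effective_dpi(1700, 612)    # image 1700 pixels wide
scan_pdf = effective_dpi(5100, 612)   # image 5100 pixels wide
print(web_pdf, scan_pdf)
```

Here the 1700-pixel image works out to 200 dpi, the level the text warns about, while the 5100-pixel image reaches 600 dpi.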
* Caution ---- Important!
- Source documents have to be clearly printed.
- They should be scanned as "binary" images, at 600dpi (or 400dpi).
- InftyReader erases small noise, segments page images into picture areas,
table areas and text areas automatically, and then recognizes text/table
areas including mathematical expressions.
However, to get better recognition results, users are <<recommended>>
to erase noise and pictures before the recognition.
- In scanning,
it is important to adjust the binarization threshold of the scanner so that
the number of touching or broken characters is less than 1% of the total
number of characters in each scanned page image.
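The 1% criterion for the binarization threshold is simple arithmetic, sketched below. The function names are hypothetical, and the counts are assumed to come from inspecting a sample page by hand; InftyReader itself is not stated to report them.

```python
def defect_ratio(defective_chars, total_chars):
    """Fraction of characters on a page that are touching or broken."""
    if total_chars <= 0:
        raise ValueError("total_chars must be positive")
    if not 0 <= defective_chars <= total_chars:
        raise ValueError("defective_chars must be between 0 and total_chars")
    return defective_chars / total_chars


def threshold_is_acceptable(defective_chars, total_chars, limit=0.01):
    """Return True if the defect ratio is strictly below the limit (1%)."""
    return defect_ratio(defective_chars, total_chars) < limit


# Example: a page with 2000 characters.
print(threshold_is_acceptable(12, 2000))  # 0.6% of characters defective
print(threshold_is_acceptable(30, 2000))  # 1.5% of characters defective
```

With 12 defective characters out of 2000 the page passes; with 30 it does not, and the scanner threshold should be adjusted before rescanning.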
* Operating Environment
InftyReader runs on Windows XP, on a PC equipped with 500MB of memory or more.
Note that it does not run on Windows 98, Me, or 2000.
Usability on Windows Vista is not yet sufficiently verified.
To use InftyReader on Windows Vista, users are advised to verify for themselves that InftyReader runs on their own PC before purchasing, and to use a PC equipped with 1.5 GB of memory or more.
* How to use InftyReader?
- Select file(s) or folder.
- Input/select the output document name.
- Press the "Start" button.
Then the recognition results of the selected image files are saved in the file you specified as the "output document name". When you select a
folder instead of files, all the image files of the specified file type
(TIF/GIF/PNG/BMP/PDF) in the folder are recognized, and the results are output
to file(s) named after the folder(s).
If you check the "Search Sub Folders" item under the "Option" menu, InftyReader recognizes all the image files in the subfolders of the selected
folder. For example, if you select the folder "foldertop" having the subfolder
structure below,
- foldertop
   |-- subfolder1
   |    |-- a.tif
   |    |-- b.tif
   |
   |-- subfolder2
        |-- c.tif
        |-- d.tif
and if you select "IML" as the output file type, then you will
get the files "subfolder1.iml" and "subfolder2.iml" in the folder "foldertop".
The recognition results of a.tif and b.tif are saved in the file
subfolder1.iml, and those of c.tif and d.tif in the file subfolder2.iml.
If you select LaTeX as the output file type, you will get "subfolder1.tex" and "subfolder2.tex"; the other file types, HR-TeX and XHTML, work similarly.
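The naming rule in the "foldertop" example can be sketched as a small script that predicts which output files to expect before a run. It is an illustration only, not InftyReader code: the function name is hypothetical, it looks only at the immediate subfolders as in the example, and how InftyReader treats more deeply nested folders is not covered by the text.

```python
from pathlib import Path

# Input file types listed in the text (TIF/GIF/PNG/BMP/PDF).
INPUT_EXTENSIONS = {".tif", ".gif", ".png", ".bmp", ".pdf"}


def predict_outputs(top_folder, output_ext=".iml"):
    """Map each expected output file name to the input files it would hold.

    Only immediate subfolders of `top_folder` are examined (an assumption
    taken from the example); subfolders without input files are skipped.
    """
    top = Path(top_folder)
    outputs = {}
    for sub in sorted(p for p in top.iterdir() if p.is_dir()):
        inputs = sorted(
            f.name
            for f in sub.iterdir()
            if f.is_file() and f.suffix.lower() in INPUT_EXTENSIONS
        )
        if inputs:
            # The output file is named after the subfolder and, per the
            # example, is placed in the top folder.
            outputs[sub.name + output_ext] = inputs
    return outputs
```

Calling predict_outputs("foldertop") on the structure above would map "subfolder1.iml" to a.tif and b.tif and "subfolder2.iml" to c.tif and d.tif; passing output_ext=".tex" gives the LaTeX names instead.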
* License
The InftyReader Ver.2.7 series is usable free of charge for 15 days in total after installation.
If you use it three days per week, for example, then you can
use the software on trial for 5 weeks.
For further use, please get a license key from sAccessNet.
InftyReader is usable under the following license agreement.
(1) You may not modify the software in any manner. You may not reverse engineer, decompile or disassemble the software.
(2) You may not sell the software without making a formal agreement with Science Accessibility Net.
You may distribute the software only free of charge, without modifying the zip-package of the software.
(3) The author shall have no obligation to correct errors or inconveniences in the software.
(4) The author shall not be responsible for any loss or damage caused by the use of the software.
* Report
Any report about the software will be welcome.
--------------------------------------
Non Profit Organization
Science Accessibility Net (sAccessNet)
e-mail: support@sciaccess.net
URL: http://www.sciaccess.net/
--------------------------------------