Nuxeo OCR?

Hi, I'm new to Nuxeo and similar tools.

I want to scan a large bill with a professional scanner and I would like its content to be indexed. From what I have read, this requires using an OCR engine with Nuxeo.

So when I add a document to Nuxeo, a script or command would run and call the appropriate OCR engine to index the content.

My question is: which OCR do you recommend?

Thanks

0 votes

4 answers

7041 views

ANSWER



You can also have a look at Ephesoft, but it is more than simple OCR. There is an integration with Nuxeo based on CMIS connectivity: http://www.nuxeo.com/en/products/integration/ephesoft-capture

Cheers

0 votes



You can try to install this:

https://github.com/nuxeo/nuxeo-platform-ocr

I've installed this module and it is working nicely! It is a plugin that uses Tesseract OCR.

Regards

Ruben Bahntje Ushuaia - Argentina

2 votes



rbahntje,

Thanks. Do you speak Spanish?

11/20/2011


Thanks Roland for the comment.

Tesseract OCR can be installed on the Nuxeo server, so every time an image document is saved in Nuxeo, the OCR runs and indexes the document for future searches without any user interaction. Is there any way to do the same with Ephesoft? I know that if you run your own application you can integrate Ephesoft and Nuxeo using CMIS, but is there any way to configure Nuxeo so that every time a document is uploaded to a certain folder, Ephesoft runs and analyzes it?

Regards (and excuse my bad English)

Ruben

0 votes



Hi, to keep this OSQA site readable, I have asked your new question here: http://answers.nuxeo.com/questions/639/can-nuxeo-integrate-totally-ephesoft-as-a-native-capture-feed

:-)

Thanks

11/29/2011


Hi, can you tell me how to install Nuxeo with Ephesoft, or point me to where this is documented?

Do you know if there are any virtual machines built with the Nuxeo + OCR integration that I could test?

Thanks

0 votes



Soni,

Ephesoft software is used at the "capture" level: you install it on the workstation where you scan the images to be uploaded to the Nuxeo repositories.

The OCR integration provided by https://github.com/nuxeo/nuxeo-platform-ocr#readme runs inside the repository, so the only thing you have to do is upload an image. The OCR creates "comments" on the image containing the text it recognizes, and you can then find the file by searching for its content.

I have Nuxeo+OCR running in an Oracle VM Server 2.0 machine, and I am waiting for the next Nuxeo DM release to build a completely new VM from scratch.

If you can wait a few weeks, I will create it and put it on an FTP site so you can download it.
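To give a rough idea of the OCR step itself (this is not the plugin's code, which runs inside Nuxeo), here is a minimal Python sketch that calls the Tesseract command-line tool on an image and returns the recognized text. The function names and the default language are assumptions made for illustration, and it assumes a reasonably recent `tesseract` binary on the PATH.

```python
import subprocess


def build_tesseract_command(image_path, lang="eng"):
    """Build the Tesseract CLI call that prints recognized text to stdout."""
    # Passing "stdout" as the output base makes recent Tesseract versions
    # print the text instead of writing <outputbase>.txt; "-l" picks the language.
    return ["tesseract", image_path, "stdout", "-l", lang]


def ocr_image(image_path, lang="eng"):
    """Run Tesseract on one image and return the recognized text.

    Assumes the `tesseract` binary is installed and on the PATH.
    """
    result = subprocess.run(
        build_tesseract_command(image_path, lang),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if Tesseract exits with an error
    )
    return result.stdout.strip()
```

Calling `ocr_image("scanned-bill.png")` (a hypothetical file name) would return the text that a repository could then store alongside the image and index for search.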

Regards

Ruben

11/24/2011

OK Ruben,

I tried to install the OCR integration, but I get a few errors at the "Build Olena" step… I'm using Ubuntu 11

I would love it if you could create a virtual machine with Nuxeo + OCR for me to try. Would it be possible to try the current version you have already created? If not, I will wait for the new version of Nuxeo DM to come out.

Thanks, I'll be waiting.

11/24/2011

@Ruben

Ephesoft is not a client-side OCR solution but a server-side one. As far as I know, it needs a Java server and runs in a browser. You can find some specs here: http://www.ephesoft.com/wiki/index.php?title=Main_Page

Cheers, Roland

11/28/2011