Massive import (1 000 000 docs) of documents without ACLs with Nuxeo 5.6 and REST
Environment
- OS: Windows Server 2008 R2 Service Pack 1
- Java: JDK 1.7, 64-bit
- Nuxeo server: heap memory = 999 MB
- Nuxeo repository:
  - contents: 100 000 documents
  - full-text search indexing disabled
  - documents created without ACLs
Scenario
a) After creating 20 000 more documents:
- the log shows '2014-02-07 15:36:59,241 WARN [org.nuxeo.ecm.core.event.tx.PostCommitSynchronousRunner] PostCommitListeners are too slow'
- the document creation speed is 12 100 docs/hour, down from 25 000 docs/hour at the beginning
b) Then, after creating 40 000 more documents:
- the document creation speed is 3 500 docs/hour
- there is a “java heap space out of memory” error
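To put these rates in perspective, here is a rough estimate of how long a 1 000 000-document import would take at each speed, assuming the rate stayed constant (which it did not in our case). The class and method names are illustrative only.

```java
public class ImportEstimate {

    /** Hours needed to import totalDocs at a constant rate of docsPerHour. */
    static double hoursFor(long totalDocs, double docsPerHour) {
        if (docsPerHour <= 0) {
            throw new IllegalArgumentException("docsPerHour must be positive");
        }
        return totalDocs / docsPerHour;
    }

    public static void main(String[] args) {
        long total = 1_000_000L;
        // Initial rate, target rate, and the two degraded rates observed above.
        double[] rates = {25_000, 20_000, 12_100, 3_500};
        for (double rate : rates) {
            System.out.printf("%.0f docs/hour -> %.1f hours%n", rate, hoursFor(total, rate));
        }
    }
}
```

At a constant rate this works out to about 40 hours at 25 000 docs/hour, 50 hours at 20 000, 83 hours at 12 100 and 286 hours at 3 500.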
Questions
1) How can we avoid the message 'PostCommitListeners are too slow' and sustain at least 20 000 docs/hour, so that 1 000 000 docs can be imported in a reasonable time?
2) How can we improve the document creation speed, given that our software can supply Nuxeo with 500 000 docs/hour?
3) How can we avoid the “java heap space out of memory” error?
Why does Nuxeo use so much memory when
- we only create documents one by one
- and check whether each document exists before creating it?
You might want to take a look at Nuxeo's bulk importer to achieve your desired result. Alternatively, you could write your own custom importer for even more control over the import process, transactions, etc.
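As a rough illustration of what "control over transactions" could mean in a custom importer, here is a minimal sketch that commits once per batch of documents instead of once per document. It does not use the Nuxeo API or the bulk importer: `DocumentSink`, `InMemorySink`, `importAll` and the batch size are all hypothetical names, and you would have to back the sink with real Nuxeo calls yourself. Whether batching helps with your slowdown is something to measure, not a guarantee.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BatchedImportSketch {

    /** Hypothetical import target; NOT a Nuxeo interface. */
    interface DocumentSink {
        void create(String docName);
        void commit();
    }

    /**
     * Creates every document and commits after each full batch,
     * plus once at the end for any remainder. Returns the number of commits.
     */
    static int importAll(List<String> docNames, DocumentSink sink, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int pending = 0;
        int commits = 0;
        for (String name : docNames) {
            sink.create(name);
            pending++;
            if (pending == batchSize) {
                sink.commit();
                commits++;
                pending = 0;
            }
        }
        if (pending > 0) {
            // Flush the last, partially filled batch.
            sink.commit();
            commits++;
        }
        return commits;
    }

    /** In-memory stand-in, present only so the sketch can run on its own. */
    static class InMemorySink implements DocumentSink {
        final List<String> committed = new ArrayList<String>();
        private final List<String> uncommitted = new ArrayList<String>();

        public void create(String docName) {
            uncommitted.add(docName);
        }

        public void commit() {
            committed.addAll(uncommitted);
            uncommitted.clear();
        }
    }

    public static void main(String[] args) {
        InMemorySink sink = new InMemorySink();
        List<String> docs = Arrays.asList("doc-1", "doc-2", "doc-3", "doc-4", "doc-5");
        int commits = importAll(docs, sink, 2);
        System.out.println(commits + " commits, " + sink.committed.size() + " documents");
    }
}
```

The point of the design is that the batch size becomes a tunable: smaller batches keep less uncommitted work in flight, larger ones reduce per-commit overhead.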