OutOfMemory exception when processing a large number of documents

Hello,

I want to process a large number of documents (around 3 million). The process consists of updating one metadata field of each document based on another metadata field (so each document requires a read and a write in the database).

For that, I was thinking of using Elasticsearch's scroll API.

The problem is that I get a “java.lang.OutOfMemoryError: GC overhead limit exceeded” exception in the middle of processing (even though I have Xmx = Xms = 24g in JAVA_OPTS).
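For reference, here is roughly what my processing loop looks like (a simplified sketch using the CoreSession scroll API; the my:source and my:target xpaths, the query and the batch size are placeholders for my real values):

```java
import org.nuxeo.ecm.core.api.CoreSession;
import org.nuxeo.ecm.core.api.DocumentModel;
import org.nuxeo.ecm.core.api.IdRef;
import org.nuxeo.ecm.core.api.ScrollResult;
import org.nuxeo.runtime.transaction.TransactionHelper;

public class MetadataCopyJob {

    // Copy one metadata field into another for every matching document,
    // scrolling over the repository in fixed-size batches.
    public void run(CoreSession session) {
        String query = "SELECT * FROM Document WHERE ecm:isVersion = 0";
        // batch size 500, scroll keep-alive 60 seconds
        ScrollResult<String> scroll = session.scroll(query, 500, 60);
        while (scroll.hasResults()) {
            for (String id : scroll.getResults()) {
                DocumentModel doc = session.getDocument(new IdRef(id));
                doc.setPropertyValue("my:target", doc.getPropertyValue("my:source"));
                session.saveDocument(doc);
            }
            session.save();
            // commit after each batch so the transaction does not grow unbounded
            TransactionHelper.commitOrRollbackTransaction();
            TransactionHelper.startTransaction();
            scroll = session.scroll(scroll.getScrollId());
        }
    }
}
```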

I have tried different configurations for the garbage collector, but noticed no significant effect.

Can someone help me or give me an idea of how to process a large batch of documents in Nuxeo?

Thank you in advance.


ANSWER



Hello,

I used the Bulk Action Framework in Nuxeo and it works well.
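For anyone else looking at this, below is roughly the shape of the action I wrote (a simplified sketch; the action name and the my:source / my:target xpaths are placeholders, and the action still needs to be registered through the Bulk Action Framework extension points as described in the Nuxeo documentation):

```java
import java.io.Serializable;
import java.util.List;
import java.util.Map;

import org.nuxeo.ecm.core.api.CoreSession;
import org.nuxeo.ecm.core.api.DocumentModel;
import org.nuxeo.ecm.core.bulk.action.computation.AbstractBulkComputation;

// Computation executed by the Bulk Action Framework: it receives document ids
// in small batches, so memory stays bounded even for millions of documents.
public class CopyMetadataComputation extends AbstractBulkComputation {

    public static final String ACTION_NAME = "copyMetadata"; // placeholder action name

    public CopyMetadataComputation() {
        super(ACTION_NAME);
    }

    @Override
    protected void compute(CoreSession session, List<String> ids, Map<String, Serializable> properties) {
        for (DocumentModel doc : loadDocuments(session, ids)) {
            // copy one field into another (placeholder xpaths)
            doc.setPropertyValue("my:target", doc.getPropertyValue("my:source"));
            session.saveDocument(doc);
        }
    }
}
```

The command is then submitted with BulkService over an NXQL query selecting the 3 million documents, and the framework takes care of scrolling, batching and committing.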

Thank you




Can you tell me how to do a bulk upload with custom metadata?




Hello,

What type of bulk upload are you using?

For example, the following doc https://doc.nuxeo.com/nxdoc/nuxeo-bulk-document-importer/ explains how to use the Nuxeo Bulk Document Importer.

There are other options as well, such as the CSV importer…
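In case it helps with the custom metadata question above: with the CSV importer, the metadata columns are simply the property xpaths given in the header row. A rough illustration (the column layout and the myschema:customField property are assumptions, check the CSV importer documentation for the exact format expected by your version):

```csv
name,type,dc:title,dc:description,myschema:customField
doc-001,File,First document,Imported via CSV,value1
doc-002,File,Second document,Imported via CSV,value2
```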

04/21/2021