General metadata extraction from MS Office (primarily MS Word)

What steps does a developer need to take to display custom metadata from an MS Word document (the standard Nuxeo extraction is rather poor)?

Let me give an example. Suppose most documents in our company have a custom MS Word property called “Document Description”. When I navigate to such a document in my workspace, I would like to see the Document Description field in the “Metadata” section of the document's “Summary” tab.

I assume there are multiple steps to be taken here to achieve this behaviour …

Would Studio help with this (automatic metadata extraction), or not?


ANSWER



Studio alone won't be enough, but it can help you define new metadata fields in the Nuxeo document types to hold your custom metadata, along with the matching layout to display them (or edit them manually from the web browser).

If you are a Java developer, you could write an Automation Operation in Java using the Nuxeo IDE that embeds the Apache Tika library for the extraction itself, and then plug it into a user action or an event listener to trigger the extraction whenever a document is modified.
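
To give a rough idea of what the extraction step itself involves, here is a minimal sketch that reads custom properties such as “Document Description” from a .docx file. Note that it deliberately does not use Tika: it relies only on the JDK (Java 9+) and on the fact that a .docx file is a ZIP archive whose custom properties live in docProps/custom.xml. It therefore does not cover legacy .doc files, for which a library such as Tika would still be needed. The class and method names (DocxCustomProperties, read, parseCustomXml) are hypothetical and not part of Nuxeo or Tika, and the wiring into an Automation Operation or event listener is not shown.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper for illustration only; not a Nuxeo or Tika class.
final class DocxCustomProperties {

    // Location of the custom properties part inside a .docx (OOXML) package.
    private static final String CUSTOM_PROPS_ENTRY = "docProps/custom.xml";

    private DocxCustomProperties() {
    }

    /** Reads the custom properties of a .docx stream; returns an empty map if it has none. */
    static Map<String, String> read(InputStream docx) {
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            for (ZipEntry entry = zip.getNextEntry(); entry != null; entry = zip.getNextEntry()) {
                if (CUSTOM_PROPS_ENTRY.equals(entry.getName())) {
                    // readAllBytes() stops at the end of the current ZIP entry.
                    return parseCustomXml(new ByteArrayInputStream(zip.readAllBytes()));
                }
            }
            return new LinkedHashMap<>();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Parses the content of docProps/custom.xml into a name -> value map. */
    static Map<String, String> parseCustomXml(InputStream xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            // Uploaded files are untrusted input, so refuse DOCTYPE declarations.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            NodeList properties = factory.newDocumentBuilder()
                    .parse(xml)
                    .getElementsByTagNameNS("*", "property");
            Map<String, String> result = new LinkedHashMap<>();
            for (int i = 0; i < properties.getLength(); i++) {
                Element property = (Element) properties.item(i);
                // The value sits in a typed child element such as <vt:lpwstr>.
                result.put(property.getAttribute("name"), property.getTextContent().trim());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("Cannot parse custom properties", e);
        }
    }
}
```

You would call something like DocxCustomProperties.read(stream).get("Document Description") with the stream of the document's main file, then copy the value into the metadata field you defined in Studio. All values come back as plain strings, so dates or numbers would need converting before being stored in typed fields.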
