[Hippo-cms7-user] Uploading PDF fails

Ard Schrijvers a.schrijvers at onehippo.com
Wed Nov 4 13:27:39 CET 2009


I am not sure, but if it is just for testing, I would recommend whatever
is easiest for you, most likely replacing the jar by hand. Would you mind
letting us know whether it solves your issue? If not, there might still be
an issue in pdfbox itself. On the other hand, let's first see whether your
issue is solved.
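
To confirm which pdfbox jar is actually picked up after swapping it by hand,
something like the following could be run on the same classpath as the CMS.
It is only a sketch: JarLocator and locate() are made-up names for
illustration, and org.pdfbox.pdfparser.PDFParser is simply the class that
appears in the stack trace quoted below.

```java
import java.security.CodeSource;

final class JarLocator {
    /** Returns where a class was loaded from, or a short explanation if that is unknown. */
    static String locate(String className) {
        try {
            // Load without initializing, so no static initializers run.
            Class<?> c = Class.forName(className, false, JarLocator.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            // Classes from the JDK itself have no code source.
            return src == null
                    ? "loaded by the bootstrap class loader"
                    : String.valueOf(src.getLocation());
        } catch (ClassNotFoundException e) {
            return "not on classpath";
        }
    }

    public static void main(String[] args) {
        // Class name taken from the stack trace in this thread.
        System.out.println(locate("org.pdfbox.pdfparser.PDFParser"));
    }
}
```

If the printed location still points at the old jar, the replacement did not
take effect and the upload test says nothing about the newer version.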

Regards Ard

On Wed, Nov 4, 2009 at 1:14 PM, Joffrey <jlambregs at iprofs.nl> wrote:
>
> Thanks for the information, Ard.
>
> Do we have to replace the jar file by hand, or is this something that can be
> changed in the pom or another config file used for building the CMS?
>
> Regards,
> Joffrey
>
>
> Ard wrote:
>>
>> Hello Joffrey,
>>
>> are you using pdfbox 0.7.3? Most likely you are, because this is the
>> version that ships with the ECM through Jackrabbit. Can you try 0.8.0
>> instead? See [1].
>>
>> Regards Ard
>>
>> [1] http://incubator.apache.org/pdfbox/download.html
>>
>> On Wed, Nov 4, 2009 at 11:40 AM, Joffrey <jlambregs at iprofs.nl> wrote:
>>>
>>> Hi all,
>>>
>>> When uploading valid PDF files into the CMS, we sometimes get an exception in
>>> the CMS log saying the file could not be processed due to a corrupt header.
>>> The file is then shown in the assets folder with a size of 0Kb. This happens
>>> for a number of PDF files, while other files upload without any problem. All
>>> files, including the ones the CMS does not accept, are valid and can be
>>> opened with Acrobat Reader. The exception is below.
>>>
>>> Thanks in advance,
>>> Joffrey
>>>
>>> 04.11.2009 11:21:22 WARN [org.apache.jackrabbit.extractor.PdfTextExtractor.extractText():91] Failed to extract PDF text content
>>> java.io.IOException: Error: Header is corrupt ''
>>>        at org.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:143)
>>>        at org.apache.jackrabbit.extractor.PdfTextExtractor.extractText(PdfTextExtractor.java:69)
>>>        at org.apache.jackrabbit.extractor.CompositeTextExtractor.extractText(CompositeTextExtractor.java:90)
>>>        at org.apache.jackrabbit.core.query.lucene.JackrabbitTextExtractor.extractText(JackrabbitTextExtractor.java:195)
>>>        at org.apache.jackrabbit.core.query.lucene.NodeIndexer.addBinaryValue(NodeIndexer.java:419)
>>>        at org.apache.jackrabbit.core.query.lucene.NodeIndexer.addValue(NodeIndexer.java:303)
>>>        at org.apache.jackrabbit.core.query.lucene.NodeIndexer.createDoc(NodeIndexer.java:237)
>>>        at org.hippoecm.repository.query.lucene.ServicingNodeIndexer.createDoc(ServicingNodeIndexer.java:96)
>>>        at org.hippoecm.repository.query.lucene.ServicingSearchIndex.createDocument(ServicingSearchIndex.java:207)
>>>        at org.hippoecm.repository.query.lucene.ServicingSearchIndex.aggregateDescendants(ServicingSearchIndex.java:297)
>>>        at org.hippoecm.repository.query.lucene.ServicingSearchIndex.createDocument(ServicingSearchIndex.java:216)
>>>        at org.hippoecm.repository.query.lucene.ServicingSearchIndex.createDocument(ServicingSearchIndex.java:151)
>>>        at org.apache.jackrabbit.core.query.lucene.SearchIndex$2.next(SearchIndex.java:557)
>>>        at org.apache.jackrabbit.core.query.lucene.MultiIndex.update(MultiIndex.java:437)
>>>        at org.apache.jackrabbit.core.query.lucene.SearchIndex.updateNodes(SearchIndex.java:541)
>>>        at org.hippoecm.repository.query.lucene.ServicingSearchIndex.updateNodes(ServicingSearchIndex.java:187)
>>>        at org.apache.jackrabbit.core.SearchManager.onEvent(SearchManager.java:502)
>>>        at org.apache.jackrabbit.core.observation.EventConsumer.consumeEvents(EventConsumer.java:243)
>>>        at org.apache.jackrabbit.core.observation.ObservationDispatcher.dispatchEvents(ObservationDispatcher.java:201)
>>>        at org.apache.jackrabbit.core.observation.EventStateCollection.dispatch(EventStateCollection.java:422)
>>>        at org.apache.jackrabbit.core.state.SharedItemStateManager$Update.end(SharedItemStateManager.java:754)
>>>        at org.apache.jackrabbit.core.state.SharedItemStateManager.update(SharedItemStateManager.java:1100)
>>>        at org.apache.jackrabbit.core.state.LocalItemStateManager.update(LocalItemStateManager.java:351)
>>>        at org.apache.jackrabbit.core.state.ForkedXAItemStateManager.update(ForkedXAItemStateManager.java:357)
>>>        at org.hippoecm.repository.jackrabbit.HippoLocalItemStateManager.update(HippoLocalItemStateManager.java:221)
>>>        at org.apache.jackrabbit.core.state.LocalItemStateManager.update(LocalItemStateManager.java:326)
>>>
>>>
>>> --
>>> View this message in context:
>>> http://n2.nabble.com/Uploading-PDF-fails-tp3944320p3944320.html
>>> Sent from the Hippo CMS 7 mailing list archive at Nabble.com.
>>> _______________________________________________
>>> Hippo-cms7-user mailing list and forums
>>> http://www.onehippo.org/cms7/support/community.html
>>>
>
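
The empty quotes in "Header is corrupt ''" in the trace quoted above suggest
(this is a guess, nothing in the thread confirms it) that the parser read an
empty first line, for example because the file is empty or starts with a blank
line or other bytes before "%PDF-". Acrobat is lenient here and accepts the
header anywhere within the first 1024 bytes, which would fit the observation
that the same files open fine in Acrobat Reader. A minimal sketch for checking
where the marker sits in a given file follows; PdfHeaderCheck and
headerOffset() are hypothetical names, not part of pdfbox or Jackrabbit.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

final class PdfHeaderCheck {
    private static final byte[] MARKER = "%PDF-".getBytes(StandardCharsets.US_ASCII);
    // Acrobat viewers look for the header within the first 1024 bytes.
    private static final int SEARCH_LIMIT = 1024;

    /** Offset of the first "%PDF-" marker within the first 1024 bytes, or -1 if absent. */
    static int headerOffset(byte[] data) {
        int lastStart = Math.min(data.length, SEARCH_LIMIT) - MARKER.length;
        for (int i = 0; i <= lastStart; i++) {
            boolean match = true;
            for (int j = 0; j < MARKER.length; j++) {
                if (data[i + j] != MARKER[j]) {
                    match = false;
                    break;
                }
            }
            if (match) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) throws Exception {
        // Pass one or more PDF paths on the command line.
        for (String path : args) {
            byte[] data = Files.readAllBytes(Paths.get(path));
            System.out.println(path + ": header offset " + headerOffset(data));
        }
    }
}
```

An offset of 0 means the header is where a strict parser expects it. A positive
offset means there are leading bytes before the header, and -1 means no marker
was found in the first 1024 bytes (which is also what an empty file gives).
Comparing the result for a failing file and a working one should show whether
the header position is what separates them.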
