[Hippo-cms7-user] Uploading PDF fails

Ard Schrijvers a.schrijvers at onehippo.com
Wed Nov 4 16:07:33 CET 2009


There might indeed be an issue with this. Consider the following:

-------------------------------------------------------------------------------------
The ISO 32000-1:2008 PDF open standard was published by the ISO on
July 1, 2008. PDF is now a published ISO standard, titled Document
management—Portable document format—Part 1: PDF 1.7

According to the ISO PDF standard abstract:

    ISO 32000-1:2008 specifies a digital form for representing
electronic documents to enable users to exchange and view electronic
documents independent of the environment in which they were created or
the environment in which they are viewed or printed. It is intended
for the developer of software that creates PDF files (conforming
writers), software that reads existing PDF files and interprets their
contents for display and interaction (conforming readers) and PDF
products that read and/or write PDF files for a variety of other
purposes (conforming products).
----------------------------------------------------------------------------------------

It might be the case that 1.4 isn't supported by pdfbox. Is there a
way you can update the PDF to version 1.7 or higher to test with?
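
To see which version a failing file actually declares, a check along these lines might help. It is only a rough sketch: the class PdfHeaderCheck and its declaredVersion method are made up for illustration and are not part of pdfbox or jackrabbit. It assumes the "%PDF-x.y" marker sits at the very first byte of the file, which is stricter than some viewers are.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper, not part of pdfbox or jackrabbit.
public class PdfHeaderCheck {

    /**
     * Returns the version declared in a leading "%PDF-x.y" marker,
     * or null if the bytes do not start with such a marker.
     */
    public static String declaredVersion(byte[] head) {
        String s = new String(head, StandardCharsets.ISO_8859_1);
        if (!s.startsWith("%PDF-")) {
            return null;
        }
        // Collect the digits and dots that follow the "%PDF-" prefix.
        int end = 5;
        while (end < s.length()
                && (Character.isDigit(s.charAt(end)) || s.charAt(end) == '.')) {
            end++;
        }
        return end > 5 ? s.substring(5, end) : null;
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the path of the PDF file to inspect.
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            byte[] buf = new byte[16];
            int n = in.read(buf);
            // An empty file gives n == -1; treat that as zero bytes read.
            byte[] head = Arrays.copyOf(buf, Math.max(n, 0));
            String version = declaredVersion(head);
            System.out.println(version == null
                    ? "no %PDF- marker at the start of the file"
                    : "declared PDF version " + version);
        }
    }
}
```

If this reports no marker for a file that a viewer still opens, the file may have extra bytes before the header, which would be a different problem from the declared version.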

Regards Ard


On Wed, Nov 4, 2009 at 3:36 PM, Joffrey <jlambregs at iprofs.nl> wrote:
>
> We replaced the jar file (PDFbox-0.7.3.jar) with the one you mentioned. This
> was in cms/WEB-INF/lib but the problem remains...
> Could it have something to do with the version of the pdf file? We noticed it
> fails on a pdf file version 1.4
>
>
>
> Ard wrote:
>>
>> I am not sure, but if it is just for testing, I would recommend what
>> is easiest for you...most likely by hand. Would you mind letting us
>> know whether it solved your issue? If not, there might be some issue
>> in pdfbox still...otoh, let's first see whether your issue is
>> solved.
>>
>> Regards Ard
>>
>> On Wed, Nov 4, 2009 at 1:14 PM, Joffrey <jlambregs at iprofs.nl> wrote:
>>>
>>> Thx for the information Ard,
>>>
>>> Do we have to replace the jar file by hand or is this something that can be
>>> modified in the pom or any other config file used for building the cms?
>>>
>>> Regards,
>>> Joffrey
>>>
>>>
>>> Ard wrote:
>>>>
>>>> Hello Joffrey,
>>>>
>>>> are you using pdfbox 0.7.3? Most likely you are, because this one
>>>> ships with the ecm through jackrabbit. Can you try to use 0.8.0? See
>>>> [1].
>>>>
>>>> Regards Ard
>>>>
>>>> [1] http://incubator.apache.org/pdfbox/download.html
>>>>
>>>> On Wed, Nov 4, 2009 at 11:40 AM, Joffrey <jlambregs at iprofs.nl> wrote:
>>>>>
>>>>> Hi all,
>>>>>
>>>>> When uploading valid pdf files into the CMS we sometimes get an exception in
>>>>> the logging of the cms saying the file could not be processed due to a
>>>>> corrupt header. The file is shown in the assets folder with a size of 0Kb.
>>>>> This is the case for a number of pdf files while other files can be uploaded
>>>>> without any problem. All files, also the ones that are not accepted by the
>>>>> cms, are valid files that can be opened with Acrobat Reader. Below the
>>>>> exception....
>>>>>
>>>>> Thanks in advance,
>>>>> Joffrey
>>>>>
>>>>> 04.11.2009 11:21:22 WARN [org.apache.jackrabbit.extractor.PdfTextExtractor.extractText():91] Failed to extract PDF text content
>>>>> java.io.IOException: Error: Header is corrupt ''
>>>>>        at org.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:143)
>>>>>        at org.apache.jackrabbit.extractor.PdfTextExtractor.extractText(PdfTextExtractor.java:69)
>>>>>        at org.apache.jackrabbit.extractor.CompositeTextExtractor.extractText(CompositeTextExtractor.java:90)
>>>>>        at org.apache.jackrabbit.core.query.lucene.JackrabbitTextExtractor.extractText(JackrabbitTextExtractor.java:195)
>>>>>        at org.apache.jackrabbit.core.query.lucene.NodeIndexer.addBinaryValue(NodeIndexer.java:419)
>>>>>        at org.apache.jackrabbit.core.query.lucene.NodeIndexer.addValue(NodeIndexer.java:303)
>>>>>        at org.apache.jackrabbit.core.query.lucene.NodeIndexer.createDoc(NodeIndexer.java:237)
>>>>>        at org.hippoecm.repository.query.lucene.ServicingNodeIndexer.createDoc(ServicingNodeIndexer.java:96)
>>>>>        at org.hippoecm.repository.query.lucene.ServicingSearchIndex.createDocument(ServicingSearchIndex.java:207)
>>>>>        at org.hippoecm.repository.query.lucene.ServicingSearchIndex.aggregateDescendants(ServicingSearchIndex.java:297)
>>>>>        at org.hippoecm.repository.query.lucene.ServicingSearchIndex.createDocument(ServicingSearchIndex.java:216)
>>>>>        at org.hippoecm.repository.query.lucene.ServicingSearchIndex.createDocument(ServicingSearchIndex.java:151)
>>>>>        at org.apache.jackrabbit.core.query.lucene.SearchIndex$2.next(SearchIndex.java:557)
>>>>>        at org.apache.jackrabbit.core.query.lucene.MultiIndex.update(MultiIndex.java:437)
>>>>>        at org.apache.jackrabbit.core.query.lucene.SearchIndex.updateNodes(SearchIndex.java:541)
>>>>>        at org.hippoecm.repository.query.lucene.ServicingSearchIndex.updateNodes(ServicingSearchIndex.java:187)
>>>>>        at org.apache.jackrabbit.core.SearchManager.onEvent(SearchManager.java:502)
>>>>>        at org.apache.jackrabbit.core.observation.EventConsumer.consumeEvents(EventConsumer.java:243)
>>>>>        at org.apache.jackrabbit.core.observation.ObservationDispatcher.dispatchEvents(ObservationDispatcher.java:201)
>>>>>        at org.apache.jackrabbit.core.observation.EventStateCollection.dispatch(EventStateCollection.java:422)
>>>>>        at org.apache.jackrabbit.core.state.SharedItemStateManager$Update.end(SharedItemStateManager.java:754)
>>>>>        at org.apache.jackrabbit.core.state.SharedItemStateManager.update(SharedItemStateManager.java:1100)
>>>>>        at org.apache.jackrabbit.core.state.LocalItemStateManager.update(LocalItemStateManager.java:351)
>>>>>        at org.apache.jackrabbit.core.state.ForkedXAItemStateManager.update(ForkedXAItemStateManager.java:357)
>>>>>        at org.hippoecm.repository.jackrabbit.HippoLocalItemStateManager.update(HippoLocalItemStateManager.java:221)
>>>>>        at org.apache.jackrabbit.core.state.LocalItemStateManager.update(LocalItemStateManager.java:326)
>>>>>
>>>>>
>>>>> --
>>>>> View this message in context:
>>>>> http://n2.nabble.com/Uploading-PDF-fails-tp3944320p3944320.html
>>>>> Sent from the Hippo CMS 7 mailing list archive at Nabble.com.
>>>>> _______________________________________________
>>>>> Hippo-cms7-user mailing list and forums
>>>>> http://www.onehippo.org/cms7/support/community.html
>>>>>


