[Hippo-cms7-user] Uploading PDF fails

Bart van der Schans b.vanderschans at onehippo.com
Mon Nov 9 09:53:52 CET 2009


FYI, here is the upstream issue about the upgrade to pdfbox 0.8.0:
https://issues.apache.org/jira/browse/JCR-2388
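
For anyone trying to narrow down whether the PDF version plays a role (see the discussion quoted below), here is a minimal sketch that reads the version from a file's "%PDF-x.y" header marker without involving pdfbox at all. The class and method names (PdfHeaderCheck, readVersion) are made up for illustration and are not part of pdfbox, Jackrabbit or Hippo. The 1024-byte search window is an assumption based on the leniency Acrobat is known for; pdfbox itself may be stricter about where the header must appear, which this sketch does not reproduce.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: reports the version found in a PDF header marker. */
public class PdfHeaderCheck {

    // Assumption: look for the marker in the first 1024 bytes only.
    private static final int WINDOW = 1024;
    private static final String MARKER = "%PDF-";

    /**
     * Returns the version string that follows "%PDF-" (for example "1.4"),
     * or null when no marker is found in the search window.
     */
    public static String readVersion(InputStream in) throws IOException {
        byte[] buf = new byte[WINDOW];
        int len = 0;
        int n;
        // Fill the buffer until it is full or the stream ends.
        while (len < buf.length && (n = in.read(buf, len, buf.length - len)) > 0) {
            len += n;
        }
        // ISO-8859-1 maps every byte to one char, so binary data is harmless here.
        String head = new String(buf, 0, len, StandardCharsets.ISO_8859_1);
        int idx = head.indexOf(MARKER);
        if (idx < 0) {
            return null;
        }
        int start = idx + MARKER.length();
        int end = start;
        // Collect the digits and dots that make up the version number.
        while (end < head.length()
                && (Character.isDigit(head.charAt(end)) || head.charAt(end) == '.')) {
            end++;
        }
        return end > start ? head.substring(start, end) : null;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java PdfHeaderCheck <file.pdf>");
            return;
        }
        try (InputStream in = new FileInputStream(args[0])) {
            String version = readVersion(in);
            System.out.println(version == null
                    ? "no %PDF- marker in the first " + WINDOW + " bytes"
                    : "PDF version " + version);
        }
    }
}
```

Running it against both a failing and a working file should at least show whether the two differ in header version or in where the marker sits. An empty stream yields null, which may be worth checking given that the failing assets show up with a size of 0Kb.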

Regards,
Bart


On Fri, Nov 6, 2009 at 11:06 AM, Ard Schrijvers
<a.schrijvers at onehippo.com> wrote:
> Hello Joffrey,
>
> On Fri, Nov 6, 2009 at 10:41 AM, Joffrey <jlambregs at iprofs.nl> wrote:
>>
>> Morning Ard,
>>
>> We tried to update the PDF to a higher version, i.e. we opened the file and
>> saved it again. This version was uploaded without any problem. The strange
>> thing, however, is that later that day a rather small PDF of version 1.4 was
>> uploaded without any problem.
>> For now we will have to investigate this issue further and try to find the
>> real cause of the problem: size, version, a combination of both, ...
>> We will keep you updated on the progress we make.
>> All other suggestions are welcome :)
>
> You might want to check whether it is a known issue in pdfbox [1] or
> drop a mail on their list [2]. That might save you quite some time. It
> might just be that some older PDF versions are not supported. If you have
> feedback on the issue, we'd really like to hear from you :-)) Many
> thanks so far for keeping us in the loop.
>
> Regards Ard
>
> [1] http://incubator.apache.org/pdfbox/
> [2] http://incubator.apache.org/pdfbox/mailing-list.html
>
>>
>> Joffrey
>>
>>
>>
>> Ard wrote:
>>>
>>> There might indeed be an issue with this. Some background on the PDF standard:
>>>
>>> -------------------------------------------------------------------------------------
>>> The ISO 32000-1:2008 PDF open standard was published by the ISO on
>>> July 1, 2008. PDF is now a published ISO standard, titled Document
>>> management—Portable document format—Part 1: PDF 1.7
>>>
>>> According to the ISO PDF standard abstract:
>>>
>>>     ISO 32000-1:2008 specifies a digital form for representing
>>> electronic documents to enable users to exchange and view electronic
>>> documents independent of the environment in which they were created or
>>> the environment in which they are viewed or printed. It is intended
>>> for the developer of software that creates PDF files (conforming
>>> writers), software that reads existing PDF files and interprets their
>>> contents for display and interaction (conforming readers) and PDF
>>> products that read and/or write PDF files for a variety of other
>>> purposes (conforming products).
>>> ----------------------------------------------------------------------------------------
>>>
>>> It might be the case that PDF 1.4 isn't supported by pdfbox. Is there a
>>> way you can update the PDF to version 1.7 or higher to test with?
>>>
>>> Regards Ard
>>>
>>>
>>> On Wed, Nov 4, 2009 at 3:36 PM, Joffrey <jlambregs at iprofs.nl> wrote:
>>>>
>>>> We replaced the jar file (PDFbox-0.7.3.jar) in cms/WEB-INF/lib with the
>>>> one you mentioned, but the problem remains...
>>>> Could it have something to do with the version of the PDF file? We noticed
>>>> it fails on a PDF file of version 1.4.
>>>>
>>>>
>>>>
>>>> Ard wrote:
>>>>>
>>>>> I am not sure, but if it is just for testing, I would recommend whatever
>>>>> is easiest for you... most likely by hand. Would you mind letting us
>>>>> know whether it solved your issue? If not, there might still be some
>>>>> issue in pdfbox... OTOH, let's first see whether your issue is solved.
>>>>>
>>>>> Regards Ard
>>>>>
>>>>> On Wed, Nov 4, 2009 at 1:14 PM, Joffrey <jlambregs at iprofs.nl> wrote:
>>>>>>
>>>>>> Thx for the information Ard,
>>>>>>
>>>>>> Do we have to replace the jar file by hand or is this something that
>>>>>> can be modified in the pom or any other config file used for building
>>>>>> the cms?
>>>>>>
>>>>>> Regards,
>>>>>> Joffrey
>>>>>>
>>>>>>
>>>>>> Ard wrote:
>>>>>>>
>>>>>>> Hello Joffrey,
>>>>>>>
>>>>>>> Are you using pdfbox 0.7.3? Most likely you are, because this version
>>>>>>> ships with the ECM through Jackrabbit. Can you try 0.8.0 instead? See
>>>>>>> [1].
>>>>>>>
>>>>>>> Regards Ard
>>>>>>>
>>>>>>> [1] http://incubator.apache.org/pdfbox/download.html
>>>>>>>
>>>>>>> On Wed, Nov 4, 2009 at 11:40 AM, Joffrey <jlambregs at iprofs.nl> wrote:
>>>>>>>>
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> When uploading valid pdf files into the CMS we sometimes get an
>>>>>>>> exception in the logging of the cms saying the file could not be
>>>>>>>> processed due to a corrupt header. The file is shown in the assets
>>>>>>>> folder with a size of 0Kb. This is the case for a number of pdf files
>>>>>>>> while other files can be uploaded without any problem. All files, also
>>>>>>>> the ones that are not accepted by the cms, are valid files that can be
>>>>>>>> opened with Acrobat Reader. Below the exception....
>>>>>>>>
>>>>>>>> Thanks in advance,
>>>>>>>> Joffrey
>>>>>>>>
>>>>>>>> 04.11.2009 11:21:22 WARN [org.apache.jackrabbit.extractor.PdfTextExtractor.extractText():91] Failed to extract PDF text content
>>>>>>>> java.io.IOException: Error: Header is corrupt ''
>>>>>>>>        at org.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:143)
>>>>>>>>        at org.apache.jackrabbit.extractor.PdfTextExtractor.extractText(PdfTextExtractor.java:69)
>>>>>>>>        at org.apache.jackrabbit.extractor.CompositeTextExtractor.extractText(CompositeTextExtractor.java:90)
>>>>>>>>        at org.apache.jackrabbit.core.query.lucene.JackrabbitTextExtractor.extractText(JackrabbitTextExtractor.java:195)
>>>>>>>>        at org.apache.jackrabbit.core.query.lucene.NodeIndexer.addBinaryValue(NodeIndexer.java:419)
>>>>>>>>        at org.apache.jackrabbit.core.query.lucene.NodeIndexer.addValue(NodeIndexer.java:303)
>>>>>>>>        at org.apache.jackrabbit.core.query.lucene.NodeIndexer.createDoc(NodeIndexer.java:237)
>>>>>>>>        at org.hippoecm.repository.query.lucene.ServicingNodeIndexer.createDoc(ServicingNodeIndexer.java:96)
>>>>>>>>        at org.hippoecm.repository.query.lucene.ServicingSearchIndex.createDocument(ServicingSearchIndex.java:207)
>>>>>>>>        at org.hippoecm.repository.query.lucene.ServicingSearchIndex.aggregateDescendants(ServicingSearchIndex.java:297)
>>>>>>>>        at org.hippoecm.repository.query.lucene.ServicingSearchIndex.createDocument(ServicingSearchIndex.java:216)
>>>>>>>>        at org.hippoecm.repository.query.lucene.ServicingSearchIndex.createDocument(ServicingSearchIndex.java:151)
>>>>>>>>        at org.apache.jackrabbit.core.query.lucene.SearchIndex$2.next(SearchIndex.java:557)
>>>>>>>>        at org.apache.jackrabbit.core.query.lucene.MultiIndex.update(MultiIndex.java:437)
>>>>>>>>        at org.apache.jackrabbit.core.query.lucene.SearchIndex.updateNodes(SearchIndex.java:541)
>>>>>>>>        at org.hippoecm.repository.query.lucene.ServicingSearchIndex.updateNodes(ServicingSearchIndex.java:187)
>>>>>>>>        at org.apache.jackrabbit.core.SearchManager.onEvent(SearchManager.java:502)
>>>>>>>>        at org.apache.jackrabbit.core.observation.EventConsumer.consumeEvents(EventConsumer.java:243)
>>>>>>>>        at org.apache.jackrabbit.core.observation.ObservationDispatcher.dispatchEvents(ObservationDispatcher.java:201)
>>>>>>>>        at org.apache.jackrabbit.core.observation.EventStateCollection.dispatch(EventStateCollection.java:422)
>>>>>>>>        at org.apache.jackrabbit.core.state.SharedItemStateManager$Update.end(SharedItemStateManager.java:754)
>>>>>>>>        at org.apache.jackrabbit.core.state.SharedItemStateManager.update(SharedItemStateManager.java:1100)
>>>>>>>>        at org.apache.jackrabbit.core.state.LocalItemStateManager.update(LocalItemStateManager.java:351)
>>>>>>>>        at org.apache.jackrabbit.core.state.ForkedXAItemStateManager.update(ForkedXAItemStateManager.java:357)
>>>>>>>>        at org.hippoecm.repository.jackrabbit.HippoLocalItemStateManager.update(HippoLocalItemStateManager.java:221)
>>>>>>>>        at org.apache.jackrabbit.core.state.LocalItemStateManager.update(LocalItemStateManager.java:326)
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> View this message in context:
>>>>>>>> http://n2.nabble.com/Uploading-PDF-fails-tp3944320p3944320.html
>>>>>>>> Sent from the Hippo CMS 7 mailing list archive at Nabble.com.
>>>>>>>> _______________________________________________
>>>>>>>> Hippo-cms7-user mailing list and forums
>>>>>>>> http://www.onehippo.org/cms7/support/community.html
>>>>>>>>



-- 
Hippo B.V.  -  Amsterdam
Oosteinde 11, 1017 WT, Amsterdam, +31(0)20-5224466

Hippo USA Inc.  -  San Francisco
101 H Street, Suite Q, Petaluma CA, 94952-3329, +1 (707) 773-4646
-----------------------------------------------------------------
http://www.onehippo.com   -  info at onehippo.com
-----------------------------------------------------------------


