[Hippo-cms7-user] Uploading PDF fails

Joffrey jlambregs at iprofs.nl
Fri Nov 6 10:41:56 CET 2009


Morning Ard,

We tried updating the PDF to a higher version, i.e. we opened the file and
saved it again. That version uploaded without any problem. The strange thing,
however, is that later that day a rather small PDF of version 1.4 also
uploaded without any problem.
For now we will have to investigate this issue further and try to find the
real cause of the problem: size, version, a combination of both, ...
We will post an update on the progress we make.
All other suggestions are welcome :)
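
To separate the version question from the size question, a check like the
one below could be run against both the failing and the working files. It is
only a minimal sketch: PdfHeaderCheck and readPdfVersion are hypothetical
names, not part of Hippo CMS, Jackrabbit or PDFBox, and it assumes the
"%PDF-x.y" marker sits at the very start of the file. The empty quotes in the
"Header is corrupt ''" message may mean the parser received an empty first
line (which would fit the 0Kb asset size), but that is a guess.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

/** Hypothetical diagnostic helper; not part of Hippo CMS, Jackrabbit or PDFBox. */
final class PdfHeaderCheck {

    private static final String MARKER = "%PDF-";

    /**
     * Returns the version from a leading "%PDF-x.y" header (e.g. "1.4"),
     * or null when the stream is too short or does not start with the marker.
     */
    static String readPdfVersion(InputStream in) throws IOException {
        byte[] buf = new byte[8]; // "%PDF-" (5 bytes) plus "x.y" (3 bytes)
        int total = 0;
        while (total < buf.length) {
            int n = in.read(buf, total, buf.length - total);
            if (n < 0) {
                break; // end of stream reached before 8 bytes were read
            }
            total += n;
        }
        String head = new String(buf, 0, total, StandardCharsets.ISO_8859_1);
        if (total < buf.length || !head.startsWith(MARKER)) {
            return null;
        }
        return head.substring(MARKER.length());
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java PdfHeaderCheck <file.pdf> [more files]");
            return;
        }
        for (String path : args) {
            try (InputStream in = Files.newInputStream(Paths.get(path))) {
                String version = readPdfVersion(in);
                System.out.println(path + ": "
                        + (version == null ? "no PDF header at offset 0" : "PDF " + version));
            }
        }
    }
}
```

If a rejected file reports a normal version here, the file itself is probably
fine and the bytes are more likely lost somewhere between the upload and the
text extractor.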

Joffrey



Ard wrote:
> 
> There might indeed be an issue with this. According to:
> 
> -------------------------------------------------------------------------------------
> The ISO 32000-1:2008 PDF open standard was published by the ISO on
> July 1, 2008. PDF is now a published ISO standard, titled Document
> management—Portable document format—Part 1: PDF 1.7
> 
> According to the ISO PDF standard abstract:
> 
>     ISO 32000-1:2008 specifies a digital form for representing
> electronic documents to enable users to exchange and view electronic
> documents independent of the environment in which they were created or
> the environment in which they are viewed or printed. It is intended
> for the developer of software that creates PDF files (conforming
> writers), software that reads existing PDF files and interprets their
> contents for display and interaction (conforming readers) and PDF
> products that read and/or write PDF files for a variety of other
> purposes (conforming products).
> ----------------------------------------------------------------------------------------
> 
> It might be the case that 1.4 isn't supported by pdfbox. Is there a
> way you can update the pdf to version 1.7 or higher to test with?
> 
> Regards Ard
> 
> 
> On Wed, Nov 4, 2009 at 3:36 PM, Joffrey <jlambregs at iprofs.nl> wrote:
>>
>> We replaced the jar file (PDFbox-0.7.3.jar) with the one you mentioned. This
>> was in cms/WEB-INF/lib, but the problem remains...
>> Could it have something to do with the version of the pdf file? We noticed it
>> fails on a pdf file of version 1.4.
>>
>>
>>
>> Ard wrote:
>>>
>>> I am not sure, but if it is just for testing, I would recommend what
>>> is easiest for you...most likely by hand. Would you mind letting us
>>> know whether it solved your issue? If not, there might still be some
>>> issue in pdfbox...otoh, let's first see whether your issue is
>>> solved.
>>>
>>> Regards Ard
>>>
>>> On Wed, Nov 4, 2009 at 1:14 PM, Joffrey <jlambregs at iprofs.nl> wrote:
>>>>
>>>> Thx for the information Ard,
>>>>
>>>> Do we have to replace the jar file by hand, or is this something that can be
>>>> modified in the pom or any other config file used for building the cms?
>>>>
>>>> Regards,
>>>> Joffrey
>>>>
>>>>
>>>> Ard wrote:
>>>>>
>>>>> Hello Joffrey,
>>>>>
>>>>> Are you using pdfbox 0.7.3? Most likely you are, because this one
>>>>> ships with the ecm through jackrabbit. Can you try to use 0.8.0? See
>>>>> [1].
>>>>>
>>>>> Regards Ard
>>>>>
>>>>> [1] http://incubator.apache.org/pdfbox/download.html
>>>>>
>>>>> On Wed, Nov 4, 2009 at 11:40 AM, Joffrey <jlambregs at iprofs.nl> wrote:
>>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> When uploading valid pdf files into the CMS we sometimes get an exception
>>>>>> in the logging of the cms saying the file could not be processed due to a
>>>>>> corrupt header. The file is shown in the assets folder with a size of 0Kb.
>>>>>> This is the case for a number of pdf files, while other files can be
>>>>>> uploaded without any problem. All files, also the ones that are not
>>>>>> accepted by the cms, are valid files that can be opened with Acrobat
>>>>>> Reader. Below the exception....
>>>>>>
>>>>>> Thanks in advance,
>>>>>> Joffrey
>>>>>>
>>>>>> 04.11.2009 11:21:22 WARN [org.apache.jackrabbit.extractor.PdfTextExtractor.extractText():91] Failed to extract PDF text content
>>>>>> java.io.IOException: Error: Header is corrupt ''
>>>>>>        at org.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:143)
>>>>>>        at org.apache.jackrabbit.extractor.PdfTextExtractor.extractText(PdfTextExtractor.java:69)
>>>>>>        at org.apache.jackrabbit.extractor.CompositeTextExtractor.extractText(CompositeTextExtractor.java:90)
>>>>>>        at org.apache.jackrabbit.core.query.lucene.JackrabbitTextExtractor.extractText(JackrabbitTextExtractor.java:195)
>>>>>>        at org.apache.jackrabbit.core.query.lucene.NodeIndexer.addBinaryValue(NodeIndexer.java:419)
>>>>>>        at org.apache.jackrabbit.core.query.lucene.NodeIndexer.addValue(NodeIndexer.java:303)
>>>>>>        at org.apache.jackrabbit.core.query.lucene.NodeIndexer.createDoc(NodeIndexer.java:237)
>>>>>>        at org.hippoecm.repository.query.lucene.ServicingNodeIndexer.createDoc(ServicingNodeIndexer.java:96)
>>>>>>        at org.hippoecm.repository.query.lucene.ServicingSearchIndex.createDocument(ServicingSearchIndex.java:207)
>>>>>>        at org.hippoecm.repository.query.lucene.ServicingSearchIndex.aggregateDescendants(ServicingSearchIndex.java:297)
>>>>>>        at org.hippoecm.repository.query.lucene.ServicingSearchIndex.createDocument(ServicingSearchIndex.java:216)
>>>>>>        at org.hippoecm.repository.query.lucene.ServicingSearchIndex.createDocument(ServicingSearchIndex.java:151)
>>>>>>        at org.apache.jackrabbit.core.query.lucene.SearchIndex$2.next(SearchIndex.java:557)
>>>>>>        at org.apache.jackrabbit.core.query.lucene.MultiIndex.update(MultiIndex.java:437)
>>>>>>        at org.apache.jackrabbit.core.query.lucene.SearchIndex.updateNodes(SearchIndex.java:541)
>>>>>>        at org.hippoecm.repository.query.lucene.ServicingSearchIndex.updateNodes(ServicingSearchIndex.java:187)
>>>>>>        at org.apache.jackrabbit.core.SearchManager.onEvent(SearchManager.java:502)
>>>>>>        at org.apache.jackrabbit.core.observation.EventConsumer.consumeEvents(EventConsumer.java:243)
>>>>>>        at org.apache.jackrabbit.core.observation.ObservationDispatcher.dispatchEvents(ObservationDispatcher.java:201)
>>>>>>        at org.apache.jackrabbit.core.observation.EventStateCollection.dispatch(EventStateCollection.java:422)
>>>>>>        at org.apache.jackrabbit.core.state.SharedItemStateManager$Update.end(SharedItemStateManager.java:754)
>>>>>>        at org.apache.jackrabbit.core.state.SharedItemStateManager.update(SharedItemStateManager.java:1100)
>>>>>>        at org.apache.jackrabbit.core.state.LocalItemStateManager.update(LocalItemStateManager.java:351)
>>>>>>        at org.apache.jackrabbit.core.state.ForkedXAItemStateManager.update(ForkedXAItemStateManager.java:357)
>>>>>>        at org.hippoecm.repository.jackrabbit.HippoLocalItemStateManager.update(HippoLocalItemStateManager.java:221)
>>>>>>        at org.apache.jackrabbit.core.state.LocalItemStateManager.update(LocalItemStateManager.java:326)
>>>>>>
>>>>>>
>>>>>> --
>>>>>> View this message in context:
>>>>>> http://n2.nabble.com/Uploading-PDF-fails-tp3944320p3944320.html
>>>>>> Sent from the Hippo CMS 7 mailing list archive at Nabble.com.
>>>>>> _______________________________________________
>>>>>> Hippo-cms7-user mailing list and forums
>>>>>> http://www.onehippo.org/cms7/support/community.html
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>
>>
> 
> 

-- 
View this message in context: http://n2.nabble.com/Uploading-PDF-fails-tp3944320p3957770.html
Sent from the Hippo CMS 7 mailing list archive at Nabble.com.


