[Hippo-cms7-user] Indexing hippo:resource item in Hippo CMS 2.16.02 takes a long time

Jeroen Reijn j.reijn at onehippo.com
Fri Dec 3 12:28:47 CET 2010


I've created a JIRA issue for this and have already created a patch. I will
commit it to both trunk and branch.

You can track progress at: https://issues.onehippo.com/browse/HREPTWO-4802

Jeroen

On Wed, Dec 1, 2010 at 2:34 PM, Mathijs Brand <m.brand at onehippo.com> wrote:

> Thanks for the effort.
> @Jeroen. Really looking forward to your patch!
>
> On Wed, Dec 1, 2010 at 2:30 PM, Jeroen Reijn <j.reijn at onehippo.com> wrote:
> > We've already implemented this in our project. I'll try to see if I can find
> > some time to donate this patch.
> > Jeroen
> >
> > On Wed, Dec 1, 2010 at 2:18 PM, Frank van Lankvelt
> > <f.vanlankvelt at onehippo.com> wrote:
> >>
> >> >>>>> this functionality is not broken in the CMS; it was never
> >> >>>>> implemented.
> >> >>>>> Would be nice though...
> >> >>>> Hmmm, that is really a bummer because the repository has all the bits
> >> >>>> and bytes for it. We have customers who are using this technique
> >> >>>> already. How come it is not yet part of the CMS? It is just a matter
> >> >>>> of adding one utility class, which would do pretty much what I wrote
> >> >>>> in a repository unit test for text extraction. I was really under the
> >> >>>> impression that this would be added to the CMS by default. I hope
> >> >>>> we can pick it up and make it work this way.
> >> >>> By the way, I also vaguely recall that the last thing we discussed
> >> >>> about it was that it might have to be part of the DerivedData Engine,
> >> >>> to make sure it would work out of the box for any plugin, even when
> >> >>> uploading PDFs from the HST for example. I still think it is a pity we
> >> >>> did not take this final step (yet).
> >> >>>
> >> >> yes, extracting the text in a derived data function would be nice, but
> >> >> I'm afraid it wouldn't work for updates. Say, you upload a newer
> >> >> version of the same PDF.  The upload logic should still be aware of
> >> >> the hippo:text property, if only to remove it.
> >> >>
> >> >> To do this completely in the repository, a checksum for the pdf should
> >> >> be stored next to the extracted text.  Then, when the derived data
> >> >> function is triggered, it can re-calculate the checksum and do the
> >> >> extraction again when it differs.  Checksumming is a lot cheaper than
> >> >> extracting.
> >> >>
> >> >> Also, I'm not sure if it is possible at this moment to access a binary
> >> >> property in a derived data function, but supporting that should be
> >> >> possible.
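
The checksum idea quoted above can be sketched in plain Java. This is only
an illustration under stated assumptions, not the patch tracked in
HREPTWO-4802: a Map stands in for the hippo:resource node, the property
name example:checksum is hypothetical (the hippo:resource definition quoted
in this thread does not list one), SHA-256 is an arbitrary choice of
digest, and the extractor function is a placeholder for a real PDF text
extractor.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.function.Function;

final class ChecksumGatedExtraction {

    // Property names from the thread, plus one hypothetical checksum property.
    static final String DATA = "jcr:data";
    static final String TEXT = "hippo:text";
    static final String CHECKSUM = "example:checksum"; // assumption, not in hippo.cnd

    /** Returns the hex-encoded SHA-256 digest of the binary content. */
    static String checksum(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(data)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 must be available on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    /**
     * Re-extracts the text only when the stored checksum no longer matches
     * the binary data (or no text has been stored yet).
     * Returns true if the extractor was invoked.
     */
    static boolean refreshText(Map<String, Object> resource,
                               Function<byte[], String> extractor) {
        byte[] data = (byte[]) resource.get(DATA);
        String current = checksum(data);
        if (current.equals(resource.get(CHECKSUM)) && resource.containsKey(TEXT)) {
            return false; // data unchanged: keep the stored text
        }
        // Store the extracted text as bytes, mirroring the binary hippo:text property.
        resource.put(TEXT, extractor.apply(data).getBytes(StandardCharsets.UTF_8));
        resource.put(CHECKSUM, current);
        return true;
    }
}
```

On a first upload refreshText runs the extractor and stores both the text
and the checksum. Calling it again with unchanged jcr:data returns false
without extracting, while replacing the data triggers a fresh extraction.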
> >> > Do you think we could then, as an alternative, at least provide this
> >> > functionality for the default CMS plugins that upload binaries such
> >> > as a PDF? Then, later on, we can see if we can fix it in the Derived
> >> > Data Engine. I really think it would be a pity not to leverage the
> >> > very good improvement we can have with this hippo:text property, just
> >> > because the 'work everywhere' solution possibly needs quite some time
> >> > to implement. For the short run, I would opt for simply adding it to
> >> > the CMS plugins, but, of course, I am on thin ice here as I am not
> >> > familiar with the code base.
> >> >
> >> yeah, this shouldn't be that hard.  I think that's also how some
> >> projects already implemented this, so perhaps they can step up and
> >> donate a patch?
> >>
> >> cheers, Frank
> >>
> >> > Regards Ard
> >> >
> >> >>
> >> >> cheers, Frank
> >> >>
> >> >>
> >> >>
> >> >>> Regards Ard
> >> >>>
> >> >>>>
> >> >>>> Regards Ard
> >> >>>>
> >> >>>>>
> >> >>>>> cheers, Frank
> >> >>>>>
> >> >>>>>
> >> >>>>> On Wed, Dec 1, 2010 at 1:06 PM, Ard Schrijvers
> >> >>>>> <a.schrijvers at onehippo.com> wrote:
> >> >>>>>> As you can see here [1], the CND already contains this property in
> >> >>>>>> your version. The idea is that when uploading the PDF, this property
> >> >>>>>> gets the extracted text. This ensures extraction won't be needed
> >> >>>>>> again (particularly important in a clustered setup). Now, if this
> >> >>>>>> property does not get set (it is not done by the repo; every
> >> >>>>>> plugin that uploads binaries should do this), this can mean that:
> >> >>>>>>
> >> >>>>>> 1) It is broken in the CMS
> >> >>>>>> 2) You are using a custom plugin that does not do the extraction and
> >> >>>>>> setting of the hippo:text binary property (the logic needed to do
> >> >>>>>> this is really trivial)
> >> >>>>>>
> >> >>>>>> Hope this gives you enough pointers to help figure out why you do not
> >> >>>>>> have that property.
> >> >>>>>>
> >> >>>>>> Regards Ard
> >> >>>>>>
> >> >>>>>> [1] https://svn.hippocms.org/repos/hippo/hippo-cms7/archive/tags/Tag-HREPTWO-v2_16_02/repository/engine/src/main/resources/hippo.cnd
> >> >>>>>>
> >> >>>>>> [hippo:resource]
> >> >>>>>> - jcr:encoding (string)
> >> >>>>>> - jcr:mimeType (string) mandatory
> >> >>>>>> - jcr:data (binary) primary mandatory
> >> >>>>>> - jcr:lastModified (date) mandatory ignore
> >> >>>>>> - hippo:text (binary)
> >> >>>>>>
> >> >>>>>> On Wed, Dec 1, 2010 at 12:31 PM, Mathijs Brand
> >> >>>>>> <m.brand at onehippo.com> wrote:
> >> >>>>>>> Hi Ard,
> >> >>>>>>>
> >> >>>>>>> On Wed, Dec 1, 2010 at 11:51 AM, Ard Schrijvers
> >> >>>>>>> <a.schrijvers at onehippo.com> wrote:
> >> >>>>>>>> What actually takes a long time is the text extraction of a PDF.
> >> >>>>>>>> However, it should only happen once: we fixed this for the Hippo
> >> >>>>>>>> repository by having an extra property on the hippo:resource,
> >> >>>>>>>> namely hippo:text, which is a binary.
> >> >>>>>>>
> >> >>>>>>> Great, you've already fixed this and thanks for the quick reply :)
> >> >>>>>>>
> >> >>>>>>>> Can you confirm this property is on your resources?
> >> >>>>>>>>
> >> >>>>>>>
> >> >>>>>>> I don't see the hippo:text property on the hippo:resource document.
> >> >>>>>>>
> >> >>>>>>> I see:
> >> >>>>>>> - jcr:mimeType
> >> >>>>>>> - jcr:lastModified
> >> >>>>>>> - jcr:encoding
> >> >>>>>>>
> >> >>>>>>> Kind regards,
> >> >>>>>>> Mathijs
> >> >>>>>>> _______________________________________________
> >> >>>>>>> Hippo-cms7-user mailing list and forums
> >> >>>>>>> http://www.onehippo.org/cms7/support/forums.html
> >> >>>>>>>
> >> >>>>>>
> >> >>>>>>
> >> >>>>>>
> >> >>>>>> --
> >> >>>>>> Hippo
> >> >>>>>> Europe  •  Amsterdam  Oosteinde 11  •  1017 WT Amsterdam  •  +31
> >> >>>>>> (0)20 522 4466
> >> >>>>>> Hippo USA Inc  • 755 Baywood Drive  • Second Floor  • Petaluma,
> CA
> >> >>>>>>  •
> >> >>>>>> 94954 USA  • Phone +1 (707) 658-4535
> >> >>>>>> Canada    •   Montréal  5369 Boulevard St-Laurent  •  Montréal QC
> >> >>>>>> H2T
> >> >>>>>> 1S5  •  +1 (514) 316 8966
> >> >>>>>> www.onehippo.com  •  www.onehippo.org  •  info at onehippo.com
>



-- 
Hippo
----------------------------------------------------------------------------------------------
Europe  •  Amsterdam  Oosteinde 11  •  1017 WT Amsterdam  •  +31 (0)20 522
4466
USA  • 755 Baywood Drive Second Floor  •  Petaluma CA. 94954
•  +1 877 414-4776 (toll free)
Canada    •   Montréal  5369 Boulevard St-Laurent #430 •  Montréal QC
H2T 1S5  •  +1 (514) 316 8966
----------------------------------------------------------------------------------------------
www.onehippo.com  •  www.onehippo.org  •  info at onehippo.com
----------------------------------------------------------------------------------------------