[Hippo-cms7-user] Searching; handling characters like "-" (in resources)

Jeroen Reijn j.reijn at onehippo.com
Tue Dec 14 20:53:19 CET 2010


Ard,

On Tue, Dec 14, 2010 at 5:40 PM, Ard Schrijvers
<a.schrijvers at onehippo.com> wrote:
> Hello,
>
> Because there can be many reasons why some word can or can't be found
> as a sentence, you need to describe the issue as clearly as possible. I
> ran the following tests:
>
> 1) Create four PDFs, containing Foo-Bar, 'Foo-Bar', 'Set-Cookie' and
> Set-Cookie respectively: I can find all of them with all the XPath
> queries you mentioned. So that is not the issue.
> 2) It could be a stemming issue: I learned your project does not have
> a language analyser, so that is not the issue either.
> 3) I tested with other words from your PDF that contain a dash... it works,
> as long as the word is near the beginning of the PDF.
>
> Aha, that is a big pointer of course.
>
> Default repository.xml says:
>
> - maxFieldLength: the number of words that are fulltext indexed at
> most per property.
> <param name="maxFieldLength" value="10000"/>
>
> So, if you really want to index *everything*, make sure you have something like:
>
> <param name="maxFieldLength" value="100000000"/>

Do I understand correctly that not all content is stored in the Lucene
index if this setting is not changed?
Is this the (sensible) Jackrabbit default, or how should I see this? So if
you index PDF documents along with your documents, you should increase the
value of this parameter? I guess it can also affect scoring, because not
all content is indexed.
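
For reference, a rough sketch of where this parameter would sit, assuming the stock Jackrabbit SearchIndex element. The class attribute and the path param below are Jackrabbit's usual defaults and are assumptions here; a Hippo repository.xml may use different values. Only maxFieldLength and its value come from Ard's mail.

```xml
<!-- Inside the <Workspace> element of repository.xml
     (and of the workspace.xml of each existing workspace) -->
<SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
  <!-- Assumed default index location; keep whatever your file already has -->
  <param name="path" value="${wsp.home}/index"/>
  <!-- Maximum number of words that are fulltext indexed per property;
       words beyond this limit are not searchable -->
  <param name="maxFieldLength" value="100000000"/>
</SearchIndex>
```

With the default of 10000, a word such as Set-Cookie that appears late in a long PDF would fall outside the indexed part, which matches test 3 above.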

>
> What you should do:
>
> 1) Change this value in your repository.xml
> 2) For an existing repository:
>     a) Stop the repository
>     b) Delete the indexes
>     c) Change this value in the workspace.xml as well (the workspace.xml
> is created from repository.xml)
>     d) Restart the repository
>     --> The content will be reindexed. You can now search on Set-Cookie
>
> Regards Ard
>
> On Tue, Dec 14, 2010 at 4:39 PM, Mickaël Tricot <m.tricot at onehippo.com> wrote:
>> Here is an extract of the text from the PDF file in question: via een ‘Set-Cookie’
>> header
>>
>> On 12/14/2010 04:27 PM, Ard Schrijvers wrote:
>>
>> //*[jcr:contains(., 'Foo-Bar')] should just work...I would have to
>> sort this out. Do you have the pdf with the text for me or is it
>> private? Also, are you *really* using Foo-Bar, or is this an example?
>> If it happens to be something like 'hoog-slaper' and you are using a
>> Dutch analyzer, it might give me some more ideas. Do you have some
>> analyzer configured or the default?
>>
>> Regards Ard
>>
>> _______________________________________________
>> Hippo-cms7-user mailing list and forums
>> http://www.onehippo.org/cms7/support/forums.html
>>
>
>
>
> --
> Hippo
> Europe  •  Amsterdam  Oosteinde 11  •  1017 WT Amsterdam  •  +31 (0)20 522 4466
> USA  • San Francisco 755 Baywood Drive, Second Floor •  Petaluma, CA.
> 94954 •  +1 877 414 4776 (toll free)
> Canada    •   Montréal  5369 Boulevard St-Laurent #430 •  Montréal QC
> H2T 1S5  •  +1 (514) 316 8966
> www.onehippo.com  •  www.onehippo.org  •  info at onehippo.com





