[Hippo-cms7-user] Searching; handling characters like "-" (in resources)

Ard Schrijvers a.schrijvers at onehippo.com
Tue Dec 14 21:30:41 CET 2010


On Tue, Dec 14, 2010 at 8:53 PM, Jeroen Reijn <j.reijn at onehippo.com> wrote:

>> <param name="maxFieldLength" value="100000000"/>
>
> Do I understand correctly that not all content is stored in the lucene
> index if this setting is not changed?

To be precise, it is not about storing but about indexing: not all
content is indexed. This is not really bad in general, see below.

> Is this the JR (sensible) default or how should I see this? If you
> index pdf documents with your documents, you should increase the value
> of this parameter? I guess it can affect scoring, right, because not
> all content is indexed.

I wouldn't worry too much about scoring here. It is questionable
how much the words after, say, the 10,000th word in a document add.
Most likely, in 99% of cases, they add little that is new. There is a
reason no one ever noticed the current limit of 10,000. It cannot
hurt to increase it to, say, a million.
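
As a rough sketch of where the setting lives, assuming a stock
Jackrabbit SearchIndex element in repository.xml or workspace.xml (the
class attribute and the path param below are assumptions taken from a
plain Jackrabbit setup; a Hippo repository configuration may use a
different class and carry more params), raising the limit to a million
terms per field could look like this:

```xml
<!-- Sketch only: class and path are assumed from a plain Jackrabbit
     setup and may differ in your repository configuration. -->
<SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
  <param name="path" value="${wsp.home}/index"/>
  <!-- Maximum number of terms indexed per field; terms beyond this
       limit are skipped and cannot be found by a search. -->
  <param name="maxFieldLength" value="1000000"/>
</SearchIndex>
```

The new value only applies to content indexed after the change, so
documents that are already in the index most likely need to be
reindexed before their later terms become searchable.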

Also see the links below for the Solr equivalent, which has the same
default and the same maxFieldLength name:

http://wiki.apache.org/solr/SolrConfigXml
http://lucene.472066.n3.nabble.com/maxFieldLength-td490843.html

Regards, Ard

