[Hippo-cms7-user] Journal question when clustering repositories

Jettro Coenradie jettro at jteam.nl
Tue Dec 15 08:33:59 CET 2009


Hi Bart,
Sorry to keep coming back to this, but the following sentence from the referenced page puzzles me a bit:

--- snip ---
The current solution has three known caveats:

If the janitor is enabled then you lose the possibility to easily add cluster nodes. (It is still possible, but takes detailed knowledge of Jackrabbit.)
--- snip ---

This does not sound very reassuring to me. It seems to me that the janitor is exactly what we want, but this warning makes me think we don't want it after all.
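
A minimal sketch of why the janitor and new cluster nodes clash. It rests on two assumptions of mine, not on the wiki: revision ids are consecutive integers, and a brand-new node starts at revision 0. Under those assumptions a node can only catch up from the journal if no record newer than its own revision has been deleted. The function name can_catch_up is hypothetical.

```python
from typing import Iterable


def can_catch_up(node_revision: int, journal_revisions: Iterable[int]) -> bool:
    """Return True if a node at node_revision can replay everything it missed.

    Assumes revision ids are consecutive integers, so a gap between the
    node's revision and the oldest remaining newer record means records
    this node never applied have already been deleted.
    """
    pending = [r for r in journal_revisions if r > node_revision]
    if not pending:
        return True  # nothing newer than the node's own state
    return min(pending) == node_revision + 1


# Hypothetical journal after a janitor run removed revisions 1-100.
journal = range(101, 121)
print(can_catch_up(105, journal))  # existing node: True
print(can_catch_up(0, journal))    # brand-new node starting at 0: False
```

So, under these assumptions, once the janitor has run, a new node has to start from some revision at or above the oldest remaining record instead of from 0. That is presumably the "detailed knowledge" part.
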

What do you think?

On Dec 14, 2009, at 3:49 PM, Bart van der Schans wrote:

> On Mon, Dec 14, 2009 at 2:47 PM, Jettro Coenradie <jettro at jteam.nl> wrote:
>> I am reading this document:
>> http://wiki.apache.org/jackrabbit/Clustering
>> It states that it is not trivial to remove journal records.
> Well, removing old journal records is trivial in a database (not in a
> file-based journal). The trick is to determine which journal records
> are "old" enough to delete.
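
One way to make "old enough" concrete, as a sketch: a record is safe to delete once every cluster node has applied it, so the cutoff is the lowest revision any node has reached. The function and node names below are hypothetical. A node missing from the revision map (for example one that has not registered yet) must block cleanup rather than be ignored.

```python
from typing import Iterable, Mapping, Optional


def safe_cutoff(local_revisions: Mapping[str, int],
                cluster_nodes: Iterable[str]) -> Optional[int]:
    """Return the lowest revision applied by every node, or None if unknown.

    Records at or below the returned revision have been applied by all
    listed nodes. None means nothing should be deleted.
    """
    nodes = set(cluster_nodes)
    if not nodes or nodes - set(local_revisions):
        return None  # an unregistered node could still need every record
    return min(local_revisions[n] for n in nodes)


revisions = {"site1": 120, "site2": 118, "cms1": 120}
print(safe_cutoff(revisions, ["site1", "site2", "cms1"]))           # 118
print(safe_cutoff(revisions, ["site1", "site2", "cms1", "site3"]))  # None
```
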
> 
> Bart
> 
> 
> 
>> I am glad to hear it is possible to copy indexes; we will give this a go.
>> Thanks
>> On Dec 14, 2009, at 2:08 PM, Bart van der Schans wrote:
>> 
>> Hi Jettro,
>> 
>> On Mon, Dec 14, 2009 at 1:50 PM, Jettro Coenradie <jettro at jteam.nl> wrote:
>> 
>> Hi All,
>> 
>> We are currently in the middle of deploying a large Hippo installation.
>> We have 4 site servers and 2 CMS servers, with proxies and load balancers
>> in between (for production only).
>> 
>> We are having issues moving content from acceptance to production. There
>> is not a lot of content in the database (MySQL), maybe around 3000 items
>> with images and PDFs, yet the database dump is 3.5 GB, of which the
>> journal takes more than 2 GB. There are not many revisions in there, so
>> what will happen to this journal once we have been running the system in
>> production for more than a year?
>> 
>> You can just drop old journal records/revisions once they have been
>> consumed by all the nodes in the cluster. So dropping records older than
>> 24 hours or so should probably be fine. You probably have a daily backup
>> anyway. Creating a shell script to do so should be trivial. If I'm not
>> mistaken, there's also some code inside Jackrabbit to do pretty much
>> the same, but I haven't tried it yet.
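
A minimal sketch of such a cleanup script. It is written in Python against sqlite3 so it runs standalone; against MySQL you would use your own driver and connection settings. It assumes the JOURNAL and LOCAL_REVISIONS tables of Jackrabbit's DatabaseJournal (possibly with a schemaObjectPrefix in front of the names), where LOCAL_REVISIONS holds the last revision each cluster node has applied. As far as I know JOURNAL has no timestamp column, so instead of "older than 24 hours" the cutoff is the lowest revision any node has reached. A node that has no LOCAL_REVISIONS row yet is not protected by this query.

```python
import sqlite3


def prune_journal(conn: sqlite3.Connection) -> int:
    """Delete journal records that every cluster node has already applied.

    Returns the number of deleted rows. Nothing is deleted if no node has
    recorded a local revision yet.
    """
    with conn:  # single transaction: commit on success, roll back on error
        (cutoff,) = conn.execute(
            "SELECT MIN(REVISION_ID) FROM LOCAL_REVISIONS").fetchone()
        if cutoff is None:
            return 0
        # Strictly below the cutoff: keeps the newest record all nodes share.
        cur = conn.execute(
            "DELETE FROM JOURNAL WHERE REVISION_ID < ?", (cutoff,))
        return cur.rowcount


if __name__ == "__main__":
    # Demo with cut-down tables holding only the columns the queries touch.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE JOURNAL "
                 "(REVISION_ID INTEGER PRIMARY KEY, REVISION_DATA BLOB)")
    conn.execute("CREATE TABLE LOCAL_REVISIONS "
                 "(JOURNAL_ID TEXT PRIMARY KEY, REVISION_ID INTEGER)")
    conn.executemany("INSERT INTO JOURNAL VALUES (?, ?)",
                     [(r, b"x") for r in range(1, 121)])
    conn.executemany("INSERT INTO LOCAL_REVISIONS VALUES (?, ?)",
                     [("site1", 120), ("site2", 118), ("cms1", 120)])
    conn.commit()
    print(prune_journal(conn))  # 117 (revisions 1-117 removed)
```

A node that is down keeps its old LOCAL_REVISIONS value, which holds the cutoff back. That is the safe direction: its records stay until it has caught up.
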
>> 
>> 
>> Of course, the dump itself is not the biggest problem; adding a server to
>> the cluster is. We have now assigned 6 GB to a site running an embedded
>> repository, but starting the instance takes ages, and often it never
>> reaches the point where it can serve web pages at all.
>> 
>> Truncating the journal should fix this.
>> 
>> 
>> We are looking for more information on using clustering mode; one of the
>> questions we have has to do with the Lucene index. With a clean server,
>> the server needs to obtain all content from the cluster and create its
>> local copy as well as the Lucene index. Is it possible to copy files from
>> one server to another (the Lucene files, or the other repository files)?
>> 
>> I'm not sure what you mean by "obtain all content from the cluster".
>> If the node has to create its own index, it has to index all the data
>> in the database, which can take quite a while, especially if you have a
>> lot of PDFs.
>> 
>> To prevent re-indexing, you can stop one node, copy its index (the
>> whole 'repository' folder) and place it on the new node at the same
>> location. Now you can start the new node without having to re-index the
>> whole database. The nodes will, however, still consume the new journal
>> records created during the downtime and update their indexes accordingly.
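
The copy step could be scripted roughly like this. It is a minimal sketch that assumes the source node has been stopped (the function does not check this) and that both nodes use the same repository path layout. The function name and the exclude parameter are hypothetical. Whether anything node-specific in that folder, such as the node's cluster id, has to be left out or changed afterwards depends on your repository.xml, so check that before relying on it. Between machines you would more likely use scp or rsync; this only shows the shape of the step.

```python
import shutil
from pathlib import Path
from typing import Iterable


def seed_new_node(source_repo: Path, target_repo: Path,
                  exclude: Iterable[str] = ()) -> None:
    """Copy a stopped node's 'repository' folder to a new node's location.

    exclude holds glob patterns (e.g. a node-specific file) to leave out.
    Refuses to overwrite an existing folder at the target.
    """
    if not source_repo.is_dir():
        raise FileNotFoundError(f"{source_repo} is not a directory")
    if target_repo.exists():
        raise FileExistsError(f"{target_repo} already exists; not overwriting")
    shutil.copytree(source_repo, target_repo,
                    ignore=shutil.ignore_patterns(*exclude))


# Hypothetical paths; shut the source node down first.
# seed_new_node(Path("/srv/site1/repository"), Path("/srv/site5/repository"))
```
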
>> 
>> Regards,
>> Bart
>> 
>> 
>> 
>> Ideas and opinions are very welcome
>> 
>> regards Jettro Coenradie
>> 
>> _______________________________________________
>> Hippo-cms7-user mailing list and forums
>> http://www.onehippo.org/cms7/support/community.html
>> 
>> --
>> Hippo B.V.  -  Amsterdam
>> Oosteinde 11, 1017 WT, Amsterdam, +31(0)20-5224466
>> 
>> Hippo USA Inc.  -  San Francisco
>> 101 H Street, Suite Q, Petaluma CA, 94952-3329, +1 (707) 773-4646
>> -----------------------------------------------------------------
>> http://www.onehippo.com   -  info at onehippo.com
>> -----------------------------------------------------------------
>> Jettro Coenradie - jettro at jteam.nl - http://www.jteam.nl - blog - linkedin
>> Phone: +31(0)20 486 20 36 Fax: +31(0)20 475 08 28 Mobile: +31(0)6 3473 9912
>> Frederiksplein 1 - 1017 XK - Amsterdam - The Netherlands