[Hippo-cms7-user] hippo 7 and xml

Ard Schrijvers a.schrijvers at onehippo.com
Tue Feb 10 12:08:21 CET 2009


Hello Sergei and Arje,

> 
> In this way, you have all the power of Xopus and especially 
> XML Schema - on the other hand, you don't have the XML 
> structure split out in a dedicated nodestructure in the 
> repository. If you want that, you might have to build some 
> extractors to do that for you. Hippo CMS 7 uses this concept 
> of nodetypes as described in the JCR spec [4] [5].

Basically, Jackrabbit has an XML persistence manager that splits XML
out into a node structure. Obviously, a 10 MB XML file, for example,
will already result in a very large node tree, so performance is not
optimal.

OTOH, you can store the XML as binary data and have it indexed with an
XML extractor; in other words, Lucene indexes only the text to make it
searchable. This is obviously blisteringly fast.
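
A minimal sketch of what such a text extractor does, using only the JDK's
SAX parser. The class name XmlTextExtractor and its method are hypothetical
and are not part of Jackrabbit or Hippo; the real extractor would hand the
resulting text to Lucene instead of printing it.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: strips the markup and keeps only the text content.
class XmlTextExtractor {

    /** Returns the character data of the XML stream, separated by single spaces. */
    static String extractText(InputStream xml) throws Exception {
        final StringBuilder text = new StringBuilder();
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(xml, new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                // May be called several times per element; just accumulate.
                text.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                // Keep the text of adjacent elements from running together.
                text.append(' ');
            }
        });
        // Collapse indentation and line breaks into single spaces.
        return text.toString().trim().replaceAll("\\s+", " ");
    }

    public static void main(String[] args) throws Exception {
        String xml = "<doc><title>foo</title><body>bar</body></doc>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(extractText(in)); // prints: foo bar
    }
}
```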

So plugging in an XML schema, editing the XML content with Xopus, and
storing it again as a binary field is pretty straightforward. The only
thing that then still needs to be added is logical, configurable meta
data that can be searched on. Arje refers to extractors for this, but
meta data indexing might be a more enlightening term for it: for
example, index the <title> field, the <date> field, and so on, to be
able to search on them.
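
A rough sketch of configurable meta data indexing, assuming the
configuration is simply a map from index field name to an XPath
expression. The class XmlMetaDataExtractor and the <date> element in the
sample are made up for illustration; only the JDK's DOM and XPath APIs are
used, and the returned map stands in for the fields that would go to the
index.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper: pulls the configured meta data fields out of an XML document.
class XmlMetaDataExtractor {

    /** fieldXPaths maps an index field name to the XPath that selects its value. */
    static Map<String, String> extract(String xml, Map<String, String> fieldXPaths)
            throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();

        Map<String, String> fields = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : fieldXPaths.entrySet()) {
            // evaluate() returns the string value, or "" when nothing matches.
            String value = xpath.evaluate(entry.getValue(), doc);
            if (!value.isEmpty()) {
                fields.put(entry.getKey(), value);
            }
        }
        return fields;
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> config = new LinkedHashMap<>();
        config.put("title", "/doc/title");
        config.put("date", "/doc/date");

        String xml = "<doc><title>foo</title><date>2009-02-10</date><body>bar</body></doc>";
        System.out.println(extract(xml, config)); // prints: {title=foo, date=2009-02-10}
    }
}
```

Fields whose XPath matches nothing are left out, so one configuration can
be applied to documents that lack some of the elements.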

Anyway, what I wanted to add to the discussion (and I haven't yet
looked into the feasibility of the idea) is that I would like to have
the XML available as a node structure while still having the fast
binary data storage. Hippo Repository has added pluggable virtual
providers to Jackrabbit: node structures created on the fly, based on
meta data of the entire repository, filtered structures of the physical
content, the Lucene term space (not yet built), and so on. Note that all
these structures are exposed over JCR, so the client won't notice
whether it is getting physical or virtual nodes.

So, I'm not yet sure whether we can easily accomplish it, but I think
a virtual provider that can deliver an XML binary file as a virtual
node structure would be great. So, for example, if I had the XML
file:

<doc>
	<title>foo</title>
	<body>bar</body>
</doc>  

And this XML file were stored in the node mynode as a binary field,
like:

-rootNode
	`-mynode
		- xmldata: stream
		- someprop 

Then, if you accessed this node structure through the virtual
provider, you would get:

-rootNode
	`-mynode
		`-doc
		   |-title
		   `-body
			

And you could do: 

Node n = rootNode.getNode("mynode/doc/title");

This way, I think we could deliver fast XML storage and at the same
time expose it as a node structure.
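
To make the idea concrete, here is a sketch of the lookup such a provider
would have to perform: parse the stored binary on demand and resolve a
relative path like "doc/title" against it. It uses plain JDK DOM classes
rather than the JCR or Hippo virtual provider API, and the class name
VirtualXmlNodes is hypothetical; a real provider would wrap the resulting
elements as JCR nodes.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical stand-in for a virtual provider: resolves paths inside stored XML.
class VirtualXmlNodes {

    /** Resolves relPath (e.g. "doc/title") inside the XML binary; null if not found. */
    static Element resolve(byte[] xmlData, String relPath) throws Exception {
        Document dom = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xmlData));

        String[] names = relPath.split("/");
        Element current = dom.getDocumentElement();
        // The first path segment must match the document element itself.
        if (!current.getTagName().equals(names[0])) {
            return null;
        }
        for (int i = 1; i < names.length; i++) {
            current = firstChild(current, names[i]);
            if (current == null) {
                return null;
            }
        }
        return current;
    }

    /** Returns the first child element with the given tag name, or null. */
    private static Element firstChild(Element parent, String name) {
        for (Node c = parent.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c instanceof Element && ((Element) c).getTagName().equals(name)) {
                return (Element) c;
            }
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        byte[] xmldata = "<doc>\n\t<title>foo</title>\n\t<body>bar</body>\n</doc>"
                .getBytes(StandardCharsets.UTF_8);
        System.out.println(resolve(xmldata, "doc/title").getTextContent()); // prints: foo
    }
}
```

Parsing on every access would be slow for large files, so a real provider
would presumably cache the parsed structure per binary.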

Only querying within the virtual node structure would not be possible;
that would need some predefined indexing, which is what Arje is
referring to with extractors.

Regards Ard
