Reading multiple Child Nodes in XML stage

Posted: Thu Jan 19, 2012 9:06 am
by ashishm
One of our sources sends us XML files.
I am facing an interesting issue: how to parse multiple child nodes.

How do we deal with multiple child nodes in the XML stage when the nodes repeat?
We don't know how many child nodes might appear, and all of these nodes are identical. For example, if we consider multiple email addresses associated with a person, the XML will look like:

<Person>
<email>
XYZ@xyz.com
</email>
<email>
abc@qwr.com
</email>
<email>
pqr@qwr.com
</email>
.
.
.
</Person>

What I am currently doing is reading the individual XPaths in a Transformer and taking at most 10 instances,
which is not a great way to do it.
I would really like to understand how such scenarios are normally handled.

Please let me know if anyone has a better way to do it.

Posted: Thu Jan 19, 2012 9:28 am
by eostic
Think of each repeating unit as though it were its own relational table. This one is "person+email" (maybe it has 13 rows), and another might be "person+address" (maybe it has 5 rows).

Each is an individual set of rows having nothing to do with the other, except having the "person" in common.
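
To make the "one table per repeating unit" idea concrete, here is a minimal sketch in plain Python (not DataStage). It assumes a hypothetical `id` attribute on `<Person>` and a wrapping `<People>` element, neither of which is in the original sample, so that each row carries a key back to its parent. The helper name `repeating_rows` is also made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the <People> wrapper, the "id" attribute and the
# <address> element are assumptions added for illustration only.
SAMPLE = """
<People>
  <Person id="1">
    <email>XYZ@xyz.com</email>
    <email>abc@qwr.com</email>
    <email>pqr@qwr.com</email>
    <address>12 Main St</address>
  </Person>
  <Person id="2">
    <email>only@one.com</email>
  </Person>
</People>
"""

def repeating_rows(xml_text, child_tag):
    """Return one (person_id, value) row per repeating child element."""
    root = ET.fromstring(xml_text)
    rows = []
    for person in root.iter("Person"):
        # findall returns every direct child with this tag, however many exist
        for child in person.findall(child_tag):
            rows.append((person.get("id"), (child.text or "").strip()))
    return rows

# Each repeating unit becomes its own independent row set,
# related to the others only through the person key.
email_rows = repeating_rows(SAMPLE, "email")
address_rows = repeating_rows(SAMPLE, "address")
```

The point is that nothing caps the number of emails per person: each occurrence simply becomes another row.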

With xmlInput Stage, you define multiple output links, one for each "unit" of repeating rows. Downstream you can then decide on the best way to relate them, as you would if they were a collection of flat files with the same relationship issue. If the repeats are fairly small (let's say that you have up to 5 phone numbers), a Pivot is often a good way to handle it, as that removes the multi-row issue and puts everything into independent columns.
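
As a rough illustration of what the pivot does to a small repeating group, here is a sketch in plain Python (not the DataStage Pivot stage itself). The function name `pivot`, the `max_cols` parameter and the sample phone numbers are all hypothetical.

```python
from collections import defaultdict

def pivot(rows, max_cols):
    """Turn (key, value) rows into one fixed-width list of columns per key."""
    grouped = defaultdict(list)
    for key, value in rows:
        grouped[key].append(value)
    result = {}
    for key, values in grouped.items():
        # Keep at most max_cols values and pad short groups with None,
        # so every key ends up with the same number of columns.
        kept = values[:max_cols]
        result[key] = kept + [None] * (max_cols - len(kept))
    return result

# Hypothetical "person+phone" rows, keyed by a person id.
phone_rows = [("1", "555-0100"), ("1", "555-0101"), ("2", "555-0199")]
phone_columns = pivot(phone_rows, max_cols=3)
```

Note the trade-off: anything beyond `max_cols` is dropped, which is why this only suits repeats with a small, known upper bound.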

With the new xml Stage, the join and relationship work can be done inside the Stage itself.

Ultimately though, it comes down to using techniques as you would with any many:many relationship among sources.

Ernie