Reading multiple Child Nodes in XML stage

Post by ashishm »

One of our sources sends us XML files.
I am facing an interesting issue: how to parse multiple child nodes.

How do we deal with repeating child nodes in the XML stage?
We don't know how many child nodes might appear, and all of these nodes are identical. For example, if a person has multiple email addresses, the XML will look like this:

<Person>
<email>
XYZ@xyz.com
</email>
<email>
abc@qwr.com
</email>
<email>
pqr@qwr.com
</email>
.
.
.
</Person>

What I am currently doing is reading the individual XPaths in a Transformer and passing only 10 instances downstream.
That is not a great way to do it.
I would really like to understand how such scenarios are normally handled.

Please let me know if anyone has a better way to do it.

Post by eostic »

Think of each repeating unit as though it were its own relational table: this one being "person+email" (maybe it has 13 rows), and another might be "person+address" (maybe it has 5 rows).

Each is an individual set of rows having nothing to do with the other, except having the "person" in common.
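
To make the "one table per repeating unit" idea concrete outside of DataStage, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. It is only an illustration, not how the XML stages work internally. The People wrapper, the id attribute on Person, and the phone element are all assumptions added for the example; the original XML has no person key, and some key is needed to relate the rows back to their person.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the <People> wrapper, the "id" attribute and the
# <phone> element are assumptions added for illustration only.
SAMPLE = """
<People>
  <Person id="1">
    <email>XYZ@xyz.com</email>
    <email>abc@qwr.com</email>
    <email>pqr@qwr.com</email>
    <phone>555-0100</phone>
  </Person>
  <Person id="2">
    <email>only@one.com</email>
  </Person>
</People>
"""

def rows_for(root, child_tag):
    """Return one (person_id, value) row per repeating child element."""
    rows = []
    for person in root.iter("Person"):
        # findall returns every direct child with this tag, however many exist
        for child in person.findall(child_tag):
            rows.append((person.get("id"), (child.text or "").strip()))
    return rows

root = ET.fromstring(SAMPLE.strip())
email_rows = rows_for(root, "email")   # the "person+email" table
phone_rows = rows_for(root, "phone")   # an independent "person+phone" table
```

Each list plays the role of one output link: the number of rows follows the data, so there is no need to guess a fixed number of instances up front.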

With the xmlInput Stage, you define multiple output links, one for each "unit" of repeating rows. Downstream you can then decide on the best way to relate them, as you would if they were a collection of flat files with the same relationship issue. If the number of repeats is fairly small (let's say that you have up to 5 phone numbers), a Pivot is often a good way to handle it, as that removes the multi-row issue and puts everything into independent columns.
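
As a rough illustration of what pivoting the rows into columns means, here is a small Python sketch. It is not the DataStage Pivot stage, just the same idea in plain code; the function name pivot and the max_cols parameter are hypothetical.

```python
from collections import defaultdict

def pivot(rows, max_cols):
    """Turn (key, value) rows into one fixed-width list of values per key.

    Keys with fewer than max_cols values are padded with None;
    values beyond max_cols are dropped.
    """
    grouped = defaultdict(list)
    for key, value in rows:
        grouped[key].append(value)

    result = {}
    for key, values in grouped.items():
        kept = values[:max_cols]
        result[key] = kept + [None] * (max_cols - len(kept))
    return result

# Example input: (person, phone) rows as they might arrive on one output link
phone_rows = [("1", "555-0100"), ("1", "555-0101"), ("2", "555-0200")]
phone_columns = pivot(phone_rows, 3)
```

Because the column count is fixed, anything past max_cols is lost, which is why this approach only suits repeats with a small, known upper bound.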

With the new XML Stage, the join and relationship work can be done inside the Stage itself.

Ultimately though, it comes down to using techniques as you would with any many:many relationship among sources.

Ernie
Ernie Ostic
