XML Stage with multiple lists at the same level

ggarze
Premium Member
Posts: 78
Joined: Tue Oct 11, 2005 9:37 am

Post by ggarze »

I've figured out how to read an XML document, adjust some data, and reconstruct it in the same format using the new XML stage. The document has one repeating element (list) of values under a parent, and I handle it by mapping the lowest level, which is <style>, to the link and working my way up. Example of the XML below:

<promotions>
  <promotion>
    <name>00030</name>
    <code>030</code>
    <startDate>07-03-2012</startDate>
    <endDate>30-11-2013</endDate>
    <styles>
      <style>I3725M0026_2012_D9</style>
      <style>I3725M0045_2013_NN</style>
      <style>I3725M0045_2013_01</style>
    </styles>
    <tiers>
      <tier>
        <minQty>002</minQty>
        <price>009.00</price>
      </tier>
    </tiers>
  </promotion>
</promotions>

But now my problem is: what if the <tier> elements under <tiers>, which sits at the same level as <styles>, also repeat? For example, like this:

<promotions>
  <promotion>
    <name>00030</name>
    <code>030</code>
    <startDate>07-03-2012</startDate>
    <endDate>30-11-2013</endDate>
    <styles>
      <style>I3725M0026_2012_D9</style>
      <style>I3725M0045_2013_NN</style>
      <style>I3725M0045_2013_01</style>
    </styles>
    <tiers>
      <tier>
        <minQty>002</minQty>
        <price>009.00</price>
      </tier>
      <tier>
        <minQty>003</minQty>
        <price>008.00</price>
      </tier>
    </tiers>
  </promotion>
</promotions>

To read this, do I need two links coming out of the XML stage, one for <style> and one for <tiers>, and then combine them using a join later in the job, or can I do this in one output link? I'm trying one link, but if I choose <style> on my output link it doesn't allow me to map <tier> to any output, and vice versa.

Thanks
eostic
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

Are they truly "independent" repeating nodes at the same level? If so, then in most cases the appropriate solution is two separate links. They are no different from two separate relational tables coming out of (say) DB2. For a simple example, imagine a "company" XML document that contains one repeating node for all of the company's employees and another for all of its assets. Assets might be machines in a factory, etc., and have nothing to do with any individual employee. Perhaps they are connected by a common company ID, but in the end one might have 10,000 repeating values and the other only 220.

In that case, what does joining them mean? Do you want a cartesian product?
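
To make the "two links" idea concrete, here is a minimal sketch in plain Python (not DataStage). It treats each repeating node as its own row set that carries the promotion key, and then counts what a join on that key alone would produce. The function name `split_links`, the choice of <code> as the key, and the trimmed sample document are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample document from the question (well-formed version).
DOC = """<promotions>
  <promotion>
    <name>00030</name>
    <code>030</code>
    <styles>
      <style>I3725M0026_2012_D9</style>
      <style>I3725M0045_2013_NN</style>
      <style>I3725M0045_2013_01</style>
    </styles>
    <tiers>
      <tier><minQty>002</minQty><price>009.00</price></tier>
      <tier><minQty>003</minQty><price>008.00</price></tier>
    </tiers>
  </promotion>
</promotions>"""


def split_links(xml_text):
    """Return (style_rows, tier_rows); every row carries the promotion key."""
    style_rows, tier_rows = [], []
    for promo in ET.fromstring(xml_text).iter("promotion"):
        key = promo.findtext("code")  # assumed key column
        for style in promo.iter("style"):
            style_rows.append({"code": key, "style": style.text})
        for tier in promo.iter("tier"):
            tier_rows.append({
                "code": key,
                "minQty": tier.findtext("minQty"),
                "price": tier.findtext("price"),
            })
    return style_rows, tier_rows


style_rows, tier_rows = split_links(DOC)

# Joining on the promotion key alone pairs every style with every tier.
cartesian = [
    (s["style"], t["minQty"])
    for s in style_rows
    for t in tier_rows
    if s["code"] == t["code"]
]
print(len(style_rows), len(tier_rows), len(cartesian))
```

With three styles and two tiers under one promotion, the key-only join yields six rows, which is the cartesian product in question.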

...it is possible, in some documents, that they are independent but related, although that is usually a poor design....where the "first occurrence" of one node is known to be related directly to the "first occurrence" of the other. In those situations, you could consider doing joins inside of the Stage, depending on what type of identifier exists for relating the nodes. Even then, you are likely to have far more power and maintainability by sending the two links downstream and doing your work there to join or relate them together.

...in still other cases, you might have a tiny number of occurrences on one node, and simply desire to have them "lumped together" with the higher level keys and carried on each row of the other group. Pivoting and then joining might be a solution for you there. However...once again, unless there are really good reasons (such as reconstructing the xml into a new document, where you are reading AND writing xml in the same stage instance), it is often better still to send the links out into DataStage and then do the custom joining/pivoting and construction there, where you have more options for parallelism, filtering, intermediate transformation, and simpler long term maintenance.
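
As a rough picture of the "pivot, then join" option, the sketch below uses plain Python rather than DataStage stages. It collapses the small tier list into one wide row per promotion and carries those columns on every style row. The helper `pivot_tiers` and the column names `minQty_1`, `price_1`, and so on are hypothetical.

```python
from collections import defaultdict

style_rows = [
    {"code": "030", "style": "I3725M0026_2012_D9"},
    {"code": "030", "style": "I3725M0045_2013_NN"},
]
tier_rows = [
    {"code": "030", "minQty": "002", "price": "009.00"},
    {"code": "030", "minQty": "003", "price": "008.00"},
]


def pivot_tiers(rows):
    """Collapse the tier rows of each promotion into one wide row."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["code"]].append(row)
    wide = {}
    for code, tiers in grouped.items():
        cols = {}
        for i, tier in enumerate(tiers, start=1):
            cols[f"minQty_{i}"] = tier["minQty"]
            cols[f"price_{i}"] = tier["price"]
        wide[code] = cols
    return wide


wide_tiers = pivot_tiers(tier_rows)

# One output row per style, with the pivoted tier columns carried along.
joined = [{**s, **wide_tiers.get(s["code"], {})} for s in style_rows]
for row in joined:
    print(row)
```

This only stays manageable when the pivoted list is small and bounded, which is the case described above.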

Ernie
Ernie Ostic

blogit!
<a href="https://dsrealtime.wordpress.com/2015/0 ... ere/">Open IGC is Here!</a>

Post by ggarze »

Hey Ernie,

Yeah, what you are saying makes sense: the <style> and the <tier> really belong to the promotion itself, so I can separate those two and have a separate link coming out of the XML stage for each. I only have to transform the <style> data; the <tier> data I just pass along. Once I do that, I have to bring the two links back together to create an XML in the same format as the one I started this thread with. So basically I'm reading the XML, changing something on the <style> value, and finally putting it back together. I am going to assume that to recreate it I'll have a link with my <style> data and a link with my <tier> data, both carrying the promotion key fields, going into another XML stage.
If that is correct, then within the assembly what transformation steps am I looking at to put this back together with multiple lists (<style> and <tier>) under the promotion? Regroup? HJoin? OrderJoin? Etc.

Thanks

Post by eostic »

regroup for certain, and then probably a join.....

Spend time on that "side" of the job separately, so that things don't get too confused...read the data from flat files and make sure you are able to write the xml perfectly with your assembly.

I'm currently writing a post on "tips for composing" for my blog, but in the meantime, make sure you've spent time with the redbook on the xml Stage.

Ernie

Post by ggarze »

Yes, the redbook has been very helpful when I just had the one list (<style>); however, it's a little more challenging with a second list, which I can't seem to get working.

I did a regroup for the <style> level data
I did another regroup for the <tier> level data
Then I added an OrderJoin

I thought I had it, as the XML format at least was good, but when I added another promotion to my input file, the first promotion was getting my second promotion's <tier> data and my second promotion was getting my first promotion's <tier> data.

I was thinking of using the HJoin instead of the OrderJoin, but it doesn't seem to make sense, as that's for Parent/Child relationships and the <tier> is not a child of <style>; they are both children of promotion.
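
One way to picture the swapped <tier> symptom is the difference between pairing two lists by position and pairing them by key. The sketch below is only an analogy in plain Python; it is not a description of how the OrderJoin Step works internally. The second promotion (code 031) and its tier values are made up for illustration.

```python
promos = [{"code": "030"}, {"code": "031"}]

# Tier groups that happen to arrive in a different order than the promotions.
tier_groups = [
    {"code": "031", "tiers": [("005", "007.00")]},
    {"code": "030", "tiers": [("002", "009.00"), ("003", "008.00")]},
]

# Positional pairing: first row with first row, second with second.
by_position = [(p["code"], g["code"]) for p, g in zip(promos, tier_groups)]

# Key-based pairing: look up each promotion's tiers by its code.
lookup = {g["code"]: g for g in tier_groups}
by_key = [(p["code"], lookup[p["code"]]["code"]) for p in promos]

print(by_position)
print(by_key)
```

Positional pairing is only right when both inputs are guaranteed to arrive in the same order; key-based pairing does not depend on arrival order.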

Post by eostic »

I agree it looks confusing, and frankly, I'm not sure why it is called Parent/Child in that Step....I've been using it (HJoin) exclusively for this purpose, and outside of some initial testing when the Stage first came out, I've never had a reason to look at the OrderJoin Step ever again. I suppose there is a good use case for it, but I haven't run across it yet.

Think of the HJoin as just a "regular join" where the "Parent" is simply the "first table" and the "Child" is the second table of information that you are joining.

Then think about your links as a "many" to "many" situation coming in, with common parent rows and unique child rows on each input link. Your goal is to reduce the first repeating set to a unique value, and then you have a more easily handled 1:many situation.

Perform a regroup on the data coming in on the first link (where the common higher level node information is the "key"), and then do your HJoin, selecting the Regroup "result" as your Parent list....and then your other input link as the Child.

Then when you reach the XML Composer Step, the first regroup "key information" gets mapped to your ultimate parent or common node, and the first regroup-ed sublist gets mapped to your initial repeating node --- and the HJoin result is mapped to the other "sibling" child node.

It is easy to get confused...and if you grab the wrong node for the mapping, you can get some very interesting results....so be patient and keep at it, and check your results carefully.
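
The steps above can be sketched outside of DataStage in plain Python, purely as a mental model: regroup the first link by the promotion key, attach the second link's rows to each regrouped parent by that same key, then compose the document. The functions `regroup`, `hjoin`, and `compose` are hypothetical helpers named after the assembly Steps; they are not the Stage's API, and the second promotion (code 031) is invented sample data.

```python
import xml.etree.ElementTree as ET

style_rows = [
    {"code": "030", "style": "I3725M0026_2012_D9"},
    {"code": "030", "style": "I3725M0045_2013_NN"},
    {"code": "031", "style": "HYPOTHETICAL_STYLE_A"},
]
tier_rows = [
    {"code": "030", "minQty": "002", "price": "009.00"},
    {"code": "030", "minQty": "003", "price": "008.00"},
    {"code": "031", "minQty": "005", "price": "007.00"},
]


def regroup(rows, key):
    """Reduce flat style rows to one parent per key with a style sublist."""
    parents = {}
    for row in rows:
        parent = parents.setdefault(row[key], {key: row[key], "styles": []})
        parent["styles"].append(row["style"])
    return list(parents.values())


def hjoin(parents, children, key):
    """Attach the matching child rows to each parent, matched by key."""
    for parent in parents:
        parent["tiers"] = [c for c in children if c[key] == parent[key]]
    return parents


def compose(parents):
    """Map keys, the regrouped sublist, and the joined list to sibling nodes."""
    root = ET.Element("promotions")
    for p in parents:
        promo = ET.SubElement(root, "promotion")
        ET.SubElement(promo, "code").text = p["code"]
        styles = ET.SubElement(promo, "styles")
        for s in p["styles"]:
            ET.SubElement(styles, "style").text = s
        tiers = ET.SubElement(promo, "tiers")
        for t in p["tiers"]:
            tier = ET.SubElement(tiers, "tier")
            ET.SubElement(tier, "minQty").text = t["minQty"]
            ET.SubElement(tier, "price").text = t["price"]
    return root


root = compose(hjoin(regroup(style_rows, "code"), tier_rows, "code"))
print(ET.tostring(root, encoding="unicode"))
```

Only the first link is regrouped here; the second link stays flat and is matched to the regrouped parents by key, so each promotion ends up with its own tiers regardless of input order.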

Ernie

Post by ggarze »

Worked!!! Thanks Ernie.

I did try the HJoin earlier, but instead of just the one regroup I had one for each input link. Not sure if that was the issue or maybe something in my mapping in the composer step, but I was getting this error: "XML_DEAL_OUTPUT,0: Message bundle error Can't find resource for bundle com.ibm.e2.Bundle_E2_engine_msgs_en_US, key E2IllegalStateException.parentCursorInvalid"

But following your steps worked: just the one regroup on one link, then using that regroup for the keys (common fields) and the first repeating list, and the HJoin result as the second list.

Thanks Again!

Post by eostic »

Congrats! ...and thanks...I continue to tweak the ways to describe this...glad it was clear.

Ernie