XML : Merging repeating attributes into one row

mathewb
Premium Member
Posts: 22
Joined: Tue Jul 17, 2007 10:35 pm

Post by mathewb »

The Problem:

I have an input XML in the below format:
<ParXML>
<Batch>
<PR Rpt="101" Dt="2008-01-10">
<Loc ID="00005" Typ="A" />
<Loc ID="40536" Typ="B" />
<Loc ID="00105" Typ="C">
<Subloc ID="6300" Typ="5" />
</Loc>
<ST INDT="LEH" SubTyp="111" />
<Qty Typ="DLT" Amt="100" />
</PR>
<PR Rpt="102" Dt="2008-01-10" >
<Loc ID="00094" Typ="A" />
<Loc ID="000125860" Typ="B" />
<Loc ID="123456789" Typ="C">
<Subloc ID="2250" Typ="5" />
</Loc>
<ST INDT="DND" SubTyp="111" />
<Qty Typ="DLT" Amt="200" />
</PR>
</Batch>
</ParXML>


Output needed

Record Description:
Rpt|Dt|TypAID|TypBID|TypeCID|SublocID|INDT|Typ|Amt

Expected File Output:
101|2008-01-10|00005|40536|00105|6300|LEH|DLT|100
102|2008-01-10|00094|000125860|123456789|2250|DND|DLT|200

But I am getting this output instead:
101|2008-01-10|00005||||LEH|DLT|100
101|2008-01-10||40536|||LEH|DLT|100
101|2008-01-10|||00105|6300|LEH|DLT|100
102|2008-01-10|00094||||DND|DLT|200
102|2008-01-10||000125860|||DND|DLT|200
102|2008-01-10|||123456789|2250|DND|DLT|200

I want all three IDs to go into the TypAID, TypBID, and TypeCID columns.
I marked ID as the repeating element, and hence I am getting 3 rows for each PR.
How can I achieve the expected file output? If I mark PR as the repeating element, I get one
row for each PR, but the TypBID, TypeCID, and SublocID fields are empty.

Please let me know how to resolve this.
Mathew
Teej
Participant
Posts: 677
Joined: Fri Aug 08, 2003 9:26 am
Location: USA

Post by Teej »

The problem is that you have a value assigned to LocID, not to the individual attributes.

To collect everything in one row, you need to either rename the individual fields or use a Merge, Lookup, or Join stage keyed on the Rpt attribute.
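As a rough illustration of the keyed-join idea outside DataStage, here is a minimal Python sketch. It assumes the six flattened rows from the original post, splits them into one stream per Loc type, and joins the streams on Rpt. The names FIELDS, ROWS, parse, and join_on_rpt are made up for this example and are not part of any DataStage API.

```python
# Column layout taken from the "Record Description" in the original post.
FIELDS = ["Rpt", "Dt", "TypAID", "TypBID", "TypeCID", "SublocID", "INDT", "Typ", "Amt"]

# The flattened rows produced when the Loc ID is the repeating element.
ROWS = [
    "101|2008-01-10|00005||||LEH|DLT|100",
    "101|2008-01-10||40536|||LEH|DLT|100",
    "101|2008-01-10|||00105|6300|LEH|DLT|100",
    "102|2008-01-10|00094||||DND|DLT|200",
    "102|2008-01-10||000125860|||DND|DLT|200",
    "102|2008-01-10|||123456789|2250|DND|DLT|200",
]


def parse(line):
    """Turn one pipe-delimited row into a dict keyed by column name."""
    return dict(zip(FIELDS, line.split("|")))


def join_on_rpt(lines):
    """Split rows into one stream per Loc type, then join them on Rpt."""
    records = [parse(line) for line in lines]
    # One lookup table per stream, keyed on Rpt (assumes one Loc per type per PR).
    stream_a = {r["Rpt"]: r for r in records if r["TypAID"]}
    stream_b = {r["Rpt"]: r for r in records if r["TypBID"]}
    stream_c = {r["Rpt"]: r for r in records if r["TypeCID"]}

    joined = []
    for rpt, base in stream_a.items():
        merged = dict(base)
        merged["TypBID"] = stream_b.get(rpt, {}).get("TypBID", "")
        merged["TypeCID"] = stream_c.get(rpt, {}).get("TypeCID", "")
        merged["SublocID"] = stream_c.get(rpt, {}).get("SublocID", "")
        joined.append("|".join(merged[f] for f in FIELDS))
    return joined


if __name__ == "__main__":
    for line in join_on_rpt(ROWS):
        print(line)
```

In a parallel job, the equivalent would presumably be three filtered links feeding a Join or Lookup on Rpt. The sketch treats the type A stream as the driving input, so a PR with no type A Loc would be dropped.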

Neither solution is very "neat", but this is how the XML stage works when you flatten the data.

I hope Ernie may have a better idea.
eostic
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

Another classic case of what I would consider disappointing XML design. The Typ values appear to be unique enough to justify their own element names, as though they were column names, with the ID being the value of the element. Alas, we're given multiple "instances" of one element instead of three nicely identified and unique elements. If ID is truly dynamic and not some static part of the model, then this would make sense, but I wonder when I look at the output you need, where they are indeed three separate columns.

There may be a way to do it by messing with the XPath, but I've found it to be finicky when used with all attributes like this, and I'm also not keen on putting filtering transformation code in a Description property unless absolutely necessary. This one is also complicated by Subloc: does it only exist for that final ID value, or could it possibly exist for the other Loc elements as well?
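To make the XPath idea concrete, here is a small sketch using Python's standard xml.etree.ElementTree rather than the DataStage XML stage. It selects each Loc by its Typ attribute with a predicate such as Loc[@Typ='A']. Whether the XML Input stage accepts the same predicates in a column's Description property is not confirmed here. The helper names attr and flatten are invented for the example, and the sketch assumes that Subloc appears only under the type C Loc.

```python
import xml.etree.ElementTree as ET

# Sample document copied from the original post.
XML = """<ParXML>
<Batch>
<PR Rpt="101" Dt="2008-01-10">
<Loc ID="00005" Typ="A" />
<Loc ID="40536" Typ="B" />
<Loc ID="00105" Typ="C">
<Subloc ID="6300" Typ="5" />
</Loc>
<ST INDT="LEH" SubTyp="111" />
<Qty Typ="DLT" Amt="100" />
</PR>
<PR Rpt="102" Dt="2008-01-10" >
<Loc ID="00094" Typ="A" />
<Loc ID="000125860" Typ="B" />
<Loc ID="123456789" Typ="C">
<Subloc ID="2250" Typ="5" />
</Loc>
<ST INDT="DND" SubTyp="111" />
<Qty Typ="DLT" Amt="200" />
</PR>
</Batch>
</ParXML>"""


def attr(node, path, name):
    """Return attribute `name` of the first element matching `path`, or ''."""
    found = node.find(path)
    return found.get(name, "") if found is not None else ""


def flatten(xml_text):
    """Emit one pipe-delimited row per PR, picking each Loc by its Typ."""
    root = ET.fromstring(xml_text)
    lines = []
    for pr in root.iterfind("./Batch/PR"):
        fields = [
            pr.get("Rpt", ""),
            pr.get("Dt", ""),
            attr(pr, "Loc[@Typ='A']", "ID"),
            attr(pr, "Loc[@Typ='B']", "ID"),
            attr(pr, "Loc[@Typ='C']", "ID"),
            # Assumption: Subloc only ever hangs off the type C Loc.
            attr(pr, "Loc[@Typ='C']/Subloc", "ID"),
            attr(pr, "ST", "INDT"),
            attr(pr, "Qty", "Typ"),
            attr(pr, "Qty", "Amt"),
        ]
        lines.append("|".join(fields))
    return lines


if __name__ == "__main__":
    print("\n".join(flatten(XML)))
```

Because find() returns only the first match, a second Loc of the same Typ or a repeating Subloc would be silently ignored, which is exactly the modelling question raised above.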

From a structural perspective, Subloc is your proper repeating element, and you should get 3 rows, provided you uncheck "repetition element required". Could it, in theory, repeat many times? In that case, the flat structure you are looking for is incorrect.

Teej is right. Pop 'em out of the XML stage and then use your favorite pivot technique or other mechanism to bring things together. Think of the Loc/Subloc combination as a parent-child relationship that gives you 3 parents, one of which has a child (in this instance). If ST and Qty repeat inside of /Batch, then you have even more tables to deal with and will have to merge them in addition to any desired pivots.
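One way to picture the pivot is the following Python sketch. It assumes a hypothetical "tall" layout with one row per Loc parent, as (Rpt, Loc Typ, Loc ID, Subloc ID), which is not the layout shown in the original post but uses the same values. The name pivot_locs and the row layout are illustrative only.

```python
# Hypothetical tall layout: one row per Loc parent, with its optional Subloc child.
LOC_ROWS = [
    ("101", "A", "00005", ""),
    ("101", "B", "40536", ""),
    ("101", "C", "00105", "6300"),
    ("102", "A", "00094", ""),
    ("102", "B", "000125860", ""),
    ("102", "C", "123456789", "2250"),
]


def pivot_locs(rows, types=("A", "B", "C")):
    """Pivot Loc rows into one wide record per Rpt: [A ID, B ID, C ID, Subloc ID]."""
    wide = {}
    for rpt, typ, loc_id, subloc_id in rows:
        rec = wide.setdefault(rpt, {"ids": {}, "subloc": ""})
        rec["ids"][typ] = loc_id  # a later Loc with the same Typ overwrites an earlier one
        if subloc_id:
            rec["subloc"] = subloc_id  # only one Subloc per Rpt survives
    return {
        rpt: [rec["ids"].get(t, "") for t in types] + [rec["subloc"]]
        for rpt, rec in wide.items()
    }


if __name__ == "__main__":
    for rpt, cols in pivot_locs(LOC_ROWS).items():
        print(rpt, "|".join(cols))
```

The wide Loc columns would then be joined back to the PR, ST, and Qty values on Rpt. The overwrite behaviour noted in the comments is where the flat target layout breaks down if Loc or Subloc can repeat.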

Ernie
Ernie Ostic
