
Fields with same name in XML

Posted: Wed Feb 17, 2010 12:27 pm
by krishna81
Is there any way to read an XML file in which several tags have the same name? I am able to read the XML and populate all the fields except the updateDate fields. I want them populated like this: if dateSequence=0, the first updateDate should go to firstdate, and if dateSequence=1, the second updateDate should go to seconddate. (The issue is that I am getting updateDate twice.)

Here is my requirement.
Input:
<?xml version="1.0" encoding="UTF-8"?>
<XML>
<product>
<number>123</number>
<color>blue</color>
<location>A</location>
<Dates>
<dateSequence>0</dateSequence>
<updateDate>2010-01-10</updateDate>
<dateSequence>1</dateSequence>
<updateDate>2007-01-10</updateDate>
</Dates>
</product>
</XML>

Output must be:

number  color  location  firstdate   seconddate
123     blue   A         2010-01-10  2007-01-10

My design flow is:
External Source stage ---> XML Input ---> Transformer ---> Sequential File

The derivations I used in the Transformer are:
firstdate:  If Lnk_XMLi_Parse_xml_Tfm.dateSequence = 0 Then Lnk_XMLi_Parse_Payload_Tfm.updateDate Else "1800-01-01"
seconddate: If Lnk_XMLi_Parse_xml_Tfm.dateSequence = 1 Then Lnk_XMLi_Parse_Payload_Tfm.updateDate Else "1800-01-01"

But the output I am getting is:

number  color  location  firstdate   seconddate
123     blue   A         2010-01-10


Thanks
Kris
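For reference, the required mapping can be expressed outside DataStage as a minimal Python sketch. It assumes `<color>` holds the text `blue`, and that each `updateDate` belongs to the `dateSequence` immediately before it in document order (positional pairing, which is exactly what the document's structure forces). `product_rows` is a hypothetical helper, not part of any DataStage API.

```python
import xml.etree.ElementTree as ET

# Sample document from the post, assuming <color> contains the text "blue".
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<XML>
<product>
<number>123</number>
<color>blue</color>
<location>A</location>
<Dates>
<dateSequence>0</dateSequence>
<updateDate>2010-01-10</updateDate>
<dateSequence>1</dateSequence>
<updateDate>2007-01-10</updateDate>
</Dates>
</product>
</XML>"""


def product_rows(xml_bytes):
    """Yield one flat row per <product>, pairing each dateSequence with the
    updateDate that immediately follows it (positional pairing)."""
    root = ET.fromstring(xml_bytes)
    for product in root.iter("product"):
        row = {
            "number": product.findtext("number"),
            "color": product.findtext("color"),
            "location": product.findtext("location"),
            "firstdate": None,
            "seconddate": None,
        }
        dates = product.find("Dates")
        seq = None  # most recent dateSequence value not yet paired
        for child in (dates if dates is not None else []):
            if child.tag == "dateSequence":
                seq = child.text.strip()
            elif child.tag == "updateDate" and seq is not None:
                if seq == "0":
                    row["firstdate"] = child.text.strip()
                elif seq == "1":
                    row["seconddate"] = child.text.strip()
                seq = None
        yield row


for row in product_rows(SAMPLE.encode("utf-8")):
    print(row["number"], row["color"], row["location"],
          row["firstdate"], row["seconddate"])
# prints: 123 blue A 2010-01-10 2007-01-10
```

The key point is that both dates end up on a single row only because the pairing is done while walking the whole `<Dates>` element at once, rather than one date at a time.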

Posted: Wed Feb 17, 2010 2:55 pm
by eostic
It's a poorly designed XML document. It would have been nice if the second date were identified as such by its element name, or even better, if each "sequence" contained subelements (or attributes) for "sequenceNumber" and "updateDate". Otherwise, what is represented here is just two "instances" (and thus two rows) of the same type of date [which, as you note, is not really correct: they each have a unique meaning]. Their physical order in the document doesn't mean much when parsing XML.

Before thinking of a solution, I would ask whether this is just a snippet of something larger. Could there be three, four, or more of these date element pairs in order, or are there always just two? And are there many more columns than these, or is this the finite list?

Ernie
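Ernie's "two instances, two rows" point explains the output Kris saw. The following is a hypothetical Python simulation, not a description of how the XML Input stage works internally: it assumes the repeating elements are flattened into one row per occurrence, with the product columns copied onto each row, and then applies the two derivations from the original post row by row.

```python
# Hypothetical flattened rows: one per dateSequence/updateDate occurrence
# (an assumption made for illustration only).
rows = [
    {"number": "123", "color": "blue", "location": "A",
     "dateSequence": 0, "updateDate": "2010-01-10"},
    {"number": "123", "color": "blue", "location": "A",
     "dateSequence": 1, "updateDate": "2007-01-10"},
]

DEFAULT = "1800-01-01"  # the Else value used in the original derivations


def derive(row):
    """Apply the two If/Then/Else derivations to a single row."""
    return {
        "number": row["number"],
        "color": row["color"],
        "location": row["location"],
        "firstdate": row["updateDate"] if row["dateSequence"] == 0 else DEFAULT,
        "seconddate": row["updateDate"] if row["dateSequence"] == 1 else DEFAULT,
    }


out = [derive(r) for r in rows]
for r in out:
    print(r["number"], r["color"], r["location"], r["firstdate"], r["seconddate"])
# 123 blue A 2010-01-10 1800-01-01
# 123 blue A 1800-01-01 2007-01-10
```

Each row only ever sees one updateDate, so no single output row can carry both real dates. The dates have to be brought onto the same row before the derivations can fill both columns.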

Posted: Wed Feb 17, 2010 3:17 pm
by krishna81
This is the finite list, and we always use the same order.

Posted: Wed Feb 17, 2010 3:49 pm
by krishna81
Is there any way to handle this situation in DataStage?

Posted: Thu Feb 18, 2010 8:23 am
by eostic
Please answer the second paragraph of my note above. How much more complex is it? That will help dictate the best solution. This is easy to handle in DataStage, but it will require some additional steps.

Ernie

Posted: Thu Feb 18, 2010 10:19 am
by krishna81
The data I posted above is a sample, but there are always exactly two date fields.

Posted: Thu Feb 18, 2010 1:47 pm
by eostic
OK. If there are always just two, then I would grab the whole "Dates" element in your XML Input stage output link: add a column called "DateElement" as a VarChar of 100 or some similar large length, and change the XPath in its Description to /.../.../Dates/ (without the trailing element name and text()).

Remove the dateSequence and updateDate columns from the xmlInput Stage's output link.

In a downstream Transformer, use whatever functions you need to pull out the first and second dates, as they will now both be on the same row. And if there are only ever one or two dates, a simple substring function will solve this.

Ernie
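The substring step Ernie describes can be sketched in Python. This assumes the DateElement column arrives as a single string containing the whole `<Dates>` chunk (the exact whitespace the stage produces is an assumption) and that the dates always appear in sequence order, 0 then 1, as Kris confirmed. `nth_update_date` is a hypothetical helper; in the Transformer the same idea would be written with DataStage's own string functions.

```python
# Hypothetical content of the "DateElement" column: the whole <Dates> chunk.
date_element = (
    "<Dates>"
    "<dateSequence>0</dateSequence><updateDate>2010-01-10</updateDate>"
    "<dateSequence>1</dateSequence><updateDate>2007-01-10</updateDate>"
    "</Dates>"
)

OPEN, CLOSE = "<updateDate>", "</updateDate>"


def nth_update_date(chunk, n):
    """Return the text of the n-th (1-based) <updateDate> in chunk, or "" if absent."""
    pos = -1
    for _ in range(n):
        pos = chunk.find(OPEN, pos + 1)  # next opening tag after the previous one
        if pos == -1:
            return ""
    start = pos + len(OPEN)
    end = chunk.find(CLOSE, start)
    return chunk[start:end] if end != -1 else ""


firstdate = nth_update_date(date_element, 1)   # dateSequence 0
seconddate = nth_update_date(date_element, 2)  # dateSequence 1
print(firstdate, seconddate)  # 2010-01-10 2007-01-10
```

Searching for the tags rather than using fixed character offsets keeps the extraction working even if the chunk's spacing differs, while still relying on the fixed order of the two dates.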

Posted: Fri Feb 19, 2010 4:09 pm
by krishna81
Thanks, it worked. After the first step, I applied substring logic in the Transformer.
I am going to mark this as resolved.