How to process XML file with repetitive elements using the X

Gopi Krishna
Participant
Posts: 10
Joined: Thu Jan 06, 2011 1:02 am
Location: Ramantapur,HYD

Post by Gopi Krishna »

Hi All,

I have the following XML file and need to convert the XML data into regular records.

<Emp>
<EmpName>XYZ</EmpName>
<phonenumber>12345</phonenumber>
<phonenumber>56789</phonenumber>
<email>xyz@xyz.com</email>
<email>abc@abc.com</email>
</Emp>

All the data belongs to a single employee. There is no restriction on the number of phone numbers and email IDs. I need to convert this data into the following:

EmpName,phonenumber,email
xyz,12345,xyz@xyz.com
xyz,12345,abc@abc.com
xyz,56789,xyz@xyz.com
xyz,56789,abc@abc.com

I have designed a job as XML Stage -> Sequential File stage. In the XML stage I used the XML Parser step and imported the corresponding XSD file. But as output I get only a single record, "xyz,12345,xyz@xyz.com". To get the required output I tried the Switch step in the XML stage and did some analysis, but I still did not get the required output. Can anybody suggest how to achieve this? Which properties do I need to set in the XML stage? Can this be achieved using the XML stage at all?
If anybody has any documentation, please share it with me.

Thanks !
GK
nayanpatra
Participant
Posts: 41
Joined: Sat Jun 06, 2009 11:13 pm
Location: Kolkata

Post by nayanpatra »

In the XML Input stage, under Transformation Settings, tick the box for "repetition element required".
Nayan
eostic
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

Do you really want a Cartesian product as you have outlined? Every phone times every email?

The stage might do that for you directly, using an HJoin step, but I'd prefer sending two output links from the XML stage, one for phones and one for emails, and doing a Cartesian product downstream, where I can control it and use all of DataStage to affect its performance.

Ernie
Ernie Ostic

blogit!
<a href="https://dsrealtime.wordpress.com/2015/0 ... ere/">Open IGC is Here!</a>
Gopi Krishna
Participant
Posts: 10
Joined: Thu Jan 06, 2011 1:02 am
Location: Ramantapur,HYD

Post by Gopi Krishna »

Hi Nayan,

Thanks for the response.

Even if you check that box, it asks for a repetition key, and only one column can be chosen for repetition. In my case I need repetition on more than one column.
GK
Gopi Krishna
Participant
Posts: 10
Joined: Thu Jan 06, 2011 1:02 am
Location: Ramantapur,HYD

Post by Gopi Krishna »

Hi Ernie,

Thanks for the response.

I don't have a premium membership, but I got what you are saying here. Basically you are suggesting using HJoin in the XML stage, but as far as I know it requires two inputs. My question is: on what basis do we split the single input XML file into two, and which stage should we use to split it?

Thanks !
GK
eostic
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

ETL tools typically perform a sort of "dynamic normalization" when reading through a hierarchical document. This means that any repeating nodes will be sent out as a set of repeating "rows".
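
To make the "repeating nodes become rows" idea concrete outside of DataStage, here is a minimal Python sketch. It is only an illustration, not how the XML stage is implemented: `XML_DOC` is the sample from the first post with the tag case made consistent, and `normalize` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Sample document from the question (tag case made consistent; an assumption).
XML_DOC = """<Emp>
<EmpName>XYZ</EmpName>
<phonenumber>12345</phonenumber>
<phonenumber>56789</phonenumber>
<email>xyz@xyz.com</email>
<email>abc@abc.com</email>
</Emp>"""


def normalize(xml_text, repeating_tag):
    """Emit one (EmpName, value) row per occurrence of repeating_tag."""
    emp = ET.fromstring(xml_text)
    name = emp.findtext("EmpName")  # the non-repeating parent value
    return [(name, node.text) for node in emp.findall(repeating_tag)]


phone_rows = normalize(XML_DOC, "phonenumber")
print(phone_rows)  # [('XYZ', '12345'), ('XYZ', '56789')]
```

Each repeating element yields its own row, and the non-repeating `EmpName` is carried along on every one of them.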

Phone numbers and emails in this document are entirely separate entities. There is no relationship between them except by who they belong to. They are no different than if you had two separate relational tables. They need to be treated as such --- you need an output link for the phone numbers and another one for the emails.

Downstream in DataStage, you can do whatever you need to pivot them, join them, etc.
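
As a rough sketch of the two-link approach, the Python below treats phones and emails as two separate row sets keyed by employee name and builds the Cartesian product afterwards. It only mirrors the logic; in a real job the two lists would be output links and the join would be a downstream stage. The names `read_links` and `cross_join` are hypothetical, and the sample XML assumes consistent tag case.

```python
import xml.etree.ElementTree as ET
from itertools import product

# Sample document from the question (tag case made consistent; an assumption).
XML_DOC = """<Emp>
<EmpName>XYZ</EmpName>
<phonenumber>12345</phonenumber>
<phonenumber>56789</phonenumber>
<email>xyz@xyz.com</email>
<email>abc@abc.com</email>
</Emp>"""


def read_links(xml_text):
    """Return two independent row sets, analogous to two output links."""
    emp = ET.fromstring(xml_text)
    name = emp.findtext("EmpName")
    phone_link = [(name, p.text) for p in emp.findall("phonenumber")]
    email_link = [(name, e.text) for e in emp.findall("email")]
    return phone_link, email_link


def cross_join(phone_link, email_link):
    """Pair every phone with every email that has the same EmpName."""
    return [
        (p_name, phone, email)
        for (p_name, phone), (e_name, email) in product(phone_link, email_link)
        if p_name == e_name
    ]


phone_link, email_link = read_links(XML_DOC)
rows = cross_join(phone_link, email_link)
for row in rows:
    print(",".join(row))
# XYZ,12345,xyz@xyz.com
# XYZ,12345,abc@abc.com
# XYZ,56789,xyz@xyz.com
# XYZ,56789,abc@abc.com
```

Keeping the two row sets separate until the join makes it easy to swap the Cartesian product for a pivot or another kind of join if every phone times every email turns out not to be what is wanted.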

Ernie
eostic
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

By the way, I'm specifically recommending that you NOT use HJoin or any other such step. Get one link working for your phone numbers and any other columns you want or need, and then add another output link for the emails.

Then do whatever you need downstream.

Ernie