How to process XML file with repetitive elements using the X

Posted: Thu Nov 28, 2013 12:50 pm
by Gopi Krishna
Hi All,

I have the following XML file and need to convert the XML data into regular records.

<Emp>
<EmpName>XYZ</EmpName>
<phonenumber>12345</phonenumber>
<phonenumber>56789</phonenumber>
<email>xyz@xyz.com</email>
<email>abc@abc.com</email>
</Emp>

All of this data belongs to a single employee. There is no restriction on the number of phone numbers or email IDs. I need to convert the data into the format below.

EmpName,phonenumber,email
xyz,12345,xyz@xyz.com
xyz,12345,abc@abc.com
xyz,56789,xyz@xyz.com
xyz,56789,abc@abc.com

I have designed a job like XML Stage -------> Seq File Stage. In the XML stage I have used the XML parser and imported the respective XSD file as well. But as output I am getting one single record, like "xyz,12345,xyz@xyz.com". To get the required output I tried using Switch in the XML stage and did some analysis, but I still did not get the required output. Can anybody suggest how to achieve this? Which properties do I need to use in the XML stage? Can we achieve this using the XML stage at all?
If anybody has any documentation, please share it with me.

Thanks !

Posted: Tue Dec 03, 2013 11:24 am
by nayanpatra
In the XML Input stage, in the Transformation Settings, tick the box for repetition element required.

Posted: Tue Dec 03, 2013 7:39 pm
by eostic
Do you really want a cartesian product as you have outlined? Every phone times every email?

The stage might do that for you directly, using an HJoin step, but I'd prefer sending two output links from the XML stage, one for phones and one for emails, and doing a cartesian product downstream, where I can control it and use all of DataStage to affect its performance.
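
To make the "every phone times every email" idea concrete, here is a minimal sketch of the cartesian product itself, written in Python rather than as a DataStage job. The two lists stand in for the rows arriving on the two output links; the variable names are made up for illustration and are not anything the XML stage produces.

```python
from itertools import product

# Hypothetical stand-ins for the rows on the two output links.
emp_name = "XYZ"
phones = ["12345", "56789"]
emails = ["xyz@xyz.com", "abc@abc.com"]

# Cartesian product: pair every phone number with every email address.
rows = [(emp_name, phone, email) for phone, email in product(phones, emails)]

for row in rows:
    print(",".join(row))
```

With two phones and two emails this yields four rows. The row count is the product of the two list sizes, which is why it is worth confirming that this is really the output you want.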

Ernie

Posted: Wed Dec 11, 2013 8:25 am
by Gopi Krishna
Hi Nayan,

Thanks for the response.

Even if you check that, it will ask for the repetition key, and there you can select only one column for repetition. In my case I need repetition on more than one column.

Posted: Wed Dec 11, 2013 8:30 am
by Gopi Krishna
Hi Ernie,

Thanks for the response.

I don't have a premium membership, but I got what you are saying here. Basically you are suggesting using HJoin in the XML stage, but as far as I know it requires two inputs. My question is: on what basis do we split the single input XML file into two files, and which stage do we use to split it?

Thanks !

Posted: Wed Dec 11, 2013 9:12 am
by eostic
ETL tools typically perform a sort of "dynamic normalization" when reading through a hierarchical document. This means that any repeating nodes will be sent out as a set of repeating "rows".

Phone numbers and emails in this document are entirely separate entities. There is no relationship between them except by who they belong to. They are no different than if you had two separate relational tables. They need to be treated as such --- you need an output link for the phone numbers and another one for the emails.
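
As a rough illustration of that normalization (a Python sketch, not how the XML stage is implemented), the sample document can be read into two independent row sets, each carrying the employee name as the only thing they have in common. The function name `normalize` is hypothetical, and the sample is used here with consistent lowercase `phonenumber` tags so that it parses.

```python
import xml.etree.ElementTree as ET

# The sample document from the question, with consistent tag case.
XML_DOC = """<Emp>
<EmpName>XYZ</EmpName>
<phonenumber>12345</phonenumber>
<phonenumber>56789</phonenumber>
<email>xyz@xyz.com</email>
<email>abc@abc.com</email>
</Emp>"""


def normalize(xml_text):
    """Return (phone_rows, email_rows), one row per repeating node."""
    emp = ET.fromstring(xml_text)
    name = emp.findtext("EmpName")
    # Each repeating element becomes its own row, keyed by the employee name.
    phone_rows = [(name, el.text) for el in emp.findall("phonenumber")]
    email_rows = [(name, el.text) for el in emp.findall("email")]
    return phone_rows, email_rows


phone_rows, email_rows = normalize(XML_DOC)
print(phone_rows)
print(email_rows)
```

The two lists behave like two small relational tables that share an employee key, which is the role the two output links play in the job.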

Downstream in DataStage, you can do whatever you need to pivot them, join them, etc.

Ernie

Posted: Wed Dec 11, 2013 11:24 am
by eostic
By the way... I'm specifically recommending that you NOT use HJoin or any other such step. Get one link working for your phone numbers and any other columns you want/need, and then add another output link for the emails.

Then do whatever you need downstream.

Ernie