Build a Complex XML document

Posted: Wed Sep 08, 2010 11:38 pm
by samyamkrishna
Hi,

I want to build a complex XML document using DataStage.

I have also gone through XmlBestPractices.doc.

My job design is shown below.


file---->transformer------>xmlo/pDEP------------>join-----xmlo/pcombine

------->xmlo/pofficecode---->

I am extracting data from a file which has officecode and department.
The data is given below.

officecode DEP
01 TEC
01 FIN
01 FIN
01 FEC

I split officecode and DEP in a transformer,
build both XML chunks separately, then join them together and pass the result to another XML Output stage.

I expect the output to look like this:

- <Details>
- <officecode>
1
<DEP>FEC</DEP>
<DEP>FIN</DEP>
<DEP>TEC</DEP>
<DEP>FIN</DEP>
</officecode>
</Details>

But I get the output either as

- <Details>
- <officecode>
1
<DEP>TEC</DEP>
</officecode>
- <officecode>
1
<DEP>FIN</DEP>
</officecode>
- <officecode>
1
<DEP>FIN</DEP>
</officecode>
- <officecode>
1
<DEP>FEC</DEP>
</officecode>
</Details>

This does not make sense, because this output can be obtained with a single XML Output stage.

or

- <Details>
<officecode>1 <DEP> FEC </DEP> <DEP> FIN </DEP> <DEP> TEC </DEP> <DEP> FIN </DEP></officecode>
</Details>

Here the problem is that the "<DEP> FEC </DEP> <DEP> FIN </DEP> <DEP> TEC </DEP> <DEP> FIN </DEP>" part comes through as data between <officecode> and </officecode>.

I am missing something. Can somebody help me with this?

Regards,
Samyam

Posted: Thu Sep 09, 2010 6:18 am
by eostic
Please explain more clearly what you are looking for as your output. From what I can tell, you don't need the complex technique; you only need that when you have "multiple office codes" with unrelated "multiple department codes". In your case, all the dep codes appear to "belong" to the single office code.

Further, this snippet from early in your entry...

- <officecode>
1
<DEP>FEC</DEP>
<DEP>FIN</DEP>
<DEP>TEC</DEP>
<DEP>FIN</DEP>
</officecode>

is identical to this snippet from later in your entry....


<officecode>1 <DEP> FEC </DEP> <DEP> FIN </DEP> <DEP> TEC </DEP> <DEP> FIN </DEP></officecode>


Please describe in more detail what you are expecting, and also share the XPath (in the Description property on the input link of XMLOutput) for these two columns. That might help.

Ernie

Posted: Thu Sep 09, 2010 7:25 am
by samyamkrishna
Hi Ernie,

The structure I want in the output is

<Details>
<Officecode>
<DEP>
</DEP>
</Officecode>
</Details>

Now I will give you the actual data.

01,FIN
03,SLS
01,FIN
01,TEC
02,TEC
01,FEC
02,HUR
03,MRT

One office can have multiple DEPs, and there are many offices (as you can see in the sample data).

This data is read from a sequential file and goes to a transformer.
There are two output links from the transformer.

the first output link of the transformer has two columns
1.officecode
2.officecode_1

These two columns go to the XML Output stage.
In the input tab of the XML Output stage:

officecode ----------- has no xpath
officecode_1 ----------- /officecode/text()

in the output tab of the xml output stage

officecode ----------- has no xpath
officecode_1 ------------- /

This creates an XML chunk for officecode_1, and officecode propagates as it is.

Now the second link from the transformer has two columns
1.officecode
2.DEP

these two columns go to a second xml output stage.
input tab of xml output stage

officecode -------- has no xpath
DEP ----------- /DEP/text()

output tab of the XML Output stage

officecode ---------- has no xpath
DEP ----------- /

This creates the DEP XML chunk.

The outputs of the two XML stages go to a Join stage,
where both XML chunks are joined on the key officecode.

From the Join stage the output goes to another XML Output stage,
which has the input XPath

officecode ------- /Details/officecode
DEP -------- /Details/officecode/DEP

***********************************************************

Now coming back to both the xml's being identical.
consider bold characters in the xml as data.

what i want is
<officecode>
1
<DEP>FEC</DEP>
<DEP>FIN</DEP>
<DEP>TEC</DEP>
<DEP>FIN</DEP>
</officecode>


but what i am getting is


<officecode>
1
<DEP>FEC</DEP>
<DEP>FIN</DEP>
<DEP>TEC</DEP>
<DEP>FIN</DEP>

</officecode>

I hope you understand what I am trying to say.

****************************************************

Let me know what I am doing wrong.
Regards,
Samyam

Posted: Thu Sep 09, 2010 10:19 am
by eostic
Have only one link, going into only one XMLOutput Stage. Sort your data before going in. You should have two columns on that single link to XMLOutput.

col1 /detail/officecode/text
col2 /detail/officecode/dep

Your xmloutput can be the terminating stage (for testing, this is fine...just put in a filename).

If that behaves strangely, it's possible that you "may" need to provide a formal element for the office code, along with a containing element for the repeating departments, which would be a much nicer design in XML anyway. Try this if the above isn't cooperating:

col1 /detail/officecode/officecode/text
col2 /detail/officecode/departments/department/text

If that works but has extra elements, we can talk about how to get rid of them after the fact.

Either way, you shouldn't need two instances and multiple links for this.
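
As an illustration of why the sort matters, here is a minimal Python sketch of the same grouping logic. It is only an analogy, not DataStage code and not the job described in this thread: the names `SAMPLE` and `build_details` are hypothetical, and the element names follow the output the original poster asked for. Adjacent rows with the same office code are folded into one `<officecode>` element, so the rows must be sorted on that key first.

```python
import csv
import io
from itertools import groupby
from operator import itemgetter
import xml.etree.ElementTree as ET

# Sample rows taken from the thread (officecode, DEP).
SAMPLE = """01,FIN
03,SLS
01,FIN
01,TEC
02,TEC
01,FEC
02,HUR
03,MRT
"""


def build_details(rows):
    """Return a <Details> element with one <officecode> per distinct office."""
    root = ET.Element("Details")
    # groupby only merges adjacent rows, so sort on the office code first.
    ordered = sorted(rows, key=itemgetter(0))
    for code, group in groupby(ordered, key=itemgetter(0)):
        office = ET.SubElement(root, "officecode")
        office.text = code  # office code as leading text, as in the thread
        for _, dep in group:
            ET.SubElement(office, "DEP").text = dep  # repeating child element
    return root


rows = [tuple(r) for r in csv.reader(io.StringIO(SAMPLE))]
details = build_details(rows)
xml_text = ET.tostring(details, encoding="unicode")
print(xml_text)
```

Dropping the `sorted` call in this sketch would start a new `<officecode>` element every time the office code changes between neighbouring rows, which resembles the one-element-per-row symptom reported earlier.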

Ernie

Posted: Thu Sep 09, 2010 11:00 am
by arunkumarmm
The best way I know is to create two separate XMLs, one for the office code and another for the department.

In both XMLs, keep a column 'Office Code' in tabular format (not as XML). Then join both XMLs using this column. In your final XML Output stage, map only the required columns, DO remember to give a '/' in the Description of the fields that are already XML, and enable the Aggregate all rows option.
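
The chunk-and-join idea can be sketched in Python as follows. This is a rough analogy under stated assumptions, not a description of how the DataStage stages work internally: the variable names (`office_codes`, `dep_chunks`, `joined`) are hypothetical, and the sample rows are a subset of the data posted in this thread. The office code travels as a plain key next to each XML chunk, the chunks are aggregated per key, and the final step embeds the chunks as markup rather than as text.

```python
from collections import defaultdict
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET

rows = [("01", "FIN"), ("03", "SLS"), ("01", "TEC"), ("02", "TEC"), ("02", "HUR")]

# Stream 1: one row per distinct office code (the plain, tabular key).
office_codes = sorted({code for code, _ in rows})

# Stream 2: DEP chunks aggregated per office code.
# Leaf values are escaped here; the chunk itself is treated as markup later.
dep_chunks = defaultdict(str)
for code, dep in rows:
    dep_chunks[code] += f"<DEP>{escape(dep)}</DEP>"

# Join the two streams on the key column.
joined = [(code, dep_chunks[code]) for code in office_codes]

# Final assembly: the chunk is inserted as XML, not re-escaped as text.
body = "".join(
    f"<officecode>{escape(code)}{chunk}</officecode>" for code, chunk in joined
)
document = f"<Details>{body}</Details>"

# Parsing the result confirms the assembled string is well-formed XML.
parsed = ET.fromstring(document)
print(document)
```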

Posted: Thu Sep 09, 2010 9:32 pm
by samyamkrishna
Thanks a lot, Ernie and Arun.
I will try these approaches and get back to you with the results.

Posted: Mon Sep 13, 2010 4:22 am
by samyamkrishna
Hi Guys,

I tried all the approaches you suggested.
They didn't work.

:cry:

Is there any other documentation on building a complex XML document?

Regards,
Samyam

Posted: Mon Sep 13, 2010 6:23 am
by chulett
No. Define "didn't work". Tell us what exactly you tried plus the output that you saw, then maybe someone can continue to try and help you.

Posted: Mon Sep 13, 2010 7:37 am
by samyamkrishna
Hi,

The approach suggested by Ernie, using a single XMLOutput Stage
gives the output like this

- <Details>
- <officecode>
1
<DEP>TEC</DEP>
</officecode>
- <officecode>
1
<DEP>FEC</DEP>
</officecode>
- <officecode>
1
<DEP>FIN</DEP>
</officecode>
- <officecode>
1
<DEP>FIN</DEP>
</officecode>
- <officecode>
2
<DEP>HUR</DEP>
</officecode>
- <officecode>
2
<DEP>TEC</DEP>
</officecode>
- <officecode>
-</Details>

This is not a complex XML structure.

*********************************************************

Now coming to the second approach, suggested by Arun:
giving '/' as the description for the two separate XML chunks in
the final XMLOutput Stage does not create an XML document.

It gives a warning "Derivation rule "/" is invalid. Message = "Element or attribute expected at "/"""


***********************************************************

If you see my previous posting where i have mentioned

what i want is
<officecode>
1
<DEP> FEC </DEP>
<DEP> FIN </DEP>
<DEP> TEC </DEP>
<DEP> FIN </DEP>
</officecode>


but what i am getting is


<officecode>
1
<DEP>FEC</DEP>
<DEP>FIN</DEP>
<DEP>TEC</DEP>
<DEP>FIN</DEP>
</officecode>

The structure looks the same but I am getting

<DEP>FEC</DEP>
<DEP>FIN</DEP>
<DEP>TEC</DEP>
<DEP>FIN</DEP>


as data inside <officecode> </officecode>

If I open the XML in Notepad, it looks like this:

<OfficeCode>1
&lt;DEP&gt;FEC&lt;/DEP&gt;
&lt;DEP&gt;FIN&lt;/DEP&gt;
&lt;DEP&gt;TEC&lt;/DEP&gt;
&lt;DEP&gt;FIN&lt;/DEP&gt;
</OfficeCode>

I can replace &lt; and &gt; with < and > respectively using a sed command in Unix and get the desired output, but this is not a correct approach.

So please guide me.

Posted: Mon Sep 13, 2010 9:35 am
by chulett
Rather than using 'sed', change the 'Data Element' to XML so the stage knows the column is already XML and will not convert those characters in your output.
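
The difference between the two behaviours can be reproduced outside DataStage. The following Python sketch is only an analogy, using the standard `xml.etree.ElementTree` module; the variable names and the temporary `wrapper` element are hypothetical. When a pre-built chunk is assigned as character data, the serializer escapes its angle brackets. When the chunk is parsed and attached as child elements, the markup survives.

```python
import xml.etree.ElementTree as ET

# A pre-built chunk like the one produced by the first XML stage.
dep_chunk = "<DEP>FEC</DEP><DEP>FIN</DEP>"

# Case 1: the chunk is treated as plain text, so its markup is escaped.
as_text = ET.Element("officecode")
as_text.text = "01" + dep_chunk
escaped = ET.tostring(as_text, encoding="unicode")

# Case 2: the chunk is parsed as XML and attached as real child elements.
as_xml = ET.Element("officecode")
as_xml.text = "01"
# A temporary wrapper is needed because the chunk has no single root element.
wrapper = ET.fromstring("<wrapper>" + dep_chunk + "</wrapper>")
as_xml.extend(list(wrapper))
nested = ET.tostring(as_xml, encoding="unicode")

print(escaped)
print(nested)
```

The first printed line contains `&lt;DEP&gt;` entities, matching the Notepad symptom described above; the second contains real `<DEP>` elements.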

Posted: Mon Sep 13, 2010 12:02 pm
by samyamkrishna
will try it tomorrow first thing

Posted: Mon Sep 13, 2010 4:44 pm
by eostic
Which column is your repeating element (the one with key = yes on the input link)? It should be your DEP column.

Ernie

Posted: Mon Sep 13, 2010 7:15 pm
by eostic
I think this is what you are looking for ?

<?xml version="1.0" encoding="UTF-8" ?>
- <!-- Generated by Ascential Software Corporation, DataStage - XMLOutput stage - Mon Sep 13 18:11:42 2010
-->
- <Details>
- <officecode>
01
<DEP>FIN</DEP>
<DEP>FEC</DEP>
<DEP>TEC</DEP>
</officecode>
- <officecode>
02
<DEP>TEC</DEP>
<DEP>HUR</DEP>
</officecode>
- <officecode>
03
<DEP>MRT</DEP>
<DEP>SLS</DEP>
</officecode>
</Details>


Created from an input stream that looks like:

officecode,DEP
01,FIN
01,FEC
01,TEC
02,TEC
02,HUR
03,MRT
03,SLS

Is that correct?


If so, contact me offline. I have, with one xmlOutput stage, and one simple path of links, a .dsx that does it.

Ernie

Posted: Mon Sep 13, 2010 9:48 pm
by samyamkrishna
Hi Ernie,

That is exactly what I want.

Posted: Mon Sep 13, 2010 11:42 pm
by samyamkrishna
Hi Craig,

It worked; I set the Data Element to XML.
Thanks a lot for the help.

Ernie,

I would still like to look at your job.
I want to see how to get this structure with a single output stage.

Thanks,
Samyam