Build a Complex XML document


samyamkrishna
Premium Member
Posts: 258
Joined: Tue Jul 04, 2006 10:35 pm
Location: Toronto


Post by samyamkrishna »

Hi,

I want to build a complex XML document using DataStage.

I have also gone through XmlBestPractices.doc.

My job design is given below.


file ----> transformer ----> xmlo/p DEP ----------> join ----> xmlo/p combine
                       ----> xmlo/p officecode ---->

I am extracting data from a file that has officecode and department columns.
The data is given below:

officecode DEP
01 TEC
01 FIN
01 FIN
01 FEC

I split officecode and DEP in a transformer,
build both XMLs separately, then join them together and pass the result to another XML Output stage.

i expect the output to be like

- <Details>
- <officecode>
1
<DEP>FEC</DEP>
<DEP>FIN</DEP>
<DEP>TEC</DEP>
<DEP>FIN</DEP>
</officecode>
</Details>

but i get the output either as

- <Details>
- <officecode>
1
<DEP>TEC</DEP>
</officecode>
- <officecode>
1
<DEP>FIN</DEP>
</officecode>
- <officecode>
1
<DEP>FIN</DEP>
</officecode>
- <officecode>
1
<DEP>FEC</DEP>
</officecode>
</Details>

This does not make sense, because this output can be obtained with a single XML Output stage.

or

- <Details>
<officecode>1 <DEP> FEC </DEP> <DEP> FIN </DEP> <DEP> TEC </DEP> <DEP> FIN </DEP></officecode>
</Details>

Here the problem is that the "<DEP> FEC </DEP> <DEP> FIN </DEP> <DEP> TEC </DEP> <DEP> FIN </DEP>" part comes through as data between <officecode> and </officecode>.

I am missing something. Can somebody help me with this?

Regards,
Samyam
eostic
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

Please explain more clearly what you are looking for as your output. From what I can tell, you don't need the complex technique; you only need that when you have "multiple office codes" with unrelated "multiple department codes". In your case, all the DEP codes appear to "belong" to the single office code.

Further, this snippet from early in your entry...

- <officecode>
1
<DEP>FEC</DEP>
<DEP>FIN</DEP>
<DEP>TEC</DEP>
<DEP>FIN</DEP>
</officecode>

is identical to this snippet from later in your entry....


<officecode>1 <DEP> FEC </DEP> <DEP> FIN </DEP> <DEP> TEC </DEP> <DEP> FIN </DEP></officecode>


Please describe in more detail what you are expecting, and also share the XPath (in the Description property on the input link of XMLOutput) for these two columns. That might help.

Ernie
Ernie Ostic

blogit!
<a href="https://dsrealtime.wordpress.com/2015/0 ... ere/">Open IGC is Here!</a>

Post by samyamkrishna »

Hi Ernie,

The structure I want in the output is

<Details>
<Officecode>
<DEP>
</DEP>
</Officecode>
</Details>

Now I will give you the actual data.

01,FIN
03,SLS
01,FIN
01,TEC
02,TEC
01,FEC
02,HUR
03,MRT

One office can have multiple DEPs, and there are many offices (as you can see in the sample data).

This data is read from a sequential file and goes to a transformer.
There are two output links from the transformer.

The first output link of the transformer has two columns:
1. officecode
2. officecode_1

These two columns go to the XML Output stage.
In the input tab of the XML Output stage:

officecode ----------- has no xpath
officecode_1 ----------- /officecode/text()

in the output tab of the xml output stage

officecode ----------- has no xpath
officecode_1 ------------- /

This creates an XML chunk for officecode_1, and officecode propagates as it is.

Now the second link from the transformer has two columns:
1.officecode
2.DEP

these two columns go to a second xml output stage.
input tab of xml output stage

officecode -------- has no xpath
DEP ----------- /DEP/text()

output tab of the xml output stage

officecode ---------- has no xpath
DEP ----------- /

this creates DEP xml chunk

The outputs of the two XML stages go to a join stage,
where both XML chunks are joined on the key officecode.

From the join stage the output goes to another XML Output stage,
which has the input XPath:

officecode ------- /Details/officecode
DEP -------- /Details/officecode/DEP

***********************************************************

Now coming back to both the xml's being identical.
consider bold characters in the xml as data.

what i want is
<officecode>
1
<DEP>FEC</DEP>
<DEP>FIN</DEP>
<DEP>TEC</DEP>
<DEP>FIN</DEP>
</officecode>


but what i am getting is


<officecode>
1
<DEP>FEC</DEP>
<DEP>FIN</DEP>
<DEP>TEC</DEP>
<DEP>FIN</DEP>

</officecode>

I hope you understand what i am trying to say.

****************************************************

Let me know what i am doing wrong.
Regards,
Samyam

Post by eostic »

Have only one link, going into only one XMLOutput Stage. Sort your data before going in. You should have two columns on that single link to XMLOutput.

col1 /detail/officecode/text()
col2 /detail/officecode/dep

Your xmloutput can be the terminating stage (for testing, this is fine...just put in a filename).

If that behaves strangely, it's possible that you "may" need to provide a formal element for the office code, along with a containing element for the repeating departments, which would be a much nicer design in XML anyway. Try this if the above isn't cooperating:

col1 /detail/officecode/officecode/text()
col2 /detail/officecode/departments/department/text()

If that works but has extra elements, we can talk about how to get rid of them after the fact.

Either way, you shouldn't need two instances and multiple links for this.
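
For readers who want to see why sorting matters for the single-link approach, here is a minimal Python sketch of the same grouping logic. It is only an analogy, not DataStage code: the function name `build_details` and the use of `xml.etree.ElementTree` are assumptions made for illustration, and the rows are the sample data posted earlier in the thread.

```python
import csv
import io
from itertools import groupby
from operator import itemgetter
import xml.etree.ElementTree as ET

# Sample rows from the thread (officecode, DEP), in their original unsorted order.
SAMPLE = """01,FIN
03,SLS
01,FIN
01,TEC
02,TEC
01,FEC
02,HUR
03,MRT
"""

def build_details(rows):
    """Hypothetical helper: one <officecode> per office, with nested <DEP> children."""
    root = ET.Element("Details")
    # groupby only merges adjacent rows, so the rows must be sorted on the key first.
    ordered = sorted(rows, key=itemgetter(0))
    for office, group in groupby(ordered, key=itemgetter(0)):
        office_el = ET.SubElement(root, "officecode")
        office_el.text = office
        for _, dep in group:
            ET.SubElement(office_el, "DEP").text = dep
    return root

rows = [tuple(r) for r in csv.reader(io.StringIO(SAMPLE))]
details = build_details(rows)
print(ET.tostring(details, encoding="unicode"))
```

Without the sort, each change of officecode in the input would start a new group, which is one way to end up with one officecode element per row or per run of rows.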

Ernie
arunkumarmm
Participant
Posts: 246
Joined: Mon Jun 30, 2008 3:22 am
Location: New York

Post by arunkumarmm »

The best way I know is to create two separate XMLs, one for the office code and another for the department.

In both XMLs, create a column 'Office Code' in tabular format (not as XML). Then join both XMLs using this column. In your final XML Output stage, map only the required columns, remember to give a '/' in the Description of the fields that are already XML, and enable the 'Aggregate all rows' option.
Arun

Post by samyamkrishna »

Thanks a lot, Ernie and Arun.
I will try these approaches and get back to you with the results.

Post by samyamkrishna »

Hi Guys,

I tried all the approaches you suggested.
They didn't work.

:cry:

Is there any other documentation on building a complex XML document?

Regards,
Samyam
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

No. Define "didn't work". Tell us what exactly you tried plus the output that you saw, then maybe someone can continue to try and help you.
-craig

"You can never have too many knives" -- Logan Nine Fingers

Post by samyamkrishna »

Hi,

The approach suggested by Ernie, using a single XMLOutput Stage
gives the output like this

- <Details>
- <officecode>
1
<DEP>TEC</DEP>
</officecode>
- <officecode>
1
<DEP>FEC</DEP>
</officecode>
- <officecode>
1
<DEP>FIN</DEP>
</officecode>
- <officecode>
1
<DEP>FIN</DEP>
</officecode>
- <officecode>
2
<DEP>HUR</DEP>
</officecode>
- <officecode>
2
<DEP>TEC</DEP>
</officecode>
- <officecode>
-</Details>

This is not a complex XML structure.

*********************************************************

Now coming to the second approach, suggested by Arun:
giving '/' as the description for the two separate XML chunks in
the final XMLOutput stage does not create an XML document.

It gives a warning "Derivation rule "/" is invalid. Message = "Element or attribute expected at "/"""


***********************************************************

If you see my previous posting where i have mentioned

what i want is
<officecode>
1
<DEP> FEC </DEP>
<DEP> FIN </DEP>
<DEP> TEC </DEP>
<DEP> FIN </DEP>
</officecode>


but what i am getting is


<officecode>
1
<DEP>FEC</DEP>
<DEP>FIN</DEP>
<DEP>TEC</DEP>
<DEP>FIN</DEP>
</officecode>

The structure looks the same but I am getting

<DEP>FEC</DEP>
<DEP>FIN</DEP>
<DEP>TEC</DEP>
<DEP>FIN</DEP>


as data inside <officecode> </officecode>

If I open the XML in Notepad, it looks like this:

<OfficeCode>1
&lt;DEP&gt;
FEC
&lt;/DEP&gt;
&lt;DEP&gt;
FIN
&lt;/DEP&gt;
&lt;DEP&gt;
TEC
&lt;/DEP&gt;
&lt;DEP&gt;
FIN
&lt;/DEP&gt;
</OfficeCode>

I can replace &lt; and &gt; with < and > respectively using the sed command in Unix and get the desired output, but this is not the correct approach.

So please guide me.

Post by chulett »

Rather than using 'sed', change the 'Data Element' to XML so the stage knows the column is already XML and will not convert those characters in your output.
-craig
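
The difference between a chunk treated as character data and a chunk treated as XML can be reproduced outside DataStage. The following Python sketch is only an analogy, assuming `xml.etree.ElementTree` as the serializer; it does not show how the XMLOutput stage is implemented, and the variable names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# A pre-built DEP chunk, as an upstream stage might hand it over in one column.
dep_chunk = "<DEP>FEC</DEP><DEP>FIN</DEP>"

# Case 1: the chunk is treated as plain character data.
# The serializer escapes the markup, so the tags end up as &lt; and &gt;.
as_text = ET.Element("officecode")
as_text.text = "1" + dep_chunk
escaped = ET.tostring(as_text, encoding="unicode")
print(escaped)

# Case 2: the chunk is declared to be XML already.
# It is parsed (inside a throwaway wrapper) and attached as child elements.
as_xml = ET.Element("officecode")
as_xml.text = "1"
wrapper = ET.fromstring("<w>" + dep_chunk + "</w>")
as_xml.extend(list(wrapper))
nested = ET.tostring(as_xml, encoding="unicode")
print(nested)
```

Case 1 corresponds to the escaped output seen in Notepad; case 2 corresponds to what the 'Data Element' = XML setting is meant to achieve. Patching case 1 after the fact with a text replacement would also unescape any legitimate &lt; in the data, which is why it is a fragile fix.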


Post by samyamkrishna »

will try it tomorrow first thing

Post by eostic »

Which column is your repeating element (the one with key = yes on the input link)? It should be your DEP column.

Ernie

Post by eostic »

I think this is what you are looking for?

<?xml version="1.0" encoding="UTF-8" ?>
- <!-- Generated by Ascential Software Corporation, DataStage - XMLOutput stage - Mon Sep 13 18:11:42 2010
-->
- <Details>
- <officecode>
01
<DEP>FIN</DEP>
<DEP>FEC</DEP>
<DEP>TEC</DEP>
</officecode>
- <officecode>
02
<DEP>TEC</DEP>
<DEP>HUR</DEP>
</officecode>
- <officecode>
03
<DEP>MRT</DEP>
<DEP>SLS</DEP>
</officecode>
</Details>


Created from an input stream that looks like:

officecode,DEP
01,FIN
01,FEC
01,TEC
02,TEC
02,HUR
03,MRT
03,SLS

Is that correct?


If so, contact me offline. I have a .dsx that does it with one XMLOutput stage and one simple path of links.
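
One way to check that a nested document like the one above really corresponds to the input stream is to flatten it back into (officecode, DEP) pairs. This is a small Python sketch under the assumption that the document is parsed with `xml.etree.ElementTree`; `flatten` is a hypothetical helper name, and the document below is the posted output with the viewer's "-" markers and the generated comment left out.

```python
import xml.etree.ElementTree as ET

DOC = """<Details>
  <officecode>01<DEP>FIN</DEP><DEP>FEC</DEP><DEP>TEC</DEP></officecode>
  <officecode>02<DEP>TEC</DEP><DEP>HUR</DEP></officecode>
  <officecode>03<DEP>MRT</DEP><DEP>SLS</DEP></officecode>
</Details>"""

def flatten(doc):
    """Hypothetical helper: return (officecode, DEP) pairs in document order."""
    pairs = []
    for office in ET.fromstring(doc).findall("officecode"):
        # The office code is the text that precedes the first DEP child.
        code = (office.text or "").strip()
        for dep in office.findall("DEP"):
            pairs.append((code, (dep.text or "").strip()))
    return pairs

pairs = flatten(DOC)
for code, dep in pairs:
    print(f"{code},{dep}")
```

If the DEP tags had been written as escaped text instead of real elements, `findall("DEP")` would find nothing and the list would come back empty, so the same check also catches the escaping problem discussed earlier in the thread.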

Ernie

Post by samyamkrishna »

Hi Ernie,

That is exactly what i want.

Post by samyamkrishna »

Hi Craig,

It worked; I set the Data Element to XML.
Thanks a lot for the help.

Ernie,

I would still like to look at your job.
I want to see how to get this structure with a single output stage.

Thanks,
Samyam