Build a Complex XML document
-
- Premium Member
- Posts: 258
- Joined: Tue Jul 04, 2006 10:35 pm
- Location: Toronto
Hi,
I want to build a complex XML document using DataStage.
I have also gone through XmlBestPractices.doc.
My job design is given below.
file ----> transformer ----> xmlo/p DEP --------> join ----> xmlo/p combine
                       ----> xmlo/p officecode ---->
I am extracting data from a file that has officecode and department columns.
The data is given below:
officecode DEP
01 TEC
01 FIN
01 FIN
01 FEC
I split officecode and DEP in a transformer,
build both XMLs separately, then join them together and pass the result to another XML output stage.
I expect the output to look like this:
<Details>
  <officecode>
    1
    <DEP>FEC</DEP>
    <DEP>FIN</DEP>
    <DEP>TEC</DEP>
    <DEP>FIN</DEP>
  </officecode>
</Details>
But I get the output either as
<Details>
  <officecode>
    1
    <DEP>TEC</DEP>
  </officecode>
  <officecode>
    1
    <DEP>FIN</DEP>
  </officecode>
  <officecode>
    1
    <DEP>FIN</DEP>
  </officecode>
  <officecode>
    1
    <DEP>FEC</DEP>
  </officecode>
</Details>
This does not make sense, because this output can be obtained with a single XML output stage.
or
<Details>
  <officecode>1 <DEP> FEC </DEP> <DEP> FIN </DEP> <DEP> TEC </DEP> <DEP> FIN </DEP></officecode>
</Details>
Here the problem is that the "<DEP> FEC </DEP> <DEP> FIN </DEP> <DEP> TEC </DEP> <DEP> FIN </DEP>" part comes out as data between <officecode> and </officecode>.
I am missing something. Can somebody help me with this?
Regards,
Samyam
Please explain better what you are looking for as your output. From what I can tell, you don't need the complex technique; you only need that when you have "multiple office codes" with unrelated "multiple department codes". In your case, all the dep codes appear to "belong" to the single office code.
Further, this snippet from early in your entry...
<officecode>
  1
  <DEP>FEC</DEP>
  <DEP>FIN</DEP>
  <DEP>TEC</DEP>
  <DEP>FIN</DEP>
</officecode>
is identical to this snippet from later in your entry...
<officecode>1 <DEP> FEC </DEP> <DEP> FIN </DEP> <DEP> TEC </DEP> <DEP> FIN </DEP></officecode>
Please describe in more detail what you are expecting, and also share the xpath (in the Description property on the input link of XMLOutput) for these two columns. That might help.
Ernie
Ernie Ostic
blogit!
<a href="https://dsrealtime.wordpress.com/2015/0 ... ere/">Open IGC is Here!</a>
-
- Premium Member
- Posts: 258
- Joined: Tue Jul 04, 2006 10:35 pm
- Location: Toronto
Hi Ernie,
The structure I want in the output is
<Details>
  <Officecode>
    <DEP>
    </DEP>
  </Officecode>
</Details>
Now I will give you the actual data.
01,FIN
03,SLS
01,FIN
01,TEC
02,TEC
01,FEC
02,HUR
03,MRT
One office can have multiple DEPs, and there are many offices (as you can see in the sample data).
This data is read from a sequential file and goes to a transformer.
There are two output links from the transformer.
The first output link of the transformer has two columns:
1. officecode
2. officecode_1
These two columns go to the XML output stage.
In the input tab of the XML output stage:
officecode ----------- has no xpath
officecode_1 ----------- /officecode/text()
In the output tab of the XML output stage:
officecode ----------- has no xpath
officecode_1 ------------- /
This creates an XML chunk for officecode_1, and officecode propagates as it is.
Now the second link from the transformer has two columns:
1. officecode
2. DEP
These two columns go to a second XML output stage.
Input tab of the XML output stage:
officecode -------- has no xpath
DEP ----------- /DEP/text()
Output tab of the XML output stage:
officecode ---------- has no xpath
DEP ----------- /
This creates the DEP XML chunk.
The outputs of the two XML stages go to a join stage,
where both XML chunks are joined on the key officecode.
From the join stage the output goes to another XML output stage,
which has the input xpath
officecode ------- /Details/officecode
DEP -------- /Details/officecode/DEP
***********************************************************
Now coming back to both the XMLs being identical:
consider the bold characters in the XML as data.
What I want is
<officecode>
1
<DEP>FEC</DEP>
<DEP>FIN</DEP>
<DEP>TEC</DEP>
<DEP>FIN</DEP>
</officecode>
But what I am getting is
<officecode>
1
<DEP>FEC</DEP>
<DEP>FIN</DEP>
<DEP>TEC</DEP>
<DEP>FIN</DEP>
</officecode>
I hope you understand what I am trying to say.
****************************************************
Let me know what I am doing wrong.
Regards,
Samyam
Have only one link, going into only one XMLOutput Stage. Sort your data before going in. You should have two columns on that single link to XMLOutput.
col1 /detail/officecode/text
col2 /detail/officecode/dep
Your xmloutput can be the terminating stage (for testing, this is fine...just put in a filename).
If that behaves strangely, it's possible that you "may" need to provide a formal element for the office code, along with a containing element for the repeating departments, which would be a much nicer design in XML anyway. Try this if the above isn't cooperating:
col1 /detail/officecode/officecode/text
col2 /detail/officecode/departments/department/text
If that works but has extra elements, we can talk about how to get rid of them after the fact.
Either way, you shouldn't need two instances and multiple links for this.
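To see why sorting before the single XMLOutput stage matters, here is a minimal Python sketch. It is not DataStage code: it assumes, for illustration only, that a parent element is started whenever the key value changes between consecutive rows. The helper name `group_consecutive` is hypothetical; the rows are the sample data posted earlier in the thread.

```python
from itertools import groupby

# Sample rows from the thread, in their original (unsorted) order.
rows = [("01", "FIN"), ("03", "SLS"), ("01", "FIN"), ("01", "TEC"),
        ("02", "TEC"), ("01", "FEC"), ("02", "HUR"), ("03", "MRT")]

def group_consecutive(rows):
    """Group adjacent rows that share the same office code.

    Hypothetical helper: models a writer that opens a new parent
    element each time the key changes from one row to the next.
    """
    return [(code, [dep for _, dep in grp])
            for code, grp in groupby(rows, key=lambda r: r[0])]

# Unsorted input: office 01 is split across several groups.
unsorted_groups = group_consecutive(rows)

# Sorted on the key first: one group per office code.
sorted_groups = group_consecutive(sorted(rows, key=lambda r: r[0]))

for code, deps in sorted_groups:
    print(code, deps)
```

Under that assumption, the unsorted stream yields seven fragments while the sorted stream yields one group per office, which is the shape the nested document needs.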
Ernie
Ernie Ostic
-
- Participant
- Posts: 246
- Joined: Mon Jun 30, 2008 3:22 am
- Location: New York
- Contact:
The best way I know is to create two separate XMLs, one for the office code and another for the department.
In both XMLs, create a column 'Office Code' in tabular format (not as XML). Then join both XMLs using this column. In your final XML O/P stage, map only the required columns, and DO remember to give a '/' in the description of the fields that are already XML in the final XML O/P stage, and to enable the Aggregate all rows option.
Arun
-
- Premium Member
- Posts: 258
- Joined: Tue Jul 04, 2006 10:35 pm
- Location: Toronto
Hi,
The approach suggested by Ernie, using a single XMLOutput Stage
gives the output like this
<Details>
  <officecode>
    1
    <DEP>TEC</DEP>
  </officecode>
  <officecode>
    1
    <DEP>FEC</DEP>
  </officecode>
  <officecode>
    1
    <DEP>FIN</DEP>
  </officecode>
  <officecode>
    1
    <DEP>FIN</DEP>
  </officecode>
  <officecode>
    2
    <DEP>HUR</DEP>
  </officecode>
  <officecode>
    2
    <DEP>TEC</DEP>
  </officecode>
  <officecode>
</Details>
This is not a complex XML structure.
*********************************************************
Now coming to the second approach, suggested by Arun:
giving '/' as the description for the two separate XML chunks in
the final XMLOutput Stage does not create an XML document.
It gives a warning: "Derivation rule "/" is invalid. Message = "Element or attribute expected at "/"""
***********************************************************
If you see my previous posting where i have mentioned
what i want is
<officecode>
1
<DEP> FEC </DEP>
<DEP> FIN </DEP>
<DEP> TEC </DEP>
<DEP> FIN </DEP>
</officecode>
but what i am getting is
<officecode>
1
<DEP>FEC</DEP>
<DEP>FIN</DEP>
<DEP>TEC</DEP>
<DEP>FIN</DEP> </officecode>
The structure looks the same but I am getting
<DEP>FEC</DEP>
<DEP>FIN</DEP>
<DEP>TEC</DEP>
<DEP>FIN</DEP>
as data inside <officecode> </officecode>
If I open the XML in Notepad it looks like this:
<OfficeCode>1
&lt;DEP&gt;
FEC
&lt;/DEP&gt;
&lt;DEP&gt;
FIN
&lt;/DEP&gt;
&lt;DEP&gt;
TEC
&lt;/DEP&gt;
&lt;DEP&gt;
FIN
&lt;/DEP&gt;
</OfficeCode>
I can replace &lt; and &gt; with < and > respectively using a sed command in Unix and get the desired output, but this is not a correct approach.
So please guide me.
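The symptom described here, DEP tags arriving as data inside officecode, is what a typical XML serializer does when already-built markup is handed to it as a text value. The sketch below reproduces that effect with Python's standard `xml.etree.ElementTree`; it only illustrates the general escaping behaviour and is an assumption about the cause, not a statement about XMLOutput internals.

```python
import xml.etree.ElementTree as ET

deps = ["FEC", "FIN", "TEC", "FIN"]

# Case 1: the DEP markup is passed in as a plain text value.
# The serializer treats it as data and escapes the angle brackets.
as_text = ET.Element("officecode")
as_text.text = "1" + "".join(f"<DEP>{d}</DEP>" for d in deps)
escaped = ET.tostring(as_text, encoding="unicode")

# Case 2: each DEP is added as a real child element,
# so the tags are written as structure rather than data.
as_children = ET.Element("officecode")
as_children.text = "1"
for d in deps:
    ET.SubElement(as_children, "DEP").text = d
nested = ET.tostring(as_children, encoding="unicode")

print(escaped)
print(nested)
```

Both results display the same way in a browser, but only the second one has DEP child elements when parsed, which is why post-processing the file with sed treats the symptom rather than the cause.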
Which column is your repeating element (with key = yes on the input link)? It should be your DEP column.
Ernie
Ernie Ostic
I think this is what you are looking for?
<?xml version="1.0" encoding="UTF-8" ?>
<!-- Generated by Ascential Software Corporation, DataStage - XMLOutput stage - Mon Sep 13 18:11:42 2010 -->
<Details>
  <officecode>
    01
    <DEP>FIN</DEP>
    <DEP>FEC</DEP>
    <DEP>TEC</DEP>
  </officecode>
  <officecode>
    02
    <DEP>TEC</DEP>
    <DEP>HUR</DEP>
  </officecode>
  <officecode>
    03
    <DEP>MRT</DEP>
    <DEP>SLS</DEP>
  </officecode>
</Details>
Created from an input stream that looks like:
officecode,DEP
01,FIN
01,FEC
01,TEC
02,TEC
02,HUR
03,MRT
03,SLS
Is that correct?
If so, contact me offline. I have a .dsx that does it with one XMLOutput stage and one simple path of links.
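For readers without the .dsx, the mapping from that input stream to the nested document can be sketched outside DataStage. This is a rough Python illustration using the standard library only; the function name `build_details` is hypothetical, and the code says nothing about how the XMLOutput stage is actually configured.

```python
import xml.etree.ElementTree as ET
from itertools import groupby

# Input stream from the post above, already ordered by officecode.
rows = [("01", "FIN"), ("01", "FEC"), ("01", "TEC"),
        ("02", "TEC"), ("02", "HUR"),
        ("03", "MRT"), ("03", "SLS")]

def build_details(rows):
    """Build <Details> with one <officecode> per key and repeating <DEP> children.

    Hypothetical helper: assumes rows arrive sorted on the office code.
    """
    root = ET.Element("Details")
    for code, grp in groupby(rows, key=lambda r: r[0]):
        office = ET.SubElement(root, "officecode")
        office.text = code                      # the office code as text
        for _, dep in grp:
            ET.SubElement(office, "DEP").text = dep  # one DEP per row
    return root

xml_text = ET.tostring(build_details(rows), encoding="unicode")
print(xml_text)
```

Here DEP plays the role of the repeating element and the office code is the value whose change starts a new parent.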
Ernie
Ernie Ostic