XML Output collects multiple records into single parent node

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
sjordery
Premium Member
Premium Member
Posts: 202
Joined: Thu Jun 08, 2006 5:58 am

XML Output collects multiple records into single parent node

Post by sjordery »

Hi All,
I have a server job that generates an XML output from a database table based on an XSD.
I have 50 records in the test table and all of them have unique Keys.
In one particular instance i have 3 records that have the same data for all the 20 columns.
The output of 97 records look fine
<Rec1>
<KeyElem>Key1</KeyElem>
<Elem1>Value1</Elem1>
<Elem2>Value2</Elem2>
....
....
<Elem20>Value3</Elem20>
</Rec1>
<Rec2>
<KeyElem>Key2</KeyElem>
<Elem1>Value1</Elem1>
<Elem2>Value2</Elem2>
....
....
<Elem20>Value3</Elem20>
</Rec2>

For some strange reason, i see the XML output lumped as follows for the 3 records that have identical values for non-key columns .

<Rec98>
<KeyElem>Key98</KeyElem>
<KeyElem>Key99</KeyElem>
<KeyElem>Key100</KeyElem>
<Elem1>Value1</Elem1>
<Elem2>Value2</Elem2>
....
....
<Elem20>Value3</Elem20>
</Rec98>

The schema definition for <KeyElem> is left to take the default constraints (As per W3.org minOccurs=1 and maxOccurs=1, if no constraint is specified)

Would appreciate your invaluable inputs on what i might be missing here.

Thanks in advance,
clarcombe
Premium Member
Premium Member
Posts: 515
Joined: Wed Jun 08, 2005 9:54 am
Location: Europe

Post by clarcombe »

I would suggest that you DON'T leave the constraints as default. Can you modify your XSD so that the constraints are set to what you actually need.
Colin Larcombe
-------------------

Certified IBM Infosphere Datastage Developer
JoshGeorge
Participant
Posts: 612
Joined: Thu May 03, 2007 4:59 am
Location: Melbourne

Post by JoshGeorge »

Have you checked Repetitive Element?
Joshy George
<a href="http://www.linkedin.com/in/joshygeorge1" ><img src="http://www.linkedin.com/img/webpromo/bt ... _80x15.gif" width="80" height="15" border="0"></a>
sjordery
Premium Member
Premium Member
Posts: 202
Joined: Thu Jun 08, 2006 5:58 am

Post by sjordery »

Colin and Josh, thanks for your time and input.
Strange but true, I could not figure out what was making it to aggregate all similar records into one but I started a new job from scratch and went through the stages in steps and could not recreate the problem.

As for the XSD and multi-occurence constraints, the XSD already enforces those rules. This has been tested in the current .NET application and using various validating parses/tools. Something strange with the way the initial job was configured, I presume. I will try to recreate that scenario and post the results to the group.

Thanks a lot for your time
VCInDSX
Premium Member
Premium Member
Posts: 223
Joined: Fri Apr 13, 2007 10:02 am
Location: US

Post by VCInDSX »

Hi,
I tried a simple test "server" job using a SQL Server table and an XSD file. I do see this issue in the output file. I will provide the datbase and data here for reference.

Table DDL
CREATE TABLE [dbo].[EmpMaster](
[EmpId] [varchar](50) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[Name] [varchar](100) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[Street] [varchar](100) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[City] [varchar](50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[Zip] [varchar](50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[Phone] [varchar](50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL
) ON [PRIMARY]

Load Data
INSERT INTO [dbo].[EmpMaster] VALUES('Emp001', 'DS Name 1', 'DS Street 1', 'DS City 1' ,'12345', '123-456-7891')
INSERT INTO [dbo].[EmpMaster] VALUES('Emp002', 'DS Name 2', 'DS Street 2', 'DS City 2' ,'12345', '123-456-7892')
INSERT INTO [dbo].[EmpMaster] VALUES('Emp003', 'DS Name 3', 'DS Street 3', 'DS City 3' ,'12345', '123-456-7893')
INSERT INTO [dbo].[EmpMaster] VALUES('Emp004', 'DS Name 3', 'DS Street 3', 'DS City 3' ,'12345', '123-456-7893')

XML Schema
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns="http://myschema.defaults.com/EmpMaster/2007-22-05/" targetNamespace="http://myschema.defaults.com/EmpMaster/2007-22-05/" elementFormDefault="qualified" version="1.0">
<xs:element name="EmployeeMaster">
<xs:complexType>
<xs:sequence>
<xs:element name="Employee" type="Employee" nillable="false" minOccurs="0" maxOccurs="unbounded" />
</xs:sequence>
</xs:complexType>
</xs:element>
<xs:complexType name="Employee">
<xs:sequence>
<xs:element name="EmpId" type="xs:string" minOccurs="1" maxOccurs="1" nillable="false" />
<xs:element name="Name" type="xs:string" minOccurs="1" maxOccurs="1" nillable="false" />
<xs:element name="Street" type="xs:string" minOccurs="1" maxOccurs="1" nillable="false" />
<xs:element name="City" type="xs:string" minOccurs="1" maxOccurs="1" nillable="false" />
<xs:element name="Zip" type="xs:string" minOccurs="1" maxOccurs="1" nillable="false" />
<xs:element name="Phone" type="xs:string" minOccurs="1" maxOccurs="1" nillable="false" />
</xs:sequence>
</xs:complexType>
</xs:schema>

Output
<?xml version="1.0" encoding="UTF-8" ?>
<EmployeeMaster xmlns:wsdl="http://schemas.xmlsoap.org/wsdl/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:soapenc="http://schemas.xmlsoap.org/soap/encoding/" xmlns="http://myschema.defaults.com/EmpMaster/2007-19-03/" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<Employee>
<EmpId>Emp001</EmpId>
<Name>DS Name 1</Name>
<Street>DS Street 1</Street>
<City>DS City 1</City>
<Zip>12345</Zip>
<Phone>123-456-7891</Phone>
</Employee>
<Employee>
<EmpId>Emp002</EmpId>
<Name>DS Name 2</Name>
<Street>DS Street 2</Street>
<City>DS City 2</City>
<Zip>12345</Zip>
<Phone>123-456-7892</Phone>
</Employee>
- <Employee>
<EmpId>Emp003</EmpId>
<EmpId>Emp004</EmpId>

<Name>DS Name 3</Name>
<Street>DS Street 3</Street>
<City>DS City 3</City>
<Zip>12345</Zip>
<Phone>123-456-7893</Phone>
</Employee>
</EmployeeMaster>

The "output mode" in the transformation settings tab is set to "Aggregate All Rows", to write all into one file.

Ernie/Ray/Craig - Gurus,
Is it a good practice to write to files from XML output stage or should it be routed to a Seq file? Wondering if the "Output Mode" setting is causing it to behave in this manner.

Thanks,
-V
VCInDSX
Premium Member
Premium Member
Posts: 223
Joined: Fri Apr 13, 2007 10:02 am
Location: US

Post by VCInDSX »

Hi,
On second thought, I might not have understood the workings of the "Output Mode" in the "Transformation settings" page in the XML Output stage.
If i select "Aggregate All Rows", i get the grouping of similar records, even though the key column has different values.
If I select "Single Row", each record is written out separately, but the only downside is that it generates multiple output files, one for each record.
If I select "Use Trigger column", it is the same effect as "Single Row", because all records have unique key and therefore will get triggered for every record.
I tried Option-2 "Single Row" and followed the suggestion in viewtopic.php?t=109151&start=15 and also the best practices "customers.dsx" sample.
I am stuck at the stage where i have to combine the chunks into a master document. I was succesful to the point where i get data from the table (s) and transform them into individual XML chunks. The difference, however, is that in my case, i have to "Generate Chunks" and use "Single Row" output mode, which generates the output document upto the individual records as given below
<Employee>
<EmpId>Emp001</EmpId>
<Name>DS Name 1</Name>
<Street>DS Street 1</Street>
<City>DS City 1</City>
<Zip>12345</Zip>
<Phone>123-456-7891</Phone>
</Employee>
<Employee>
<EmpId>Emp002</EmpId>
<Name>DS Name 2</Name>
<Street>DS Street 2</Street>
<City>DS City 2</City>
<Zip>12345</Zip>
<Phone>123-456-7892</Phone>
</Employee>

When i introduce the final XML output stage, it is writing the header to each xml record instead of the outermost level. I am afraid, i am lost at this point... :( :(
So instead of
<EmployeeMaster xmlns:wsdl="http://schemas.xmlsoap.org/wsdl/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:soapenc="http://schemas.xmlsoap.org/soap/encoding/" xmlns="http://myschema.defaults.com/EmpMaster/2007-19-03/" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<Employee>
<EmpId>Emp001</EmpId>
<Name>DS Name 1</Name>
<Street>DS Street 1</Street>
<City>DS City 1</City>
<Zip>12345</Zip>
<Phone>123-456-7891</Phone>
</Employee>
<Employee>
<EmpId>Emp002</EmpId>
<Name>DS Name 2</Name>
<Street>DS Street 2</Street>
<City>DS City 2</City>
<Zip>12345</Zip>
<Phone>123-456-7892</Phone>
</Employee>
</EmployeeMaster>

I am getting
<EmployeeMaster xmlns:wsdl="http://schemas.xmlsoap.org/wsdl/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:soapenc="http://schemas.xmlsoap.org/soap/encoding/" xmlns="http://myschema.defaults.com/EmpMaster/2007-19-03/" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<Employee>
<EmpId>Emp001</EmpId>
<Name>DS Name 1</Name>
<Street>DS Street 1</Street>
<City>DS City 1</City>
<Zip>12345</Zip>
<Phone>123-456-7891</Phone>
</Employee>
</EmployeeMaster>
<EmployeeMaster xmlns:wsdl="http://schemas.xmlsoap.org/wsdl/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:soapenc="http://schemas.xmlsoap.org/soap/encoding/" xmlns="http://myschema.defaults.com/EmpMaster/2007-19-03/" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<Employee>
<EmpId>Emp002</EmpId>
<Name>DS Name 2</Name>
<Street>DS Street 2</Street>
<City>DS City 2</City>
<Zip>12345</Zip>
<Phone>123-456-7892</Phone>
</Employee>
</EmployeeMaster>

Is there a way to write the output to a single file when the "Output Mode" is set to "Single Row"?

Here is my job structure.
ODBC_Get_DB (Read from Database)==> XFM_Employee (XForm Employee XML record) ==> XMLOut_Employee (Create Employee XML) ==> XFM_EmployeeMaster(Add Header for EmployeeMaster) ==> XMLOut_EmployeeMaster (Write ALL XML Output)

Any fresh ideas to throw at..... would be welcome...

Let me know if you would like any additional details.

Thanks,
-V
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

VCInDSX wrote:Is there a way to write the output to a single file when the "Output Mode" is set to "Single Row"?
No. You need to 'Aggregate all rows' to get one output file. Or perhaps direct the output through a Sequential File stage, but I haven't really done that much and let the XML Output stage do the dirty deed for me.

I'm pretty sure all you have to do to stop the 'grouping' is not mark one of the fields as the Repetition Element - don't mark a Key field in other words.
-craig

"You can never have too many knives" -- Logan Nine Fingers
VCInDSX
Premium Member
Premium Member
Posts: 223
Joined: Fri Apr 13, 2007 10:02 am
Location: US

Post by VCInDSX »

chulett wrote: I'm pretty sure all you have to do to stop the 'grouping' is not mark one of the fields as the Repetition Element - don't mark a Key field in other words.
Hi Craig,
Thanks for taking the time to review and your feedback. Appreciate it much. I was away from work due to the "long" weekend :( , hence the delay. Apologies....

If i enable "Aggregate All rows", i always get this "combining" effect.

I decided to try your suggestion with a simpler implementation. I created a simple test job with the following structure. This is almost my original job.
ReadFromDB ==> XForm2XML ==> Write2XMLFile
I checked my stage properties and updated them to ensure that none of them have any of the columns marked as "Key".
I also setup the output to "Aggregate All Rows". The same results....

Additionally, I tested one other scenario wherein i introduced another record at the end of this table that resembled the first record, except the primary column.
INSERT INTO [dbo].[EmpMaster] VALUES('Emp005', 'DS Name 1', 'DS Street 1', 'DS City 1' ,'12345', '123-456-7891')
When I ran this set using the job, the XML Output stage aggregated the consecutive records 3 and 4 but not records 1 and 5 :o :o
Here is the output
<?xml version="1.0" encoding="UTF-8"?>
<EmployeeMaster xmlns:wsdl="http://schemas.xmlsoap.org/wsdl/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:soapenc="http://schemas.xmlsoap.org/soap/encoding/" xmlns="http://myschema.defaults.com/EmpMaster/2007-22-05/" xsi:schemaLocation="http://myschema.defaults.com/EmpMaster/2007-22-05/ /schemas/EmpMaster.xsd" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<Employee>
<EmpId>Emp001</EmpId> <Name>DS Name 1</Name>
<Street>DS Street 1</Street>
<City>DS City 1</City>
<Zip>12345</Zip>
<Phone>123-456-7891</Phone>
</Employee>
<Employee>
<EmpId>Emp002</EmpId>
<Name>DS Name 2</Name>
<Street>DS Street 2</Street>
<City>DS City 2</City>
<Zip>12345</Zip>
<Phone>123-456-7892</Phone>
</Employee>
<Employee>
<EmpId>Emp003</EmpId>
<EmpId>Emp004</EmpId>

<Name>DS Name 3</Name>
<Street>DS Street 3</Street>
<City>DS City 3</City>
<Zip>12345</Zip>
<Phone>123-456-7893</Phone>
</Employee>
<Employee>
<EmpId>Emp005</EmpId>
<Name>DS Name 1</Name>
<Street>DS Street 1</Street>
<City>DS City 1</City>
<Zip>12345</Zip>
<Phone>123-456-7891</Phone>
</Employee>
</EmployeeMaster>
It does not appear to be doing the aggregation on the entire data collection but only on successive records, if available.... :o
I wonder how XML Output stage applies this "Smartness"... :?:
Should this go to "Tech Support" because this "Feature" does not appear to be consistent across.... :? :?
-V
VCInDSX
Premium Member
Premium Member
Posts: 223
Joined: Fri Apr 13, 2007 10:02 am
Location: US

Post by VCInDSX »

Hi Craig/Group,
After posting the previous response, I took your suggestion (about Key column identifcation) as a cue, started reading (re-reading) the XML Pack Guide document, especially Pages 4-16, 4-17 and later to see if there might be some missing clues.
Although i did not get a complete grasp of Rule-1, 2 and 3 for "Repitition", i figured that there is something about the way the "Repitition" element is specified and how DS handles it. As i had disabled all "Key" columns, i decided to enable a column other than the primary column. Lo and Behold, the "output-grouping-effect" was resolved :D :) :D
I tried a few other columns as keys and they work fine. I am not sure which rules and in which order would solve such use cases. Someone who might have a good idea of the nut-and-bolts of the XML Stage implementation might be able to throw some light... I am using the simpler version of my job (not the chunk-and-group) to accomplish my requirements.

If you/group think that this is not "The Solution", please let me know...

Thanks again for your time and input,
-V
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

I would think you are fine now. For the record, what did you consider the 'primary column' that you had marked as a Key and marking what other column made it work for you?

From what I recall of the rules, the repitition element should be the 'lowest' level element. But then, Key selection is more of an art than a science. :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers
VCInDSX
Premium Member
Premium Member
Posts: 223
Joined: Fri Apr 13, 2007 10:02 am
Location: US

Post by VCInDSX »

Hi Craig,
Thanks again.
As far as my DB schema is concerned, the EmpId is the primary key. I was confusing (I am still a novice to Datastage...) that with the XML Output stage "Key".
In the final version........
For the XML Output stage, i marked the "Name" as the key. To be on the safer side, i had initially set the "Phone" (last element in my XML schema) as the "Key", then moved upwards. I wanted to make sure this solution does not break if i set some other column as the key.... so kept trying different combinations and settled on this one.
On the input stage, i did not set any "keys".
On the output stage, i set "Name" as the key.
One of these days, in the future, I hope i will use the help of gurus like you to understand the "Key" :wink:

Thanks to sjordery as well. I did not want to start another thread to discuss this same topic and used this chain....
-V
Nisusmage
Premium Member
Premium Member
Posts: 103
Joined: Mon May 07, 2007 1:57 am

Post by Nisusmage »

AAHhhh... I'd just like to say thanks to all who helped in this post. I've been battling with this problem for 2 days now.

THANK YOU !!..
~The simpliest solutions are always the best~
~Trick is to understand the complexity to implement simplicity~
Post Reply