XML Output stage repetitive data elements issue

amit.jaiswal_ATL · Post by **amit.jaiswal_ATL** » Tue Feb 17, 2015 3:43 pm

Hello All,

I have the following values in the source:
COL-A COL-B
2810945 S
2810965 S
2810985 S
4025390 H
4041510 B
4041512 B

I am expecting the XML structure below:

<s:LineItems>
	<s:Code>2810945</s:Code>
	<s:Values>
		<s:CodeValue>S</s:CodeValue>
	</s:Values>
</s:LineItems>
<s:LineItems>
	<s:Code>2810965</s:Code>
	<s:Values>
		<s:CodeValue>S</s:CodeValue>
	</s:Values>
</s:LineItems>
<s:LineItems>
	<s:Code>2810985</s:Code>
	<s:Values>
		<s:CodeValue>S</s:CodeValue>
	</s:Values>
</s:LineItems>
<s:LineItems>
	<s:Code>4025390</s:Code>
	<s:Values>
		<s:CodeValue>H</s:CodeValue>
	</s:Values>
</s:LineItems>
<s:LineItems>
	<s:Code>4041510</s:Code>
	<s:Values>
		<s:CodeValue>B</s:CodeValue>
	</s:Values>
</s:LineItems>
<s:LineItems>
	<s:Code>4041512</s:Code>
	<s:Values>
		<s:CodeValue>B</s:CodeValue>
	</s:Values>
</s:LineItems>

But I am getting the result below:

<s:LineItems>
	<s:Code>2810945</s:Code>
	<s:Code>2810965</s:Code>
	<s:Code>2810985</s:Code>
	<s:Values>
		<s:CodeValue>S</s:CodeValue>
	</s:Values>
</s:LineItems>
<s:LineItems>
	<s:Code>4025390</s:Code>
	<s:Values>
		<s:CodeValue>H</s:CodeValue>
	</s:Values>
</s:LineItems>
<s:LineItems>
	<s:Code>4041510</s:Code>
	<s:Code>4041512</s:Code>
	<s:Values>
		<s:CodeValue>B</s:CodeValue>
	</s:Values>
</s:LineItems>

I used the XPath expressions below to generate the element structure:
COLUMN_CODE ==> /s:LineItems/s:Code
COLUMN_CODEVALUE ==> /s:LineItems/s:Values/s:CodeValue

I tried multiple options but could not generate the expected XML structure. Can you please let me know which process or option would produce the desired result?

Thanks in advance!

-Amit

eostic · Post by **eostic** » Tue Feb 17, 2015 4:44 pm

I suspect this is an issue that we see once in a while when there are very few elements and there isn't anything uniquely identifying the lower-level repeating instances.

If that's the issue, you should be able to alleviate the problem by adding a unique counter to that level of your hierarchy. Create a dummy column in a transformer upstream, and make it unique for every COL-B level row that comes thru.

Add it to your xml with its own xpath, perhaps with an element name of <dummyKey>, and make it the only repetition element.

Put this column above CodeValue, with xpath something like:

/s:LineItems/s:Values/s:dummyKey/text()

That should put things in the right order, and then you can clean out the <dummyKey> element downstream in another transformer. Put an output link on your XML Output stage and have just one large column on that link, something like myXMLContent defined as LongVarChar with a long length, and put a single '/' in the Description property.
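
To illustrate the dummy-column idea outside of DataStage, here is a minimal Python sketch (not job code). It attaches a counter that is unique for every row to the sample rows from the first post. The function name `add_dummy_key` is made up for the illustration; in the actual job the counter would come from a derivation in the upstream transformer.

```python
# Sample rows from the first post: (COL-A, COL-B).
rows = [
    ("2810945", "S"),
    ("2810965", "S"),
    ("2810985", "S"),
    ("4025390", "H"),
    ("4041510", "B"),
    ("4041512", "B"),
]


def add_dummy_key(rows):
    """Return the rows with a third field that is unique for every row."""
    return [(code, value, n) for n, (code, value) in enumerate(rows, start=1)]


keyed_rows = add_dummy_key(rows)
for code, value, dummy_key in keyed_rows:
    # Rows that share the same COL-B value are now distinguishable.
    print(code, value, dummy_key)
```

With the counter in place, the three "S" rows no longer look identical at the Values level, which is the property the dummy key is meant to provide.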

Ernie

amit.jaiswal_ATL · Post by **amit.jaiswal_ATL** » Tue Feb 17, 2015 8:07 pm

Thanks, Ernie, for your reply. I am just wondering: after adding the DummyKey element with its value (a counter: 1, 2, 3, etc.) to the XML data block, how do I clean it out of that XML block in the subsequent transformer? Are you suggesting a function like Ereplace to replace the DummyKey elements with an empty string?

Thanks.
-Amit

eostic · Post by **eostic** » Tue Feb 17, 2015 8:28 pm

One way to do it is with something like Ereplace.

Another is to research how the XML is being used. If it is consumed by a program that parses the XML looking only for the known elements, and does not validate it against an XSD, then the new dummy elements can simply be ignored.

Ernie

amit.jaiswal_ATL · Post by **amit.jaiswal_ATL** » Wed Feb 18, 2015 4:02 pm

Thanks, Ernie. I am looking for any option to remove the extra DummyKey elements from the XML, but so far I have not found a solution.

Here is what I have to achieve.
My XML structure:
<s:LineItems>
<s:Code>2810945</s:Code>
<s:Values>
<s:DummyKey>123</s:DummyKey>
<s:CodeValue>S</s:CodeValue>
</s:Values>
</s:LineItems>
<s:LineItems>
<s:Code>2810965</s:Code>
<s:Values>
<s:DummyKey>1369</s:DummyKey>
<s:CodeValue>S</s:CodeValue>
</s:Values>
</s:LineItems>

My requirement is to remove <s:DummyKey>123</s:DummyKey>, <s:DummyKey>1369</s:DummyKey> and all other DummyKey elements from this XML block.

Can you please suggest an option to handle this within DataStage?

Thanks in advance.

-Amit

ray.wurlod · Post by **ray.wurlod** » Wed Feb 18, 2015 4:31 pm

Is this XML a single string, or on multiple lines as shown? If the latter, simply use a filter (Filter stage or Transformer stage output link constraint) to prevent transfer of any line beginning with "<s:Dummy".
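
For the multi-line case, the filtering logic can be sketched in Python as follows. This is only an illustration of the idea, not DataStage code; the function name `drop_dummy_lines` and the default prefix are assumptions based on the sample posted above.

```python
def drop_dummy_lines(xml_text, prefix="<s:Dummy"):
    """Keep every line except those that begin with the given prefix.

    Leading whitespace is ignored so that indented elements are caught too.
    """
    kept = [
        line
        for line in xml_text.splitlines()
        if not line.lstrip().startswith(prefix)
    ]
    return "\n".join(kept)


sample = "\n".join([
    "<s:LineItems>",
    "<s:Code>2810945</s:Code>",
    "<s:Values>",
    "<s:DummyKey>123</s:DummyKey>",
    "<s:CodeValue>S</s:CodeValue>",
    "</s:Values>",
    "</s:LineItems>",
])

print(drop_dummy_lines(sample))
```

This only works when each DummyKey element sits on a line of its own.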

amit.jaiswal_ATL · Post by **amit.jaiswal_ATL** » Wed Feb 18, 2015 5:22 pm

Thanks, Ray. Unfortunately, it is a single-line XML block. Is there any solution for this scenario?

ray.wurlod · Post by **ray.wurlod** » Wed Feb 18, 2015 9:02 pm

You could possibly use the looping capability in a Transformer stage to loop through the elements and suppress the DummyKey elements from being transferred to the output.

Otherwise, of course, a routine to do the same.
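
As a rough idea of what such a routine would need to do with a single-line block, here is a minimal Python sketch (not a DataStage routine). It assumes the element is named exactly `s:DummyKey`, as in the sample above, and contains only text with no nested elements; the function name `strip_dummy_keys` is made up for the illustration.

```python
import re

# Matches one complete <s:DummyKey>...</s:DummyKey> element whose content
# is plain text (no nested tags).
DUMMY_KEY = re.compile(r"<s:DummyKey>[^<]*</s:DummyKey>")


def strip_dummy_keys(xml_block):
    """Remove every DummyKey element from a single-line XML string."""
    return DUMMY_KEY.sub("", xml_block)


xml_block = (
    "<s:LineItems><s:Code>2810945</s:Code><s:Values>"
    "<s:DummyKey>123</s:DummyKey><s:CodeValue>S</s:CodeValue>"
    "</s:Values></s:LineItems>"
    "<s:LineItems><s:Code>2810965</s:Code><s:Values>"
    "<s:DummyKey>1369</s:DummyKey><s:CodeValue>S</s:CodeValue>"
    "</s:Values></s:LineItems>"
)

print(strip_dummy_keys(xml_block))
```

A plain Ereplace cannot do this in one call because every DummyKey carries a different counter value, which is why a loop or a pattern-based routine is needed.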

amit.jaiswal_ATL · Post by **amit.jaiswal_ATL** » Fri Feb 20, 2015 10:23 am

Thanks, Ray, for your suggestions. I defined the "Value" column (COL-B) as a key column on the input link of the XML Output stage, and it gave me the expected result.

eostic · Post by **eostic** » Fri Feb 20, 2015 12:26 pm

Cool! I assumed you had "B" as the key all along. Just be careful...this same symptom can occur when you have "B" as the key but you have repeats in the parent, which is the condition I thought you were running into.

Glad you got thru it!

Ernie