
Remove lines from the xml file

Posted: Thu Oct 02, 2008 4:29 pm
by kumar66
Hi All,

I need to remove the header and the trailer from the xml file. Both the header and the trailer have a fixed number of lines.

The header has 33 lines and the trailer has 3 lines.

How can I do this? I guess it can be done with a UNIX script. Could you please tell me which UNIX command to use?

Or is there a way to do it in DataStage?

Please Advise.
Thanks & Regards,
Kumar66

Posted: Thu Oct 02, 2008 5:07 pm
by chulett
:? Post an example. There are no 'header' and 'trailer' records, per se, and removing them will break the parser, I would think. I'd also be curious why you think you need to do this.

Posted: Thu Oct 02, 2008 5:14 pm
by kumar66
Hi Chulett,

I have posted the example below:

<?xml version="1.0" encoding="UTF-8" ?>
<!-- Generated by Ascential Software Corporation, DataStage - XMLOutput stage - Mon Nov 19 15:49:21 2007
-->
<xtd:EBO xmlns:xtd="http://service.aaaa.com/GenericSchema" xmlns:esb="http://service.aaaa.com/schemas/ESBHeader" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<esb:ESBHeader>
<esb:EnvironmentName>Development</esb:EnvironmentName>
<esb:VersionNbr>1.0</esb:VersionNbr>
<esb:BusinessEvent>
<esb:TransactionTypeCode>PurchaseOrderFullRfh</esb:TransactionTypeCode>
<esb:Description>Purchase _Order_Refresh</esb:Description>
<esb:Timestamp>2007-11-19 15:49:12</esb:Timestamp>
<esb:TotalRecordCount>34</esb:TotalRecordCount>
<esb:EventID>111</esb:EventID>
</esb:BusinessEvent>
<esb:SourceSystem>
<esb:ApplicationName>DWA</esb:ApplicationName>
</esb:SourceSystem>
<esb:RoutingInformation>
<esb:Type>
<esb:LabelName>Transaction_Type</esb:LabelName>
<esb:RoutingTypeName>Transaction</esb:RoutingTypeName>
<esb:LabelValueText>Purchase_order</esb:LabelValueText>
</esb:Type>
</esb:RoutingInformation>
<esb:DatasetInfo>
<esb:DatasetSizeQty>30</esb:DatasetSizeQty>
<esb:TotalDatasetCount>2</esb:TotalDatasetCount>
<esb:DatasetNbr>1</esb:DatasetNbr>
<esb:KeyField2>1</esb:KeyField2>
</esb:DatasetInfo>
</esb:ESBHeader>
<xtd:EBOPayload>
<xtd:Data>
<![CDATA[
0000082973|0000000004|QP799-MA|HOL 2002|0000000001|01|0000000002|
0000073994|0000000004|KS754-5A|BAS 9999|0000000092|11|0000000093|
0000081982|0000000004|QP799-VA|HOL 2002|0000000001|01|0000000002|
0000082975|0000000004|QP799-YA|HOL 2002|0000000001|01|0000000002|
]]>
</xtd:Data>
</xtd:EBOPayload>
</xtd:EBO>

Please Advise,

Thanks & Regards,
Kumar66

Posted: Thu Oct 02, 2008 5:46 pm
by filename.txt
Hi Kumar,

I think the commands below should help to remove the header and trailer in UNIX.

a=`wc -l < filename.xml`
# a is the total number of lines in that xml file
b=`expr $a - 3`
# b is the total number of lines minus 3 (trailer)
head -n $b filename.xml > file1
# The above command copies everything except the trailer lines into file1
c=`expr $b - 33`
# c is the count of lines in file1 without the header
tail -n $c file1 > filename.xml
# you will get filename.xml back without header and trailer.

Please check for any syntax errors in the "expr" commands ... mostly this will work
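
The same idea can be sketched in a single pass with awk, without the intermediate file and without overwriting the input. This is only a sketch: strip_lines is a hypothetical helper name, the defaults 33 and 3 are the counts quoted in the first post, and it only helps if those line counts really hold for the file.

```shell
# Sketch: print file $1 without its first $2 lines and its last $3 lines.
# strip_lines is a hypothetical name; 33 and 3 are the counts from the post.
strip_lines() {
  awk -v h="${2:-33}" -v t="${3:-3}" '
    # hold every line that comes after the header
    NR > h     { buf[NR] = $0 }
    # once t newer lines have arrived, the held line cannot be trailer
    NR > h + t { print buf[NR - t]; delete buf[NR - t] }
  ' "$1"
}

# Usage (filename.xml and body.txt are placeholders):
#   strip_lines filename.xml 33 3 > body.txt
```

The last t lines are never printed because no later lines arrive to release them, so the total line count does not have to be known in advance.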

Posted: Thu Oct 02, 2008 6:25 pm
by kumar66
Hi Filename.txt,

Thanks for your script. But when I execute it, it takes the line count from the "CDATA" section, so I couldn't remove the header and trailer properly.

Please Advise.


Thanks & Regards,
Kumar66

Posted: Thu Oct 02, 2008 7:35 pm
by chulett
Again, why do you think you need to do this? And are you generating this xml or trying to read it? :?

Posted: Thu Oct 02, 2008 9:25 pm
by kumar66
Hi Chulett,

I am trying to read this xml. My idea is to cut off the header and trailer so that I can read the detail records.

This is because the generated xml file is not a true xml file. The header, trailer and detail records are generated separately, so I thought I might use this approach.

Please Advise.

Thanks & Regards,
Kumar66

Posted: Thu Oct 02, 2008 9:31 pm
by eostic
From a quick glance, there's nothing "wrong" with it from an xml perspective.....you might as well use XMLInput to read it and save yourself a lot of trouble! Nothing wrong with your current approach per se, but using XMLInput would be a lot simpler...

Ernie

Posted: Fri Oct 03, 2008 7:33 am
by chulett
Just because it was built "separately" doesn't mean it's "not a true xml file". As Ernie notes, read it with the XML Input stage.

Posted: Fri Oct 03, 2008 10:35 am
by kumar66
Hi Ernie and Chulett,

Thanks for your replies. Yes, I will use the XML Input stage. Just as a proof of concept, though, how can I remove the header and trailer and read the detail records?



Please Advise.

Thanks & Regards,
Kumar66

Posted: Fri Oct 03, 2008 10:48 am
by chulett
It does it automatically. That's why we're confused as to the approach you're talking about. I take it you haven't worked much with XML before this?

Posted: Fri Oct 03, 2008 11:00 am
by kumar66
Hi Chulett,

My idea is to try this approach. That's it.


Thanks & Regards,
Kumar66

Posted: Fri Oct 03, 2008 3:02 pm
by kumar66
Hi Chulett,


The xml is huge, so we planned to take this approach. I tried to read the big xml file with the External Source stage, but I couldn't. That was also a reason for this approach.

Please Advise.

Thanks & Regards,
Kumar66

Posted: Fri Oct 03, 2008 3:20 pm
by chulett
Define 'huge'. Ernie's website (link in sig) discusses how to use the External Source stage to feed XML Input the filename(s) for it to parse.

Posted: Fri Oct 03, 2008 7:31 pm
by eostic
There are a hundred ways to remove the header and trailer info...."if" the xml is entirely pretty-printed, you could trust CRLFs, set up counters and do something there....but that won't be true with all files, especially those created by DS, because the CRLFs aren't always where you might expect. Assuming huge is not "too" huge, you could pass the string to a Transformer and use a multitude of Transform functions to break the string into pieces and then remove bits, moving through the tags as delimiters..... you'll have to play with things like FIELD and REMOVE, etc.....but rest assured that there are more string manipulations in UV/Basic than you could imagine. Study the Basic documentation. I'd probably play with Server here --- it is more flexible with variable text strings of entirely unknown length.

...still vastly simpler to just use XMLInput.

Ernie
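
For the UNIX side of the question, using the markers as delimiters rather than counting lines can be sketched roughly as below. It assumes the CDATA opening and closing markers each sit on their own line, as in the posted sample, which (as noted above) will not hold for every file; extract_cdata is a hypothetical helper name.

```shell
# Sketch: print only the lines between the CDATA markers of file $1.
# Assumes each marker is on a line of its own, as in the posted sample.
# extract_cdata is a hypothetical name.
extract_cdata() {
  awk '
    /\]\]>/       { inside = 0 }   # closing marker: stop printing
    inside        { print }        # a payload line between the markers
    /<!\[CDATA\[/ { inside = 1 }   # opening marker: start with the next line
  ' "$1"
}

# Usage (filename.xml and records.txt are placeholders):
#   extract_cdata filename.xml > records.txt
```

This does not depend on the header being exactly 33 lines, but it is still plain text matching and not XML parsing, so it breaks as soon as payload data shares a line with a marker.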