Remove lines from the xml file


kumar66
Participant
Posts: 265
Joined: Thu Jul 26, 2007 12:14 am

Post by kumar66 »

Hi All,

I need to remove the header and the trailer from an XML file. Both have a fixed number of lines.

The header has 33 lines and the trailer has 3 lines.

How can I do this? I guess it can be done with UNIX scripting. Could you please tell me which UNIX command to use?

Or is there a way to do it in DataStage?

Please Advise.
Thanks & Regards,
Kumar66
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Post an example. There are no 'header' and 'trailer' records, per se, and removing them will break the parser, I would think. I'd also be curious why you think you need to do this.
-craig

"You can never have too many knives" -- Logan Nine Fingers
kumar66
Participant
Posts: 265
Joined: Thu Jul 26, 2007 12:14 am

Post by kumar66 »

Hi Chulett,

Here is an example:

<?xml version="1.0" encoding="UTF-8" ?>
<!-- Generated by Ascential Software Corporation, DataStage - XMLOutput stage - Mon Nov 19 15:49:21 2007
-->
<xtd:EBO xmlns:xtd="http://service.aaaa.com/GenericSchema" xmlns:esb="http://service.aaaa.com/schemas/ESBHeader" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<esb:ESBHeader>
<esb:EnvironmentName>Development</esb:EnvironmentName>
<esb:VersionNbr>1.0</esb:VersionNbr>
<esb:BusinessEvent>
<esb:TransactionTypeCode>PurchaseOrderFullRfh</esb:TransactionTypeCode>
<esb:Description>Purchase _Order_Refresh</esb:Description>
<esb:Timestamp>2007-11-19 15:49:12</esb:Timestamp>
<esb:TotalRecordCount>34</esb:TotalRecordCount>
<esb:EventID>111</esb:EventID>
</esb:BusinessEvent>
<esb:SourceSystem>
<esb:ApplicationName>DWA</esb:ApplicationName>
</esb:SourceSystem>
<esb:RoutingInformation>
<esb:Type>
<esb:LabelName>Transaction_Type</esb:LabelName>
<esb:RoutingTypeName>Transaction</esb:RoutingTypeName>
<esb:LabelValueText>Purchase_order</esb:LabelValueText>
</esb:Type>
</esb:RoutingInformation>
<esb:DatasetInfo>
<esb:DatasetSizeQty>30</esb:DatasetSizeQty>
<esb:TotalDatasetCount>2</esb:TotalDatasetCount>
<esb:DatasetNbr>1</esb:DatasetNbr>
<esb:KeyField2>1</esb:KeyField2>
</esb:DatasetInfo>
</esb:ESBHeader>
<xtd:EBOPayload>
<xtd:Data>
<![CDATA[
0000082973|0000000004|QP799-MA|HOL 2002|0000000001|01|0000000002|
0000073994|0000000004|KS754-5A|BAS 9999|0000000092|11|0000000093|
0000081982|0000000004|QP799-VA|HOL 2002|0000000001|01|0000000002|
0000082975|0000000004|QP799-YA|HOL 2002|0000000001|01|0000000002|
]]>
</xtd:Data>
</xtd:EBOPayload>
</xtd:EBO>

Please Advise,

Thanks & Regards,
Kumar66
filename.txt
Participant
Posts: 27
Joined: Thu Mar 20, 2008 11:55 am

Post by filename.txt »

Hi Kumar,

I think the commands below will help remove the header and trailer in UNIX.

a=$(wc -l < filename.xml)
# a is the total number of lines in the xml file
b=$(expr $a - 3)
# b is the total number of lines minus 3 (trailer)
head -n "$b" filename.xml > file1
# the command above copies everything except the trailer lines into file1
c=$(expr $b - 33)
# c is the count of lines in file1 without the header
tail -n "$c" file1 > filename.xml
# filename.xml now holds the file without header and trailer

Please check for any syntax errors in the "expr" commands ... mostly this will work.
Thanks.
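
The same head/tail idea can be sketched as a single pipeline that avoids the temporary file and does not overwrite the input. This is only a sketch under assumptions: `strip_header_trailer` and `detail.txt` are hypothetical names, the counts 33 and 3 come from the original question, and the file is assumed to end with a newline so that `wc -l` counts every line.

```shell
#!/bin/sh
# Print FILE without its first HEADER lines and its last TRAILER lines.
# strip_header_trailer is a hypothetical helper name, not a standard command.
strip_header_trailer() {
    file=$1
    header=$2
    trailer=$3

    # Redirecting into wc keeps the file name out of its output;
    # tr removes the padding some wc implementations add.
    total=$(wc -l < "$file" | tr -d '[:space:]')
    keep=$((total - header - trailer))

    # Nothing is left if header and trailer cover the whole file.
    if [ "$keep" -le 0 ]; then
        return 1
    fi

    # tail -n +N starts output at line N; head stops before the trailer.
    tail -n +"$((header + 1))" "$file" | head -n "$keep"
}
```

Usage would look like `strip_header_trailer filename.xml 33 3 > detail.txt`. Like the commands above, it counts physical lines only, so it is reliable only when the line breaks fall exactly where expected.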

"Creativity is the ability to use your available resources to their fullest."
kumar66
Participant
Posts: 265
Joined: Thu Jul 26, 2007 12:14 am

Post by kumar66 »

Hi Filename.txt,

Thanks for your script. But when I execute it, it takes the line count from the "CDATA" section, so I couldn't remove the header and trailer properly.

Please Advise.


Thanks & Regards,
Kumar66
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Again, why do you think you need to do this? And are you generating this xml or trying to read it?
-craig

kumar66
Participant
Posts: 265
Joined: Thu Jul 26, 2007 12:14 am

Post by kumar66 »

Hi Chulett,

I am trying to read this xml. My idea is to chop off the header and trailer so that I can read the detail records.

This is because the generated xml file is not a true xml file. The header, trailer and detail records are generated separately, so I thought I might use this approach.

Please Advise.

Thanks & Regards,
Kumar66
eostic
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

From a quick glance, there's nothing "wrong" with it from an xml perspective.....you might as well use XMLInput to read it and save yourself a lot of trouble! Nothing wrong with your current approach per se, but using XMLInput would be a lot simpler...

Ernie
Ernie Ostic

blogit!
Open IGC is Here! (https://dsrealtime.wordpress.com/2015/0 ... ere/)
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Just because it was built "separately" doesn't mean it's "not a true xml file". As Ernie notes, read it with the XML Input stage.
-craig

kumar66
Participant
Posts: 265
Joined: Thu Jul 26, 2007 12:14 am

Post by kumar66 »

Hi Ernie and Chulett,

Thanks for your replies. Yes, I will do it with the XML Input stage. Just as a proof of concept, how can I remove the header and trailer and read the detail records?



Please Advise.

Thanks & Regards,
Kumar66
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

It does it automatically. That's why we're confused as to the approach you're talking about. I take it you haven't worked much with XML before this?
-craig

kumar66
Participant
Posts: 265
Joined: Thu Jul 26, 2007 12:14 am

Post by kumar66 »

Hi Chulett,

My idea is to try this approach. That's it.


Thanks & Regards,
Kumar66
kumar66
Participant
Posts: 265
Joined: Thu Jul 26, 2007 12:14 am

Post by kumar66 »

Hi Chulett,


The xml is huge, so we planned to use this approach. I tried to read the big xml file with the External Source stage but couldn't, which was another reason for this approach.

Please Advise.

Thanks & Regards,
Kumar66
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Define 'huge'. Ernie's website (link in sig) discusses how to use the External Source stage to feed XML Input the filename(s) for it to parse.
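
As a rough illustration of feeding filenames rather than file contents, the sketch below prints one absolute path per line. It assumes the External Source stage is pointed at a command whose output lines become rows with a single filename column, and that XML Input is set up to treat that column as a file path. `list_xml_files` and the directory in the usage line are hypothetical names, not anything taken from the site mentioned above.

```shell
#!/bin/sh
# Print the absolute path of every .xml file directly under a directory,
# one per line. list_xml_files is a hypothetical helper name.
list_xml_files() {
    dir=$1

    # cd + pwd inside the command substitution resolves a relative
    # directory to an absolute one without changing the caller's cwd.
    abs=$(cd "$dir" && pwd) || return 1

    for f in "$abs"/*.xml; do
        # When nothing matches, the pattern stays unexpanded; skip it.
        if [ -f "$f" ]; then
            printf '%s\n' "$f"
        fi
    done
}
```

Usage would look like `list_xml_files /data/landing`, with the directory replaced by wherever the files actually arrive.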
-craig

eostic
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

There are a hundred ways to remove the header and trailer info... "if" the xml is entirely pretty-printed, you could trust the CRLFs, set up counters and do something there... but that won't be true of all files, especially those created by DS, because the CRLFs aren't always where you might expect. Assuming huge is not "too" huge, you could pass the string to a Transformer and use a multitude of Transform functions to break the string into pieces and then remove bits, moving through the tags as delimiters... you'll have to play with things like FIELD and REMOVE, etc... but rest assured that there are more string manipulations in UV/Basic than you could imagine. Study the Basic documentation. I'd probably play with Server here; it is more flexible with variable-length text strings of entirely unknown length.
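
Outside DataStage, the idea of using the markup itself as the delimiter, rather than counting lines, can be sketched in shell with awk. This is a minimal sketch, not the Transformer approach described above: `extract_cdata` is a hypothetical name, and it assumes the `<![CDATA[` and `]]>` markers each sit on their own line, as in the posted sample. If the markers share a line with data, it would need adjusting.

```shell
#!/bin/sh
# Print only the lines that sit between the CDATA opening and closing
# marker lines. extract_cdata is a hypothetical helper name.
extract_cdata() {
    awk '
        /<!\[CDATA\[/ { inside = 1; next }   # opening marker: copy from the next line
        /\]\]>/       { inside = 0 }         # closing marker: stop copying
        inside                               # print every line while inside
    ' "$1"
}
```

Usage would look like `extract_cdata filename.xml > detail.txt`. Because it keys on the markers, it does not depend on the header being exactly 33 lines or the trailer exactly 3.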

...still vastly simpler to just use XMLInput.

Ernie
Ernie Ostic
