Read a network request and response file

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

chetan.c
Participant
Posts: 112
Joined: Tue Jan 17, 2012 2:09 am
Location: Bangalore

Read a network request and response file

Post by chetan.c »

Hi,
I have a file which is like this.
How to read this file.
[iamie]
TLTSID=EC2172542B72102BB5AE2D89C82DEAEC
TLTHID=EC2172542B72102BB5AE2D89C82DEAEC
CaptureType=1
CaptureVersion=3330

[appdata]

[env]
REQ_BUFFER_ENCODING=UTF-8
REQ_BUFFER_ORIG_ENCODING=UTF-8
RESP_BODY_ENCODING=UTF-8

[privacy]
response|body|@1=Key00001|9|@|EAAAAEOGDqRXcCWtjpMD1JauU/M=

[urlfield]
<
xml version="1.0" encoding="utf-8"
>%0a<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelop ... sageHeader xmlns="http://csi.cingular.com/CSI/Namespaces/ ... aModel.xsd" xmlns:ns2="http://csi.cingular.com/CSI/Namespaces/ ... r.xsd">%0a <ns2:TrackingMessageHeader>%0a <version>v52</version>%0a <messageId>
.........

[cookies]
TLT_NumCookies=0
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Define what "read" means here. What exactly do you need to do with this file?
-craig

"You can never have too many knives" -- Logan Nine Fingers
chetan.c
Participant
Posts: 112
Joined: Tue Jan 17, 2012 2:09 am
Location: Bangalore

Post by chetan.c »

A sample from the file
[env]
REQ_BUFFER_ENCODING=UTF-8
REQ_BUFFER_ORIG_ENCODING=UTF-8
RESP_BODY_ENCODING=UTF-8

REQ_BUFFER_ENCODING This should be my column name.
UTF-8 -----Data under that coulmn

Similarly for all other fileds.

Also there is an XML portion in the file.
Kindly guide me to read this file.

Thanks.
chetan.c
Participant
Posts: 112
Joined: Tue Jan 17, 2012 2:09 am
Location: Bangalore

Post by chetan.c »

Hi Craig,

Do you want any more information or the entire file?

Please help me to read this file.


Thanks.
pandeesh
Premium Member
Premium Member
Posts: 1399
Joined: Sun Oct 24, 2010 5:15 am
Location: CHENNAI, TAMIL NADU

Post by pandeesh »

As per my understanding, your's is semi xml file.
so, separate the xml portion and read it via xml stage.
And read the remaining part as a fixed width file in sequential file stage,
pandeeswaran
kmsekhar
Premium Member
Premium Member
Posts: 58
Joined: Fri Apr 16, 2010 12:58 pm
Location: Chn

Post by kmsekhar »

Chetan,

Your desired O/P should be?
REQ_BUFFER_ENCODING|REQ_BUFFER_ORIG_ENCODING|RESP_BODY_ENCODING
UTF-8|UTF-8|UTF-8
Thanks,
Sekhar
kmsekhar
Premium Member
Premium Member
Posts: 58
Joined: Fri Apr 16, 2010 12:58 pm
Location: Chn

Post by kmsekhar »

Try the below script to get desired columns/rows:

1. To display rows between [env] & [privacy]

Code: Select all

awk '/env/{s=x}{s=s$0"\n"}/privacy/{print s}' input | sed -e '/^$/d' -e 1d -e 'N;$!P;$!D;$d' >input1
2. To convert column to row

Code: Select all

awk -F '=' '
        {
                for (i=1; i<=NF; i++)  {
                        a[NR,i] = $i
                }
        }
        NF>p { p = NF }
        END {
                for(j=1; j<=p; j++) {
                        str=a[1,j]
                        for(i=2; i<=NR; i++){
                                str=str"|"a[i,j];
                        }
                        print str
                }
        }'input1

Thanks,
Sekhar
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

... then you can return your DataStage for a refund!

:roll:

That said, DataStage can perform a read of this file, but you have not specified entirely what you want to do with the data.

You can, for example, parse name=value pairs using Field() functions.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
kmsekhar
Premium Member
Premium Member
Posts: 58
Joined: Fri Apr 16, 2010 12:58 pm
Location: Chn

Post by kmsekhar »

ray.wurlod wrote:You can, for example, parse name=value pairs using Field() functions.
If you want to fetch only the rows under [env means]

Example:
SEQ_STG [Under Filter]

Code: Select all

awk '/env/{s=x}{s=s$0"\n"}/privacy/{print s}' | sed -e '/^$/d' -e 1d -e 'N;$!P;$!D;$d' 
Define only one column and set Quotes,Delimiter=none

Then map it to transformer and use Field() function As suggested by Ray.
Thanks,
Sekhar
chetan.c
Participant
Posts: 112
Joined: Tue Jan 17, 2012 2:09 am
Location: Bangalore

Post by chetan.c »

Hi Ray,

The file as we saw conatins XML content as well.
The target table contains columns for whihc data has to be extracted from XML content and the other data.
Please see the extract from the file.

StatusCode=200
StatusCodeText=OK
RequestHeaderSize=301
RequestDataSize=2122
RequestSize=2423
ResponseHeaderSize=150
ResponseDataSize=421513
ResponseSize=421663

<Request>
<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/">
<SOAP-ENV:Header>
<mes:MessageHeader xmlns:mes="http://csi.cingular.com/CSI/Namespaces/ ... in:version xmlns:cin="http://csi.cingular.com/CSI/Namespaces/ ... sd">[b]v52[/b]</cin:version><cin:messageId xmlns:cin="http://csi.cingular.com/CSI/Namespaces/ ... p1eam2m9_9[/b]</cin:messageId><cin:timeToLive xmlns:cin="http://csi.cingular.com/CSI/Namespaces/ ... ">[b]20000[/b]</cin:timeToLive><cin:dateTimeStamp xmlns:cin="http://csi.cingular.com/CSI/Namespaces/ ... 27:05.428Z[/b]</cin:dateTimeStamp>

......REst of the XML.

Now i want to extract the data into columns as below.I have higlighted the values that i require from XML.


StatusCode,StatusCodeText,RequestHeaderSize,RequestDataSize,RequestSize,ResponseHeaderSize,ResponseDataSize,ResponseSize,Version,Timetolive,Timestamp
200,OK,301,2122,2423,150,421513,421663,v52,20000,2011-12-21T01:27:05.428Z.

Last three columns from XML rest of the other columns from non XML.
Please let me know if any further info is required.
chetan.c
Participant
Posts: 112
Joined: Tue Jan 17, 2012 2:09 am
Location: Bangalore

Post by chetan.c »

hi Shekar,

Thanks for response.
If its possible to read entirely in Datastage then would like to g o ahead with it else will try out you approach too.


Thanks.
kandyshandy
Participant
Posts: 597
Joined: Fri Apr 29, 2005 6:19 am
Location: Singapore

Post by kandyshandy »

It is good to separate your source file in to 2 parts... atleast 2 streams. Your XML message will always start with </ or with some comment signs.

Do you know whether the number of below lines is going to be constant?
StatusCode=200
StatusCodeText=OK
RequestHeaderSize=301
RequestDataSize=2122
RequestSize=2423
ResponseHeaderSize=150
ResponseDataSize=421513
ResponseSize=421663
If yes, the you can achieve what you want by using stage variable (e.g. StageVar1) defined as StageVar1 : Field(Col,1,'=')... Once done, you need to select the last record which would look like
StatusCode,StatusCodeText,RequestHeaderSize,RequestDataSize,RequestSize,ResponseHeaderSize,ResponseDataSize,ResponseSize
There are many ways to get the last record (or first record after sorting in DESC)

Do similarly to get data in the required format... (except for the data from XML message)

Similarly, you can get data from XML messages separately and link to the above retrieved data by using a common key.
Kandy
_________________
Try and Try again…You will succeed atlast!!
chetan.c
Participant
Posts: 112
Joined: Tue Jan 17, 2012 2:09 am
Location: Bangalore

Post by chetan.c »

Hi Kandy,
Thanks for the response.
The number of lines are not constant and also since there are a large number of files ,i will have multiple StatusCodeText,RequestHeaderSize and other fields names..

For the XML part i tried by looking for the charecter "<" and picking the entire XML till XML ends but the problem there issue i'm facing is I'm reading it into one column and becuase of the number of files is lhuge XML data it overruns the buffer limit.Even if i try to increse the buffer size will it be an optimal solution..?.


Thanks.
kandyshandy
Participant
Posts: 597
Joined: Fri Apr 29, 2005 6:19 am
Location: Singapore

Post by kandyshandy »

Little difficult to understand the actual requirements. Can you post one file content and tell us what you want?

Did you mean you will have one file with sample content shown below

dummy1=100
dummy2=200
dummy3=300

<xml message>
<xml message>
<xml message>

dummy7= 400
dummy6=500
dummy12=1200
dummy53=1100

<xml message>
<xml message>
<xml message>
<xml message>
<xml message>

..... etc?
Kandy
_________________
Try and Try again…You will succeed atlast!!
kmsekhar
Premium Member
Premium Member
Posts: 58
Joined: Fri Apr 16, 2010 12:58 pm
Location: Chn

Post by kmsekhar »

kandyshandy wrote:you can achieve what you want by using stage variable (e.g. StageVar1) defined as StageVar1 : Field(Col,1,'=')
Assume, we have sample file:

StatusCode=200
StatusCodeText=OK

After applying Field function data will be splitted like

StatusCode
200
StatusCodeText
OK
......

.....n records

How the records will be transposed as below in DS 8.1 :shock:

StatusCode|StatusCodeText
200|OK
Thanks,
Sekhar
Post Reply