handling segmented xml data.

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

kalpna
Premium Member
Posts: 78
Joined: Thu Feb 02, 2006 3:56 am


Post by kalpna »

Hello Everyone!!
Actually, we need to process large XML files (10 MB to 50 MB).
We get these files from MQ Series.
But as these files are too large to handle, we are compressing them (gzip utility), then segmenting them through MQ Series, and then retrieving them through DataStage.

But in DataStage, when I use the gunzip utility, it doesn't recognise the format.

To make sure that it is not adding any additional information, I tried segmenting a plain XML file (without compressing) and passed it to DataStage.
But when I try to process this XML, it is not well formed.
The reason is that the first segment contains half of a tag (something like '<sapven') and the second segment contains the other half of the tag ('dorNum>'), and those end up on different lines.
So how can I handle or work around this in DataStage? (The MQ experts say it is not possible to control the way MQ segments the message.)
kalpna
Premium Member
Posts: 78
Joined: Thu Feb 02, 2006 3:56 am

Post by kalpna »

Let me put my question this way!!

Say my file is something like this...

<lostDemand>
<extractDate>2006-01-20</extractDate>
<dcLocation>D985</dcLo
cation>
<bnqCode>24766418</bnqCode>
<e
an>5014957128637</ean>
<orderQty>00120</orderQty>
<issueQty>00090</issueQty>
<numLines>0001</numLines>
<totalOrderQty>00120</totalOrderQty>
</lostDemand>

I am thinking of tricking the data like this:

if trim(Data)[1] = '<' and (Count(trim(Data), '<') <> Count(trim(Data), '>'))
then
concatenate the next line to the present line
else
trim(Data)

How can I achieve this?
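The idea above (keep appending the next line while a tag is still open) can be sketched in Python for illustration. This is only a sketch under assumptions: the function name `join_split_tags` is hypothetical, it assumes '<' and '>' never appear inside element text, comments, or CDATA, and it only repairs breaks that fall inside a tag. A break in the middle of element text would not be detected by this check.

```python
def join_split_tags(lines):
    """Rejoin lines where a line break fell inside an XML tag.

    Assumption: '<' and '>' only ever appear as tag delimiters.
    """
    joined = []
    buffer = ""
    for line in lines:
        buffer += line.strip()
        # Unequal counts mean a tag was opened but not yet closed,
        # so keep accumulating the following line(s).
        if buffer.count("<") != buffer.count(">"):
            continue
        if buffer:
            joined.append(buffer)
        buffer = ""
    if buffer:
        # Leftover text with an unclosed tag at end of input.
        joined.append(buffer)
    return joined


sample = [
    "<dcLocation>D985</dcLo",
    "cation>",
    "<bnqCode>24766418</bnqCode>",
    "<e",
    "an>5014957128637</ean>",
]
for row in join_split_tags(sample):
    print(row)
```

With the sample lines from the post, the two broken tags come back as `<dcLocation>D985</dcLocation>` and `<ean>5014957128637</ean>`.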

Thanks
kalpna
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

You are gzipping it and then 'segmenting'? If so, do you know when you have all the pieces? You'll need to put the segments back together and then gunzip the file before you can parse the XML in DataStage. And it seems like the 'putting back together' part could simply be concatenation done by your operating system...
-craig
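As a rough illustration of "put the segments back together, then gunzip", here is a minimal Python sketch. It assumes the segments are available as byte strings in their original order; the function name `reassemble_and_gunzip` is hypothetical, and how the segments are actually obtained from MQ is not shown. The key point is that the pieces are joined as raw bytes, with nothing added or removed between them.

```python
import gzip


def reassemble_and_gunzip(segments):
    """Concatenate gzip segments in order and decompress the result.

    Assumption: `segments` is a list of bytes objects, already in the
    order in which the original compressed file was split.
    """
    compressed = b"".join(segments)  # plain byte concatenation, no separators
    return gzip.decompress(compressed)


# Simulate the round trip: compress, split into three pieces, reassemble.
original = b"<lostDemand><ean>5014957128637</ean></lostDemand>" * 100
packed = gzip.compress(original)
parts = [packed[:10], packed[10:40], packed[40:]]
print(reassemble_and_gunzip(parts) == original)
```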

"You can never have too many knives" -- Logan Nine Fingers
kalpna
Premium Member
Posts: 78
Joined: Thu Feb 02, 2006 3:56 am

Post by kalpna »

Thanks for your response, Craig!!
I am reading the message from MQ using the WebSphere MQ stage,
which just reads all segments as a single message.
So I am not doing anything to concatenate them,
and I am writing the retrieved data to a file.

kalpna
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Not being all that familiar with MQ - if all segments are read as a single message, why segment? When you write out the results to a file, do you get one single file that contains all the segments in the proper order? :?

Is the problem the fact that there are extra 'new lines' in the file between segments? Could you not strip those out using sed so that you end up with one long record when all is said and done? Then you could process it like 'normal' in DataStage.
-craig

"You can never have too many knives" -- Logan Nine Fingers
kalpna
Premium Member
Posts: 78
Joined: Thu Feb 02, 2006 3:56 am

Post by kalpna »

chulett wrote:Not being all that familiar with MQ - if all segments are read as a single message, why segment? When you write out the results to a file, do you get one single file that contains all the segments in the proper order? :?.
Because MQ has a limit on the size of a message, we are segmenting the message to make it 3 or 4 messages. Yes, the file contains the segments in the proper order.

Craig, how can we strip the newline characters from the file using sed?

Thanks in advance
kalpna
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Talk to one of your UNIX or scripting gurus there and have them help you. It's a 'stream editor' that supports regular expressions, so you would basically tell it to replace all line-feed characters with an empty string, i.e. remove them.
-craig
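For the plain (uncompressed) XML case, the same "remove every line feed" idea can be sketched in Python rather than sed. This is an illustration only: `strip_line_breaks` is a hypothetical helper, and the sample string is a shortened version of the XML posted earlier in the thread. Note that sed reads its input one line at a time, so deleting the line feeds themselves takes a little extra care there; on UNIX, `tr -d '\n'` is a common alternative.

```python
import xml.etree.ElementTree as ET


def strip_line_breaks(text):
    """Remove carriage returns and line feeds, leaving one long record."""
    return text.replace("\r", "").replace("\n", "")


# Shortened sample with tags broken across lines, as in the thread.
segmented = (
    "<lostDemand>\n"
    "<dcLocation>D985</dcLo\n"
    "cation>\n"
    "<e\n"
    "an>5014957128637</ean>\n"
    "</lostDemand>\n"
)

record = strip_line_breaks(segmented)
root = ET.fromstring(record)  # raises ParseError if still not well formed
print(root.findtext("ean"))   # prints 5014957128637
```

Parsing the result is a quick way to confirm that the single long record is well formed before handing it to DataStage.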

"You can never have too many knives" -- Logan Nine Fingers
kalpna
Premium Member
Posts: 78
Joined: Thu Feb 02, 2006 3:56 am

Post by kalpna »

Thanks Craig!!
kalpna
Premium Member
Posts: 78
Joined: Thu Feb 02, 2006 3:56 am

Post by kalpna »

Hello Everyone!!

Has anyone worked with segmentation before?

What properties do we need to set in the MQ stage, apart from the usual parameters?


Please refer to my first post.
Has anyone tried this before?


I got rid of the new lines in the compressed message retrieved from MQ and tried to unzip it, but the gzip utility does not recognise it.
I also tried without removing the new lines, but that didn't work either.
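A gzip stream is binary, so any byte that happens to equal a line feed (0x0A) is part of the compressed data, and removing such bytes changes the stream. A small diagnostic sketch in Python can at least show whether the retrieved data still looks like gzip. The helper names `looks_like_gzip` and `try_gunzip` are hypothetical, and this only diagnoses the data; it does not say where in the MQ or DataStage path the bytes were altered.

```python
import gzip
import zlib

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip member


def looks_like_gzip(data):
    """Return True if the data starts with the gzip magic number."""
    return data[:2] == GZIP_MAGIC


def try_gunzip(data):
    """Return the decompressed bytes, or None if the data is not valid gzip."""
    try:
        return gzip.decompress(data)
    except (OSError, EOFError, zlib.error):
        # Bad header, truncated stream, or corrupted compressed data.
        return None


payload = b"<lostDemand><numLines>0001</numLines></lostDemand>"
packed = gzip.compress(payload)

print(looks_like_gzip(packed))            # True
print(try_gunzip(packed) == payload)      # True
print(try_gunzip(packed[:-5]) is None)    # True: truncated stream is rejected
```

If the file written by the job fails the first check, the bytes were changed before or while being written, for example by being handled as text rather than binary; that last point is an assumption to verify, not a confirmed cause.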

Any help would be greatly appreciated.

Thanks
Kalpna