Handling segmented XML data

Posted: Thu Mar 09, 2006 4:58 am
by kalpna
Hello Everyone!!
We need to process large XML files (10 MB to 50 MB).
We get these files from MQ-Series.
Because the files are too large to handle as they are, we compress them (gzip utility), segment them through MQ-Series, and then retrieve them through DataStage.

But in DataStage, when I use the gunzip utility, it doesn't recognise the format.

To make sure that nothing is adding extra information, I tried segmenting a plain XML file (without compressing it) and passed it to DataStage.
But when I try to process this XML, it is not well formed.
The reason is that the first segment contains half of a tag (something like '<sapven') and the second segment contains the other half of the tag ('dorNum>'), and the two halves end up on different lines.
So how can I handle or work around this in DataStage? (According to our MQ experts, it is not possible to control the way MQ segments the message.)

Posted: Thu Mar 09, 2006 6:06 am
by kalpna
Let me put my question this way!!

Say my file is something like this...

<lostDemand>
<extractDate>2006-01-20</extractDate>
<dcLocation>D985</dcLo
cation>
<bnqCode>24766418</bnqCode>
<e
an>5014957128637</ean>
<orderQty>00120</orderQty>
<issueQty>00090</issueQty>
<numLines>0001</numLines>
<totalOrderQty>00120</totalOrderQty>
</lostDemand>

I would like to fix up the data like this:

if trim(Data)[1] = '<' and (Count(trim(Data), '<') <> Count(trim(Data), '>'))
then
concatenate the next line to the present line
else
trim(Data)

How can I achieve this?

thanks
kalpna
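
The rule described above (keep appending the next line while the '<' and '>' counts differ) could be sketched as follows. This is a minimal illustration in Python rather than DataStage BASIC, and the function name rejoin_split_tags is hypothetical. It assumes '<' and '>' only ever appear as tag delimiters, and it will not detect a break that falls inside an element's text rather than inside a tag.

```python
def rejoin_split_tags(lines):
    """Merge each line with the following ones until '<' and '>' counts match."""
    joined = []
    buffer = ""
    for line in lines:
        # Drop surrounding whitespace (including the stray newline) and accumulate.
        buffer += line.strip()
        # Balanced brackets mean no tag is left half-open, so the record is complete.
        if buffer.count("<") == buffer.count(">"):
            if buffer:
                joined.append(buffer)
            buffer = ""
    # Keep whatever is left if the input ended in the middle of a tag.
    if buffer:
        joined.append(buffer)
    return joined


sample = [
    "<lostDemand>",
    "<dcLocation>D985</dcLo",
    "cation>",
    "<e",
    "an>5014957128637</ean>",
    "</lostDemand>",
]
for record in rejoin_split_tags(sample):
    print(record)
```

If the XML does not rely on line breaks at all, removing every newline is simpler and also covers breaks inside text values.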

Posted: Thu Mar 09, 2006 7:18 am
by chulett
You are gzipping it and then 'segmenting'? If so, do you know when you have all the pieces? You'll need to put the segments back together and then gunzip the file before you can parse the XML in DataStage. And it seems like the 'putting back together' part could simply be concatenation done by your operating system...
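
As a rough sketch of that "put the pieces back together, then gunzip" idea, here it is in Python (used only for illustration). The function name reassemble_and_gunzip is hypothetical, and the sketch assumes the segments are available as raw bytes, unmodified and in their original order.

```python
import gzip


def reassemble_and_gunzip(segments):
    """Concatenate byte segments in order and decompress them as one gzip stream."""
    # The segments must be joined byte for byte: no separators, no newlines.
    compressed = b"".join(segments)
    return gzip.decompress(compressed)


# Illustration only: compress a small document, cut it into three arbitrary pieces,
# then rebuild it.
original = b"<lostDemand><ean>5014957128637</ean></lostDemand>"
packed = gzip.compress(original)
pieces = [packed[:10], packed[10:25], packed[25:]]
print(reassemble_and_gunzip(pieces) == original)
```

When the segments are files on disk, they would need to be opened in binary mode ("rb") so that no newline or character-set translation is applied to the compressed bytes.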

Posted: Thu Mar 09, 2006 7:27 am
by kalpna
Thanks for your response, Craig!!
I am reading the message from MQ using the WebSphere MQ stage,
which just reads all segments as a single message.
So I am not doing anything to concatenate them,
and I write the retrieved data to a file.

kalpna

Posted: Thu Mar 09, 2006 7:37 am
by chulett
Not being all that familiar with MQ - if all segments are read as a single message, why segment? When you write out the results to a file, do you get one single file that contains all the segments in the proper order? :?

Is the problem the fact that there are extra 'new lines' in the file between segments? Could you not strip those out using sed so that you end up with one long record when all is said and done? Then you could process it like 'normal' in DataStage.

Posted: Thu Mar 09, 2006 9:42 am
by kalpna
chulett wrote:Not being all that familiar with MQ - if all segments are read as a single message, why segment? When you write out the results to a file, do you get one single file that contains all the segments in the proper order? :?.
Because MQ has a limit on the size of a message, we segment the message into 3 or 4 messages. Yes, the file contains the segments in the proper order.

Craig, how can we strip the newline characters from the file using sed?

Thanks in advance
kalpna

Posted: Thu Mar 09, 2006 11:58 am
by chulett
Talk to one of your UNIX or scripting gurus there and have them help you. sed is a 'stream editor' that supports regular expressions, so you would basically tell it to replace all line-feed characters with an empty string, i.e. remove them.
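
For illustration, the same newline removal can be sketched in Python instead of sed. The helper name strip_newlines is hypothetical. It is meant for the uncompressed XML text only: in gzip-compressed data a line-feed byte can be part of the payload, so removing it would corrupt the stream.

```python
def strip_newlines(text):
    """Return the text with every carriage return and line feed removed."""
    return text.replace("\r", "").replace("\n", "")


broken = "<dcLocation>D985</dcLo\ncation>\n<e\nan>5014957128637</ean>\n"
print(strip_newlines(broken))
```

The result is one long record, which is the shape described above for processing in DataStage.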

Posted: Fri Mar 10, 2006 4:58 am
by kalpna
Thanks Craig!!

Posted: Wed Mar 15, 2006 8:54 am
by kalpna
Hello Everyone!!

Has anyone worked with segmentation before?

Which properties do we need to set in the MQ stage, apart from the usual parameters?


Please refer to my first post.
Has anyone tried this before?


I removed the newlines from the compressed message retrieved from MQ and tried to unzip it, but the gzip utility does not recognise it.
I also tried without removing the newlines, but that didn't work either.

Any help would be greatly appreciated.

Thanks
Kalpna
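
One way to narrow this down is to check whether the retrieved payload still begins with the gzip signature, the two bytes 1f 8b. The sketch below is in Python and its function names are hypothetical. If the check fails, the compressed bytes were altered somewhere between compression and retrieval. Newline removal or a text-mode conversion are possible causes, but that is an assumption rather than something established in this thread.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # the first two bytes of every gzip stream


def looks_like_gzip(data):
    """Return True if the payload starts with the gzip signature."""
    return data[:2] == GZIP_MAGIC


def gunzip_checked(data):
    """Decompress the payload, failing early with a clear message if it is not gzip."""
    if not looks_like_gzip(data):
        raise ValueError("payload does not start with the gzip magic bytes 1f 8b")
    return gzip.decompress(data)


payload = gzip.compress(b"<lostDemand/>")
print(looks_like_gzip(payload))
print(gunzip_checked(payload))
```

A payload that passes the signature check but still fails to decompress has probably been changed further into the stream.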