XML Output - Abnormal Termination of Stage

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

rameshrr3
Premium Member
Posts: 609
Joined: Mon May 10, 2004 3:32 am
Location: BRENTWOOD, TN


Post by rameshrr3 »

I have a job that writes data to an XML Output stage (XML Pack). Apart from the data the job writes itself, I'm also including an XML nested chunk in the XML Output stage, using the nested chunk option and specifying the file path of the chunk. The chunk can be rather large (40 MB) or very small (200 KB or less), depending on the change data to be processed. The job terminates with an abnormal termination of the XML Output stage, usually when the nested chunk exceeds around 20 MB in size. I cannot control the size of the nested chunk because it is generated by another program and keeps changing in size. All tunables are at their maximum values, including buffer size (1024K) and timeout (600), and inter-process buffering is also enabled.

Without the nested chunk the job gives no problems, but my XML file is complete only if the nested chunk is included. Since the nested chunk is variable in size, I cannot predict whether the job will complete successfully whenever it runs.

Is there any way of avoiding this issue? Any tunables that can be tweaked? Is large string data the problem? Any help is welcome.

Thanks
Ramesh
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

I haven't used that particular functionality before as the need for it never came up - and we do a ton of XML work. So, out of curiosity...

Does this 'nested chunk' need to be included all at once in this manner? Or are there various elements of it that would match up with a set of keys in your input data such that it could be picked up 'a piece at a time'?

It's kind of hard to phrase the question clearly, but I'm wondering if this data can be loaded into a hashed file, which is how I've always been able to handle chunks up to this point. For each key or set of key values, a LongVarchar column holds the XML for that key, and a reference lookup is performed before the XML Output stage to include the appropriate portion of the XML needed for that particular input row.

If that's possible given the nature of your data, then it would solve your size problem.
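
For illustration only, here is a minimal sketch of that lookup-and-merge pattern outside DataStage, written in Python; the dict stands in for the hashed file, and the key and element names are invented.

Code:

# Sketch of the hashed-file lookup idea: each input row carries a key,
# and only the matching XML fragment is spliced into that row's output,
# so no single record has to carry the whole oversized chunk.
xml_fragments = {
    "CUST001": "<address><city>Denver</city></address>",
    "CUST002": "<address><city>Brentwood</city></address>",
}

input_rows = [
    {"key": "CUST001", "name": "Alpha"},
    {"key": "CUST002", "name": "Beta"},
]

records = []
for row in input_rows:
    # Reference lookup against the 'hashed file' for this row's key.
    fragment = xml_fragments.get(row["key"], "")
    records.append("<customer><name>%s</name>%s</customer>" % (row["name"], fragment))

print("\n".join(records))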
-craig

"You can never have too many knives" -- Logan Nine Fingers
rameshrr3
Premium Member
Posts: 609
Joined: Mon May 10, 2004 3:32 am
Location: BRENTWOOD, TN

Post by rameshrr3 »

Thanks for your reply, Craig. I guess there's no way I can match up the contents of the nested chunk with my input data; I just wanted to know if there was a size limitation on the nested chunk. My other option would be to use a shell script to concatenate the contents of the nested chunk with another nested chunk containing what I use this job for, and encapsulate both under the parent start and end tags. What I gathered from one of Ray's posts is that oversized string data can cause abnormal stage termination, and I'm wondering if the job encounters a similar situation once the nested chunk exceeds 20 MB.
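
For what it's worth, that concatenation alternative is simple enough to sketch; here it is in Python rather than a shell script, with placeholder file names and a made-up parent element, since the real ones aren't given.

Code:

# Sketch of the concatenation alternative: wrap the job's own chunk and the
# externally generated nested chunk inside the parent start/end tags.
# File names and the <parent> element are placeholders.
start_tag = '<?xml version="1.0" encoding="UTF-8"?>\n<parent>\n'
end_tag = "\n</parent>\n"

with open("job_chunk.xml") as f:
    job_chunk = f.read()
with open("nested_chunk.xml") as f:
    nested_chunk = f.read()

with open("final.xml", "w") as out:
    out.write(start_tag)
    out.write(job_chunk)
    out.write(nested_chunk)
    out.write(end_tag)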
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

There are size limits with certain stages and you seem to have found yours. I don't believe there is any way to change settings to allow larger file sizes to be handled. Perhaps one other approach...

We had one large chunk that needed to be included in every XML file (up to 400 per run) we generated in a certain jobstream, and doing it inline with everything else cost too much. What we ended up doing was putting that chunk up in a hashed file as a single record with a single hard-coded key value, something like 'X' let's say.

We then take our 'final' file, which was also generated as a chunk, and read it in via a Folder stage, which brings it all in as one field. A hard-coded lookup to the Big Chunk hashed file pulls that in, and both are written out together using a Sequential File stage - that was the key that allowed this to work with larger sizes. The only caveat was that, since the input needed to be a chunk and we weren't using an XML stage to write out the 'final final' file, we had to code the main wrapper tags into the Transformer derivations that surrounded the two large pieces.

For example:

Code:

'<?xml version="1.0" encoding="UTF-8"?>
<aptas:datafeed xmlns:aptas="http://www.aptas.com/SearchFeed"  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" locale="en">'
And:

Code:

'</aptas:datafeed>'
The output is a perfectly lovely bit o' XML with the two chunks happily merged. You'd need to adjust your derivations appropriately; these are just examples.
-craig

"You can never have too many knives" -- Logan Nine Fingers
rameshrr3
Premium Member
Posts: 609
Joined: Mon May 10, 2004 3:32 am
Location: BRENTWOOD, TN

Post by rameshrr3 »

Thanks, Craig. We found your reply helpful and will probably use a very similar approach, allowing for the differences in our particular situation.


FOLDER (with chunk as wildcard) ---------> TRX (concatenate chunk with start and end tags) ---------> SeqFile (named as XML file)

I don't know if this will also hit any size limitation, though.
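
If the Sequential File route does run into the same limit, a streaming copy is one way to avoid holding the whole chunk in a single string. Here is a rough Python stand-in for the design above; the paths, wildcard, and parent tags are placeholders, not the real ones.

Code:

# Rough stand-in for FOLDER -> TRX -> SeqFile: pick up the chunk by wildcard,
# then stream it between the start and end tags so the 40 MB chunk never has
# to live in one in-memory string. All paths and tags are placeholders.
import glob
import shutil

chunk_path = glob.glob("/data/chunks/*.xml")[0]  # assumes exactly one chunk matches

with open("/data/out/final.xml", "w") as out:
    out.write('<?xml version="1.0" encoding="UTF-8"?>\n<parent>\n')
    with open(chunk_path) as chunk:
        shutil.copyfileobj(chunk, out)  # copies in buffered pieces
    out.write("\n</parent>\n")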