Adding newlines to multiple files in stream

iq_etl
Premium Member
Posts: 105
Joined: Tue Feb 08, 2011 9:26 am


Post by iq_etl »

Hi,

I've got a directory full of gzip-compressed XML files that I want to stream through an XML transformer to parse them into delimited text. The XML transformer needs to know where each file ends in order to accept the input as valid XML. But the XML within the gz files, which originate in an external Web service, doesn't end with a newline.

Currently, I'm reading the files in with a wildcard in an External Source stage, using zcat to stream the uncompressed XML. I've tried putting a sed command in that stage, and in a subsequent External Filter stage, to add a newline at the end of each file, but I've only been able to add the newline at the end of the whole set.

Does anyone know of a way to insert a newline at the end of each XML file? I'd like to avoid landing all the unzipped files on my server, and I'm afraid I can't change the contents of the gz files before I get them.
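
For illustration only: with a wildcard, zcat concatenates every file into one stream, so a sed that targets the end of input sees a single end for the whole set. One possible workaround, sketched here as plain shell and not tested inside an External Source stage, is to loop over the files and emit a newline after each one. The function name stream_with_newlines and its directory argument are made up for this example, and gzip -dc is used as the equivalent of zcat.

```shell
#!/bin/sh
# Hypothetical helper: stream every .gz file in a directory to stdout,
# writing one newline after each file so the file boundaries are visible.
stream_with_newlines() {
    dir=$1
    for f in "$dir"/*.gz; do
        [ -e "$f" ] || continue   # skip when the glob matched nothing
        gzip -dc "$f"             # same as zcat: uncompressed XML, no trailing newline
        echo                      # newline marking the end of this file
    done
}
```

Nothing is landed on disk: each file is decompressed straight to stdout, for example with stream_with_newlines /path/to/xml_dir, where the path is a placeholder.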

Thanks for any help!

Kelly
iq_etl
Premium Member
Posts: 105
Joined: Tue Feb 08, 2011 9:26 am

Post by iq_etl »

OK, I found a fix for the immediate problem.

Instead of a sed command that looks for the end of the file and adds a newline, I've got it finding the closing tag of my XML and replacing it with that tag plus a line break.
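
A minimal sketch of that kind of substitution, since the actual command isn't shown here. The closing tag </root> in the usage line is a placeholder (the real tag name isn't given), and split_on_closing_tag is a hypothetical wrapper name. The backslash followed by a literal newline in the replacement is the portable way to insert a line break with sed.

```shell
#!/bin/sh
# Hypothetical wrapper: read XML on stdin and put a line break after every
# occurrence of the closing tag given as $1 (for example '</root>').
# Assumes the tag contains no '|' or other characters special to sed.
split_on_closing_tag() {
    # '&' re-inserts the matched tag; the backslash + newline adds the line break.
    sed "s|$1|&\\
|g"
}
```

It would sit after zcat in the pipeline, along the lines of zcat *.gz | split_on_closing_tag '</root>'. This only works if the closing tag of the document element can't also appear nested inside a document.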

I've now got it taking multiple files, transforming them and writing the text out with a Sequential File stage. It doesn't yet write a separate file for each input file (row), as I'd like, but I'll keep at it.

Kelly