Running Zipped File in DataStage

Post questions here relating to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

suchit1983
Participant
Posts: 9
Joined: Thu Nov 29, 2007 10:45 pm

Running Zipped File in DataStage

Post by suchit1983 »

Hi,

I have a source file Test.gz (a zipped file). I need to do some transformations and put the result in another file, Test1.gz. How do I achieve this?

The file is on a UNIX server and I need to read it from DataStage.

Can I use External Source and External Target stages?
What kind of programs would I write?

Thanks,
Suchit
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

You could use an External Source stage, but it's probably easier to use a Filter command in the Sequential File stage. Your unzip command's output is piped directly to the "input" of the Sequential File stage (that is, no unzipped file appears on disk).
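In shell terms the read side behaves roughly like this (just a sketch; the stage feeds the file to the command and imports whatever comes out on stdout):

    gunzip -c < Test.gz

The decompressed stream goes straight into the import, and nothing lands on disk in between.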
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

You won't be able to directly create the gzipped output, so create the output file and then arrange the gzip 'after job' - either directly from the command line or write a generic gzip script anyone can leverage.
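Something along these lines would do as a generic script (a sketch only - the script name and the ExecSH wiring are illustrative, not a shipped routine):

    #!/bin/sh
    # gzipfile.sh - compress a job's output file.
    # Call it after-job via ExecSH with the file name as the argument.
    FILE="$1"
    if [ ! -f "$FILE" ]; then
        echo "gzipfile.sh: $FILE not found" >&2
        exit 1
    fi
    gzip -f "$FILE"    # replaces Test1.dat with Test1.dat.gz

Every job that needs it just passes its own file name.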
-craig

"You can never have too many knives" -- Logan Nine Fingers
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

Craig - actually you can read & write compressed files by using the Filter option on Sequential File stages.

When reading a gzipped file use the filter "gunzip -c" and when writing use "gzip -c". This can actually result in faster throughput for some jobs - particularly when there is excess CPU capacity and a slow disk (e.g. a SAN drive on a busy backbone). I think we almost doubled write speeds on the last project by putting these filters in.
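For example, the two stage properties would look something like this (a sketch, assuming GNU gzip is on the PATH):

    Source Sequential File stage:  Filter = gunzip -c
    Target Sequential File stage:  Filter = gzip -c

Both commands read stdin and write stdout, so the data is compressed or decompressed in the pipe and no uncompressed copy ever touches disk.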
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Interesting... I don't recall the Filter option as being available on the output side of the house. Ah well, you learn something new every day! :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers
suchit1983
Participant
Posts: 9
Joined: Thu Nov 29, 2007 10:45 pm

Running Zipped File in DataStage

Post by suchit1983 »

Hi,

I tried the option suggested above.
My job design is:

External Source Stage --> Transformer --> Sequential File

I used the command gunzip -c Test.gz in the External Source stage. The log says the data import is successful.

In the Sequential File stage I am creating Test1.dat, and in the Filter option the command is gzip -c Test1.dat.
I assumed it would create Test1.dat first and then create a zipped file Test1.gz.

But I am getting the following errors.

Sequential_File_5,0: write() failed: Broken pipe
Sequential_File_5,0: Export failed
Sequential_File_5,0: write() failed: Bad file number

Please let me know how I can correct this error.

Thanks,
Suchit
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Re: Running Zipped File in DataStage

Post by chulett »

suchit1983 wrote:In the Sequential File stage I am creating Test1.dat, and in the Filter option the command is gzip -c Test1.dat.
For the Filter command, try just "gzip": no filename, no "-c".
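With a Filter on the write side the stage pipes the export stream into the command's stdin and writes the command's stdout to the target file - roughly this, with <export> standing in for the stage's own output:

    <export> | gzip > Test1.dat

Given a filename argument, gzip goes looking for that file instead of reading stdin, the reading end of the pipe disappears, and the stage's write() fails with the broken pipe you saw. Since plain gzip compresses the stream, you may also want to name the target Test1.gz rather than Test1.dat.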
-craig

"You can never have too many knives" -- Logan Nine Fingers
suchit1983
Participant
Posts: 9
Joined: Thu Nov 29, 2007 10:45 pm

Running Zipped File in DataStage

Post by suchit1983 »

Thanks a lot, chulett. It's working fine now.