Running Zipped File in DataStage

Post questions here relating to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

suchit1983
Participant
Posts: 9
Joined: Thu Nov 29, 2007 10:45 pm

Running Zipped File in DataStage

Post by suchit1983 »

Hi,

I have a source file Test.gz (a zipped file). I need to do some transformations and put the result in another file, Test1.gz. How do I achieve this?

The file is on a UNIX server and I need to read it from DataStage.

Can I use External Source and External Target stages?
What kind of programs would I write?

Thanks,
Suchit
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

You could use an External Source stage, but it's probably easier to use a Filter command in the Sequential File stage. Your unzip command's output is piped directly to the "input" of the Sequential File stage (that is, no unzipped file appears on disk).
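In shell terms the read side behaves roughly like this (just a sketch; the stage feeds the file to the command and imports whatever comes out on stdout):

    gunzip -c < Test.gz

The decompressed stream goes straight into the import, and nothing lands on disk in between.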
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

You won't be able to directly create the gzipped output, so create the output file and then arrange the gzip 'after job' - either directly from the command line or write a generic gzip script anyone can leverage.
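Something along these lines would do as a generic script (a sketch only - the script name and the ExecSH wiring are illustrative, not a shipped routine):

    #!/bin/sh
    # gzipfile.sh - compress a job's output file.
    # Call it after-job via ExecSH with the file name as the argument.
    FILE="$1"
    if [ ! -f "$FILE" ]; then
        echo "gzipfile.sh: $FILE not found" >&2
        exit 1
    fi
    gzip -f "$FILE"    # replaces Test1.dat with Test1.dat.gz

Every job that needs it just passes its own file name.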
-craig

"You can never have too many knives" -- Logan Nine Fingers
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

Craig - actually you can read & write compressed files by using the Filter option on Sequential File stages.

When reading a gzipped file use the filter "gunzip -c" and when writing use "gzip -c". This can actually result in faster throughput for some jobs - particularly when there is excess CPU capacity and a slow disk (e.g. a SAN drive on a busy backbone). I think we almost doubled write speeds on the last project by putting these filters in.
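For example, the two stage properties would look something like this (a sketch, assuming GNU gzip is on the PATH):

    Source Sequential File stage:  Filter = gunzip -c
    Target Sequential File stage:  Filter = gzip -c

Both commands read stdin and write stdout, so the data is compressed or decompressed in the pipe and no uncompressed copy ever touches disk.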
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Interesting... I don't recall the Filter option as being available on the output side of the house. Ah well, you learn something new every day! :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers
suchit1983
Participant
Posts: 9
Joined: Thu Nov 29, 2007 10:45 pm

Running Zipped File in DataStage

Post by suchit1983 »

Hi,

I tried the option suggested above.
My job design is:

External Source Stage --> Transformer --> Sequential File

I used the command gunzip -c Test.gz in the External Source stage. The log says the data import is successful.

In the Sequential File stage I am creating Test1.dat, and in the Filter option the command is gzip -c Test1.dat.
I assumed it would create Test1.dat first and then create a zipped file Test1.gz.

But I am getting the following errors.

Sequential_File_5,0: write() failed: Broken pipe
Sequential_File_5,0: Export failed
Sequential_File_5,0: write() failed: Bad file number

Please let me know how I can correct this error.

Thanks,
Suchit
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Re: Running Zipped File in DataStage

Post by chulett »

suchit1983 wrote:In the Sequential File stage I am creating Test1.dat, and in the Filter option the command is gzip -c Test1.dat.
For the Filter command, try just "gzip": no filename, no "-c".
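With a Filter on the write side the stage pipes the export stream into the command's stdin and writes the command's stdout to the target file - roughly this, with <export> standing in for the stage's own output:

    <export> | gzip > Test1.dat

Given a filename argument, gzip goes looking for that file instead of reading stdin, the reading end of the pipe disappears, and the stage's write() fails with the broken pipe you saw. Since plain gzip compresses the stream, you may also want to name the target Test1.gz rather than Test1.dat.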
-craig

"You can never have too many knives" -- Logan Nine Fingers
suchit1983
Participant
Posts: 9
Joined: Thu Nov 29, 2007 10:45 pm

Running Zipped File in DataStage

Post by suchit1983 »

Thanks a lot, chulett. It's working fine now.