Compress dataset files

Post questions here relating to DataStage Enterprise/PX Edition, for such areas as parallel job design, parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

shin0066
Premium Member
Posts: 69
Joined: Tue Jun 12, 2007 8:42 am

Compress dataset files

Post by shin0066 »

Hi,

Is there a way to compress .ds dataset files to a different location?

Thanks,
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

Not directly, but you could do a

Code: Select all

orchadmin dump {dataset} | gzip -c > zipfile.gz
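
A minimal sketch of that round trip (the paths and file names are illustrative; note that orchadmin dump writes the records out as text, so gunzip gives you back the text extract, not a .ds file you can open directly):

Code: Select all

# dump the dataset's records as text and compress the stream to another location
orchadmin dump /data/ds/daily_orders.ds | gzip -c > /archive/daily_orders.gz

# later: recover the textual dump (re-loading it as a dataset needs a DataStage job)
gunzip -c /archive/daily_orders.gz > /tmp/daily_orders.txt
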
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

No. The .ds files themselves are tiny in any case, and are already in a binary format. The actual data of a Data Set resides elsewhere, on the resource disks specified in your configuration file. Again, these are already in a binary format. Attempting to compress them will not produce any gain worth having. And they could not be used in/from the other location.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
shin0066
Premium Member
Posts: 69
Joined: Tue Jun 12, 2007 8:42 am

Post by shin0066 »

Thanks Ray and ArndW,

Got the answers!
sanjay
Premium Member
Posts: 203
Joined: Fri Apr 23, 2004 2:22 am

Total dataset size is 800 GB per day

Post by sanjay »

Hi All

We have a huge volume of data; the total dataset size is about 800 GB per day.

So I am planning to compress the datasets with the following command:

orchadmin dump {dataset} | gzip -c > zipfile.gz

How do I uncompress it back?

I am also not sure about the Compress and Expand stages, and whether I can use them here.

Thanks
Sanjay
bcarlson
Premium Member
Posts: 772
Joined: Fri Oct 01, 2004 3:06 pm
Location: Minnesota

Post by bcarlson »

Is there a reason why you are not compressing the dataset within the DataStage job itself? Use the Compress stage in the job that creates the dataset and the Expand stage in the job(s) that read it. These can work with either the Unix 'compress/uncompress' programs or 'gzip/gunzip'.

Compressing and uncompressing the data adds a little overhead, but can significantly reduce the I/O. In the end, the jobs will probably run as fast as they did before, maybe even faster due to the reduced I/O. Better yet, the data footprint will be significantly reduced.
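
If you want to see roughly what those stages do under the covers, here is a sketch in osh, assuming the pcompress operator (which the Compress and Expand stages wrap); the dataset paths are made up and the option spellings should be double-checked against the Parallel Job Advanced Developer's Guide:

Code: Select all

# writer job: compress the stream before it lands in the dataset
osh "copy < /data/in/daily.ds | pcompress -command gzip > /data/out/daily_gz.ds"

# reader job: expand on the way back in, then peek at a few records
osh "pcompress -expand -command gzip < /data/out/daily_gz.ds | peek"
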

Hope this helps.

Brad
It is not that I am addicted to coffee, it's just that I need it to survive.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

One more space-saving note: with Data Sets, the actual disk space used by an unbounded VarChar column is smaller than that used by a bounded one, so dropping unnecessary length bounds can also reduce the footprint.
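
For example (an illustrative Orchestrate schema fragment; the field names are made up), the bounded field below takes more space per record in the dataset than the unbounded one, which stores only the actual string:

Code: Select all

record (
    customer_name_bounded: string[max=100];
    customer_name_unbounded: string;
)
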