Compress dataset files

Post questions here relating to DataStage Enterprise/PX Edition, for such areas as parallel job design, parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

shin0066
Premium Member
Posts: 69
Joined: Tue Jun 12, 2007 8:42 am

Compress dataset files

Post by shin0066 »

Hi,

Is there a way to compress .ds dataset files to a different location?

Thanks,
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

Not directly, but you could do a

Code: Select all

orchadmin dump {dataset} | gzip -c > zipfile.gz
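
A minimal sketch of that round trip (the paths and file names are illustrative; note that orchadmin dump writes the records out as text, so gunzip gives you back the text extract, not a .ds file you can open directly):

Code: Select all

# dump the dataset's records as text and compress the stream to another location
orchadmin dump /data/ds/daily_orders.ds | gzip -c > /archive/daily_orders.gz

# later: recover the textual dump (re-loading it as a dataset needs a DataStage job)
gunzip -c /archive/daily_orders.gz > /tmp/daily_orders.txt
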
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

No. The .ds files themselves are tiny in any case, and are already in a binary format. The actual data of a Data Set resides elsewhere, on the resource disks specified in your configuration file. Again, these are already in a binary format. Attempting to compress them will not produce any gain worth having. And they could not be used in/from the other location.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
shin0066
Premium Member
Posts: 69
Joined: Tue Jun 12, 2007 8:42 am

Post by shin0066 »

Thanks Ray and ArndW,

Got the answers!
sanjay
Premium Member
Posts: 203
Joined: Fri Apr 23, 2004 2:22 am

Total dataset size is 800 GB per day

Post by sanjay »

Hi All

We have a huge volume of data; the total dataset size is about 800 GB per day.

So I am planning to compress the datasets with the following command:

orchadmin dump {dataset} | gzip -c > zipfile.gz

How do I uncompress it back?

I am also not sure about the Compress and Expand stages, and whether I can use them here.

Thanks
Sanjay
bcarlson
Premium Member
Posts: 772
Joined: Fri Oct 01, 2004 3:06 pm
Location: Minnesota

Post by bcarlson »

Is there a reason why you are not compressing the dataset within the DataStage job itself? Use the Compress stage in the job that creates the dataset and the Expand stage in the job(s) that read it. These can work with either the Unix 'compress/uncompress' programs or 'gzip/gunzip'.

Compressing and uncompressing the data adds a little overhead, but can significantly reduce the I/O. In the end, the jobs will probably run as fast as they did before, maybe even faster due to the reduced I/O. Better yet, the data footprint will be significantly reduced.
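
If you want to see roughly what those stages do under the covers, here is a sketch in osh, assuming the pcompress operator (which the Compress and Expand stages wrap); the dataset paths are made up and the option spellings should be double-checked against the Parallel Job Advanced Developer's Guide:

Code: Select all

# writer job: compress the stream before it lands in the dataset
osh "copy < /data/in/daily.ds | pcompress -command gzip > /data/out/daily_gz.ds"

# reader job: expand on the way back in, then peek at a few records
osh "pcompress -expand -command gzip < /data/out/daily_gz.ds | peek"
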

Hope this helps.

Brad
It is not that I am addicted to coffee, it's just that I need it to survive.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

One more space-saving note: with Data Sets, the actual disk space used by an unbounded VarChar column is smaller than that used by a bounded one, so dropping unnecessary length bounds can also reduce the footprint.
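
For example (an illustrative Orchestrate schema fragment; the field names are made up), the bounded field below takes more space per record in the dataset than the unbounded one, which stores only the actual string:

Code: Select all

record (
    customer_name_bounded: string[max=100];
    customer_name_unbounded: string;
)
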