Disk space for a job

pkomalla
Premium Member
Posts: 44
Joined: Tue Mar 21, 2006 6:18 pm

Disk space for a job

Post by pkomalla »

Hi All,

I have a job with two Remove Duplicates stages, a Lookup stage and two Transformers. My source and target are database stages.

I have to run the job with around 30 million records, and the length of a record is 50.

Can anyone suggest how much space I need to have?

How do I calculate the disk space needed?

Thanks
kumar_s
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

If the data is not sorted, you may need to include a Sort stage as well.
It also depends on the number of duplicates the input contains and on the type of lookup (whether it is a sparse lookup or a lookup fileset).
If it is a lookup fileset, and you have few records per group (duplicates) and simple transformation logic, you will use very little disk.
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
Kirtikumar
Participant
Posts: 437
Joined: Fri Oct 15, 2004 6:13 am
Location: Pune, India

Post by Kirtikumar »

As data is going to the target, it depends on the record size in the target and the number of records, so approximately (record size * total records).
If you are asking about the scratch space required, then it would also depend on the physical memory available to the job, in addition to what kumar_s has mentioned in his post.
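To put rough numbers on that formula, here is a minimal sketch in Python, using the 30 million records from the question and assuming the record length of 50 is in bytes; the 2x scratch allowance is only an illustrative assumption for sort/buffer overhead, not a DataStage-documented figure.

Code:

RECORD_LENGTH_BYTES = 50          # record length from the question, assumed to be bytes
RECORD_COUNT = 30_000_000         # around 30 million records, from the question
SCRATCH_MULTIPLIER = 2            # assumed headroom for sorting/buffering on scratch disk

def to_gb(num_bytes: int) -> float:
    """Convert bytes to gigabytes (10^9 bytes)."""
    return num_bytes / 1_000_000_000

data_bytes = RECORD_LENGTH_BYTES * RECORD_COUNT
print(f"Raw data size (record size * total records): {to_gb(data_bytes):.2f} GB")
print(f"Illustrative scratch allowance:              {to_gb(data_bytes * SCRATCH_MULTIPLIER):.2f} GB")

With those figures the raw data works out to roughly 1.5 GB, so the scratch and dataset space needed is on the order of a few gigabytes, subject to the sorting, duplicate and lookup considerations discussed above.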
Regards,
S. Kirtikumar.
kumar_s
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

Not all stages use scratch disk space extensively. Sometimes a job may not use it at all, other than for buffering operations. So it also depends on the operators involved in the job.
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'