Disk space for a job

pkomalla
Premium Member
Posts: 44
Joined: Tue Mar 21, 2006 6:18 pm

Disk space for a job

Post by pkomalla »

Hi All,

I have a job with two Remove Duplicates stages, a Lookup stage and two Transformers. My source and target are database stages.

I have to run the job with around 30 million records, and the length of a record is 50.

Can anyone suggest how much space I need to have?

How do I calculate the disk space needed?

Thanks
kumar_s
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

If the data is not sorted, you may need to include a Sort stage as well.
It also depends on the number of duplicates the input contains and on the type of lookup (whether it is a sparse lookup or a lookup fileset).
If it is a lookup fileset, and you have few records per group (duplicates) and simple transformation logic, you will use very little disk.
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
Kirtikumar
Participant
Posts: 437
Joined: Fri Oct 15, 2004 6:13 am
Location: Pune, India

Post by Kirtikumar »

As data is going to the target, it depends on the record size in the target and the number of records, so approximately (record size * total records).
If you are asking about the scratch space required, then it would also depend on the physical memory available to the job, in addition to what kumar_s has mentioned in his post.
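To put rough numbers on that formula, here is a minimal sketch in Python, using the 30 million records from the question and assuming the record length of 50 is in bytes; the 2x scratch allowance is only an illustrative assumption for sort/buffer overhead, not a DataStage-documented figure.

Code:

RECORD_LENGTH_BYTES = 50          # record length from the question, assumed to be bytes
RECORD_COUNT = 30_000_000         # around 30 million records, from the question
SCRATCH_MULTIPLIER = 2            # assumed headroom for sorting/buffering on scratch disk

def to_gb(num_bytes: int) -> float:
    """Convert bytes to gigabytes (10^9 bytes)."""
    return num_bytes / 1_000_000_000

data_bytes = RECORD_LENGTH_BYTES * RECORD_COUNT
print(f"Raw data size (record size * total records): {to_gb(data_bytes):.2f} GB")
print(f"Illustrative scratch allowance:              {to_gb(data_bytes * SCRATCH_MULTIPLIER):.2f} GB")

With those figures the raw data works out to roughly 1.5 GB, so the scratch and dataset space needed is on the order of a few gigabytes, subject to the sorting, duplicate and lookup considerations discussed above.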
Regards,
S. Kirtikumar.
kumar_s
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

Not all stages use scratch disk space extensively. Sometimes a job may not use it at all, other than for buffering operations. So it also depends on the operators involved in the job.
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'