scratch folder filling very fast for a particular table

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

kirankota79
Premium Member
Posts: 315
Joined: Tue Oct 31, 2006 3:38 pm


Post by kirankota79 »

I am writing Oracle tables to datasets, and I am doing this for many tables. But one particular table occupies a lot of scratch space and I am running out of disk space. Can someone tell me the exact reason?

Thanks
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

I think there is some information missing that might explain this behaviour. Are you doing a straight load from table to dataset, and are you doing any sorting (implicit or explicit)?
kirankota79
Premium Member
Posts: 315
Joined: Tue Oct 31, 2006 3:38 pm

Post by kirankota79 »

Thanks ArndW.

My job is:

oracle stage --->transformer--->join---->transformer--->dataset

In the Oracle stage I set a table read, not a select query. As far as I know I haven't set any sort option. I am using the same kind of job for other tables and don't see any problem with them. Please let me know if you need any other information.

Thanks
kirankota79
Premium Member
Posts: 315
Joined: Tue Oct 31, 2006 3:38 pm

Post by kirankota79 »

The files being created in the scratch folder are named tsortxxx, each 9.94 MB.
I am deleting them while the job is running to avoid filling the disk.

Is that OK?
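Before deleting anything by hand, it can help to measure how much scratch the tsort spill files are actually using. Below is a minimal Python sketch; the scratch path shown is hypothetical and should be replaced with the directory configured in your APT config file.

```python
import os
import glob

def tsort_usage(scratch_dir):
    """Sum the sizes (in bytes) of tsort spill files in a scratch directory."""
    total = 0
    for path in glob.glob(os.path.join(scratch_dir, "tsort*")):
        try:
            total += os.path.getsize(path)
        except OSError:
            pass  # a file may vanish mid-scan while the job is running
    return total

# Hypothetical scratch path -- substitute the resource scratchdisk
# entry from your own APT configuration file:
# print(tsort_usage("/opt/IBM/InformationServer/Server/Scratch") / 1024 / 1024, "MB")
```

Running this periodically while the job executes shows whether the spill files are growing without bound or being recycled by the sort.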
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

Your design has a join with only one input, that was probably an oversight. The "tsort" files are from that stage.
kirankota79
Premium Member
Posts: 315
Joined: Tue Oct 31, 2006 3:38 pm

Post by kirankota79 »

Sorry, the Join stage has one lookup file too. Since the Lookup stage cannot handle a huge amount of data, I am using a Join stage for the lookup.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Look at the job score. Inserted tsort operators exist because the Join stage requires sorted inputs.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
keshav0307
Premium Member
Posts: 783
Joined: Mon Jan 16, 2006 10:17 pm
Location: Sydney, Australia

Post by keshav0307 »

How big is the lookup file?
Try using a Lookup stage rather than a Join.
kirankota79
Premium Member
Posts: 315
Joined: Tue Oct 31, 2006 3:38 pm

Post by kirankota79 »

The lookup file is just 10,000 records with 2 columns. The input table is huge (10 million records) and I think the Lookup stage cannot accommodate that amount of data. I do the lookup on one column. To avoid problems in the future I am using a Join stage.
girija
Participant
Posts: 89
Joined: Fri Mar 24, 2006 1:51 pm
Location: Hartford

Post by girija »

If you use a Join stage, the process will create tsort files. First, if you need all the columns from your source then use a table read; otherwise it is better to select only the fields you need. Second, your lookup table is very small, and the memory usage and performance of the Lookup stage depend not on the source stream but on the lookup (reference) stream. So I think it is better to use the Lookup stage instead of the Join.
keshav0307
Premium Member
Posts: 783
Joined: Mon Jan 16, 2006 10:17 pm
Location: Sydney, Australia

Post by keshav0307 »

10,000 records are not huge and can easily be accommodated in memory. With a Lookup stage the main link does not need to be sorted and is not held in memory; records just keep propagating.
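The point above can be sketched in plain Python: a lookup holds only the small reference side in memory (as a hash table) while the large source streams through row by row, whereas a join needs both inputs sorted on the key, which is what spawns the tsort spill files. All names and sample data here are illustrative, not DataStage APIs.

```python
def hash_lookup(source_rows, reference_rows, key):
    """Stream source rows past an in-memory hash of the reference data.

    Only the (small) reference side is materialized; the (large) source
    side is consumed lazily, so memory use is bounded by the lookup file.
    """
    ref = {row[key]: row for row in reference_rows}   # small side in memory
    for row in source_rows:                           # large side streams through
        match = ref.get(row[key])
        if match is not None:
            merged = dict(row)
            merged.update({k: v for k, v in match.items() if k != key})
            yield merged

# Illustrative stand-ins: a tiny "source table" and "lookup file".
source = [{"id": i, "amount": i * 10} for i in range(5)]
reference = [{"id": 1, "desc": "one"}, {"id": 3, "desc": "three"}]

for row in hash_lookup(source, reference, "id"):
    print(row)
```

Only matching source rows are emitted, enriched with the reference columns; the 10-million-row side never has to be sorted or buffered.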