Scratch folder filling very fast for a particular table

Posted: Sun Jul 13, 2008 8:48 am
by kirankota79
I am writing an Oracle table to a dataset, and I am doing this for many tables. For one particular table, however, the job uses a lot of scratch space and I am running out of disk space. Can someone tell me the exact reason?

Thanks

Posted: Sun Jul 13, 2008 9:03 am
by ArndW
I think there is some information missing that might explain this behaviour. Are you doing a straight load from table to dataset, and are you doing any sorting (implicit or explicit)?

Posted: Sun Jul 13, 2008 9:08 am
by kirankota79
Thanks ArndW,

My job is:

oracle stage--->transformer--->join---->transformer--->dataset

In the Oracle stage I read the table directly rather than using a select query. As far as I know, I haven't set any sort option. I am using the same kind of job for other tables and don't see any problem with them. Please let me know if you need any particular info.

Thanks

Posted: Sun Jul 13, 2008 9:10 am
by kirankota79
The files being created in the scratch folder have names like tsortxxx, each 9.94 MB.
I am deleting them while the job is running to avoid filling the disk.

Is it OK?

Posted: Sun Jul 13, 2008 9:57 am
by ArndW
Your design shows a join with only one input; that was probably an oversight. The "tsort" files come from that stage.

Posted: Sun Jul 13, 2008 10:36 am
by kirankota79
Sorry, the join stage has one lookup file as input too. Since the lookup stage cannot handle a huge amount of data, I am using the join stage for the lookup.

Posted: Sun Jul 13, 2008 3:49 pm
by ray.wurlod
Look at the job score. Inserted tsort operators exist because the Join stage requires sorted inputs.
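
To illustrate why a sort in front of a join on a very large input fills scratch space, here is a minimal Python sketch of an external sort that spills fixed-size sorted runs to a scratch directory and then merges them. It is not DataStage's tsort implementation; the function names, the `tsort_demo_` file prefix and the chunk size are all hypothetical and chosen only for the example.

```python
import heapq
import json
import os
import tempfile


def spill_run(rows, scratch_dir):
    """Sort one in-memory chunk and write it to its own scratch file."""
    fd, path = tempfile.mkstemp(prefix="tsort_demo_", dir=scratch_dir)
    with os.fdopen(fd, "w") as f:
        for row in sorted(rows):
            f.write(json.dumps(row) + "\n")
    return path


def read_run(path):
    """Stream one sorted run back from disk, row by row."""
    with open(path) as f:
        for line in f:
            yield tuple(json.loads(line))


def external_sort(rows, chunk_size, scratch_dir):
    """Return (iterator over sorted rows, list of scratch files created).

    Every chunk_size rows produce one scratch file, so the number of
    files grows with the size of the input being sorted.
    """
    paths, chunk = [], []
    for row in rows:
        chunk.append(row)
        if len(chunk) == chunk_size:
            paths.append(spill_run(chunk, scratch_dir))
            chunk = []
    if chunk:  # final, partially filled chunk
        paths.append(spill_run(chunk, scratch_dir))
    # The runs are still needed here: the merge reads them back lazily.
    return heapq.merge(*(read_run(p) for p in paths)), paths


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch:
        rows = [(k % 7, f"row{k}") for k in range(20)]
        merged, paths = external_sort(rows, chunk_size=6, scratch_dir=scratch)
        print(len(paths), "scratch files")
        print(list(merged)[:3])
```

In this sketch the run files are read back during the merge, so removing them while the sort is still in progress would break the merge step.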

Posted: Sun Jul 13, 2008 10:53 pm
by keshav0307
How big is the lookup file?
Try using the lookup stage rather than join.

Posted: Mon Jul 14, 2008 8:42 am
by kirankota79
The lookup file is just 10,000 records with 2 columns. The input table is huge, with 10 million records, and I think the lookup stage cannot accommodate that amount of data. I do the lookup on one column. To avoid problems in the future, I am using the join stage.

Posted: Mon Jul 14, 2008 9:05 am
by girija
If you use the join stage, the process will create the tsort files. First, if you need all the columns from your source then use table read; otherwise it is better to select only the fields you need. Second, your lookup table is very small, and the memory usage and performance of the lookup stage depend on the lookup (reference) stream, not on the source stream. So I think it is better to use the lookup stage instead of join.

Posted: Mon Jul 14, 2008 8:37 pm
by keshav0307
10,000 records is not huge and can easily be accommodated. The main link does not need to be sorted and is not held in memory; its records simply keep flowing through.
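
The point about the lookup can be sketched in Python as follows. This is only an analogy for the behaviour described above, not the stage's internals; `hash_lookup` and the sample data are hypothetical. Only the small reference set is loaded into memory, while the large main input is streamed one row at a time and never sorted.

```python
def hash_lookup(main_rows, reference_rows):
    """Enrich each main row with the value matched from the reference set.

    reference_rows: small iterable of (key, value) pairs, held in memory.
    main_rows: arbitrarily large iterable of (key, payload), streamed.
    """
    ref = dict(reference_rows)  # memory use depends on the reference size only
    for key, payload in main_rows:
        # Rows are emitted in arrival order; unmatched keys yield None.
        yield key, payload, ref.get(key)


if __name__ == "__main__":
    reference = [(1, "north"), (2, "south")]
    # A generator stands in for the large table: it is never materialised.
    main = ((k % 3, f"row{k}") for k in range(6))
    for row in hash_lookup(main, reference):
        print(row)
```

With a 10,000-row reference and a 10-million-row main input, only the 10,000 rows would sit in the dictionary in this sketch, and no scratch files are written.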