scratch folder filling very fast for a particular table

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

kirankota79
Premium Member
Posts: 315
Joined: Tue Oct 31, 2006 3:38 pm


Post by kirankota79 »

I am writing Oracle tables to datasets, and I am doing this for many tables. But one particular table occupies a lot of scratch space and I am running out of disk space. Can someone tell me the exact reason?

Thanks
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

I think there is some information missing that might explain this behaviour. Are you doing a straight load from table to dataset, and are you doing any sorting (implicit or explicit)?
kirankota79
Premium Member
Posts: 315
Joined: Tue Oct 31, 2006 3:38 pm

Post by kirankota79 »

Thanks ArndW.

My job is:

oracle stage --->transformer--->join---->transformer--->dataset

In the Oracle stage I set a table read, not a select query. As far as I know I haven't set any sort option. I am using the same kind of job for other tables and don't see any problem with them. Please let me know if you need any other information.

Thanks
kirankota79
Premium Member
Posts: 315
Joined: Tue Oct 31, 2006 3:38 pm

Post by kirankota79 »

The files being created in the scratch folder are named tsortxxx, each 9.94 MB.
I am deleting them while the job is running to avoid filling the disk.

Is that OK?
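Before deleting anything by hand, it can help to measure how much scratch the tsort spill files are actually using. Below is a minimal Python sketch; the scratch path shown is hypothetical and should be replaced with the directory configured in your APT config file.

```python
import os
import glob

def tsort_usage(scratch_dir):
    """Sum the sizes (in bytes) of tsort spill files in a scratch directory."""
    total = 0
    for path in glob.glob(os.path.join(scratch_dir, "tsort*")):
        try:
            total += os.path.getsize(path)
        except OSError:
            pass  # a file may vanish mid-scan while the job is running
    return total

# Hypothetical scratch path -- substitute the resource scratchdisk
# entry from your own APT configuration file:
# print(tsort_usage("/opt/IBM/InformationServer/Server/Scratch") / 1024 / 1024, "MB")
```

Running this periodically while the job executes shows whether the spill files are growing without bound or being recycled by the sort.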
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

Your design has a join with only one input, that was probably an oversight. The "tsort" files are from that stage.
kirankota79
Premium Member
Posts: 315
Joined: Tue Oct 31, 2006 3:38 pm

Post by kirankota79 »

Sorry, the Join stage has one lookup file too. Since the Lookup stage cannot handle a huge amount of data, I am using a Join stage for the lookup.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Look at the job score. Inserted tsort operators exist because the Join stage requires sorted inputs.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
keshav0307
Premium Member
Posts: 783
Joined: Mon Jan 16, 2006 10:17 pm
Location: Sydney, Australia

Post by keshav0307 »

How big is the lookup file?
Try using a Lookup stage rather than a Join.
kirankota79
Premium Member
Posts: 315
Joined: Tue Oct 31, 2006 3:38 pm

Post by kirankota79 »

The lookup file is just 10,000 records with 2 columns. The input table is huge (10 million records) and I think the Lookup stage cannot accommodate that amount of data. I do the lookup on one column. To avoid problems in the future I am using a Join stage.
girija
Participant
Posts: 89
Joined: Fri Mar 24, 2006 1:51 pm
Location: Hartford

Post by girija »

If you use a Join stage, the process will create tsort files. First, if you need all the columns from your source then use a table read; otherwise it is better to select only the fields you need. Second, your lookup table is very small, and the memory usage and performance of the Lookup stage depend not on the source stream but on the lookup (reference) stream. So I think it is better to use the Lookup stage instead of the Join.
keshav0307
Premium Member
Posts: 783
Joined: Mon Jan 16, 2006 10:17 pm
Location: Sydney, Australia

Post by keshav0307 »

10,000 records are not huge and can easily be accommodated in memory. With a Lookup stage the main link does not need to be sorted and is not held in memory; records just keep propagating.
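The point above can be sketched in plain Python: a lookup holds only the small reference side in memory (as a hash table) while the large source streams through row by row, whereas a join needs both inputs sorted on the key, which is what spawns the tsort spill files. All names and sample data here are illustrative, not DataStage APIs.

```python
def hash_lookup(source_rows, reference_rows, key):
    """Stream source rows past an in-memory hash of the reference data.

    Only the (small) reference side is materialized; the (large) source
    side is consumed lazily, so memory use is bounded by the lookup file.
    """
    ref = {row[key]: row for row in reference_rows}   # small side in memory
    for row in source_rows:                           # large side streams through
        match = ref.get(row[key])
        if match is not None:
            merged = dict(row)
            merged.update({k: v for k, v in match.items() if k != key})
            yield merged

# Illustrative stand-ins: a tiny "source table" and "lookup file".
source = [{"id": i, "amount": i * 10} for i in range(5)]
reference = [{"id": 1, "desc": "one"}, {"id": 3, "desc": "three"}]

for row in hash_lookup(source, reference, "id"):
    print(row)
```

Only matching source rows are emitted, enriched with the reference columns; the 10-million-row side never has to be sorted or buffered.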