Job failure due to space

Posted: Wed Mar 18, 2009 12:22 pm
by wahi80
Hi,

I have a job design that looks as follows:

Ds1 ----+
        +-- Join 1 ----+
Ds2 ----+              |
                       +-- Funnel ---- Sort ---- Dataset
Ds3 ----+              |
        +-- Join 2 ----+
Ds4 ----+

The links from Ds1, Ds2, Ds3 and Ds4 each have a link sort and partitioning defined.
Ds1 and Ds2 are sorted and partitioned on 2 keys (total key length 52 characters).
Ds3 and Ds4 are partitioned on one key (length 2).

The link from the Funnel is partitioned on 3 keys, and the Sort stage uses 6 keys to define the sort.

The total length of all keys defined in the Sort stage is 66.

The link after the Sort has the same partitioning defined on it.

All of the repartitioning defined is required for the logic.

Issue:
At one point more than 200 million rows are written out at the Funnel, and the job fails with a no-space error.

Scratch space available is 100GB

How do I solve this?
The environment team will not be increasing the space in the near future.

Regards
Wah
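
As a rough sanity check on the figures in the post above, the sketch below works out how wide an average row can be before 200 million rows no longer fit in 100 GB of scratch. It is only an estimate under stated assumptions: decimal gigabytes, whole records (not just keys) being spilled to scratch once, and nothing else using the same scratch disks. The row width is not given in the thread, and the function names are made up for illustration.

```python
# Back-of-the-envelope scratch sizing. Only the row count, scratch size and
# key length come from the post; everything else is an assumption.

GB = 10**9  # decimal gigabyte (assumption; the post does not say GB vs GiB)


def max_avg_row_bytes(scratch_bytes: int, rows: int) -> int:
    """Largest average row width (bytes) that still fits in scratch."""
    return scratch_bytes // rows


def spill_bytes(rows: int, avg_row_bytes: int) -> int:
    """Scratch needed if every row is spilled to disk exactly once."""
    return rows * avg_row_bytes


rows = 200_000_000
scratch = 100 * GB

print(max_avg_row_bytes(scratch, rows))  # 500
print(spill_bytes(rows, 66) / GB)        # 13.2 (the 66-character keys alone)
```

Under these assumptions, the sort keys alone account for about 13 GB, and an average row wider than roughly 500 bytes would exceed the 100 GB available in a single full sort.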

Re: Job failure due to space

Posted: Wed Mar 18, 2009 12:46 pm
by tehavele
What does the configuration file look like?
Maybe we can dedicate one node to sorting alone.

Regards,
Tejas

Re: Job failure due to space

Posted: Wed Mar 18, 2009 12:53 pm
by tehavele
The partitioning on 3 keys can be done on the input links to the Funnel. The Sort will then have the same partitioning method.


Regards,
Tejas

Re: Job failure due to space

Posted: Wed Mar 18, 2009 1:18 pm
by wahi80
I am currently using an 8-node configuration file.
Also, how would partitioning at the input of the Funnel stage help to reduce the space issue?

Regards
Wah

Re: Job failure due to space

Posted: Thu Mar 19, 2009 12:31 am
by tehavele
I think that if we partition the rows in chunks, instead of all 200 million in one go, it could resolve the space issue, because space for all 200 million rows would not be needed at once. I am not sure about this, though.

Posted: Thu Mar 19, 2009 3:34 am
by DSDexter
What is your source? What type of partitioning are you doing in the job?

What is the OS flavour?

Posted: Thu Mar 19, 2009 3:40 am
by Sainath.Srinivasan
Does the link sort appear in the job you described, or in another job before it?

Try splitting the job into smaller jobs.

What is your scratch capacity, and what OS are you on?

Are the source datasets landed directly from a database?

Posted: Thu Mar 19, 2009 8:07 am
by wahi80
OS is Unix
Partitioning used is Hash partition
The datasets are generated in previous jobs.

I'm planning to split the job into two halves. The first half would run up to the Funnel stage.

The second half would just sort and write to the dataset.
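
For readers less familiar with the hash partitioning mentioned above, here is a minimal Python sketch of the idea. `zlib.crc32` is only a stand-in for whatever hash function the engine actually uses, the partition count of 8 mirrors the 8-node configuration file mentioned earlier in the thread, and the sample keys are invented.

```python
import zlib
from collections import Counter


def hash_partition(key: str, partitions: int = 8) -> int:
    """Map a key to a partition number; equal keys always land together."""
    return zlib.crc32(key.encode("utf-8")) % partitions


# Hypothetical 2-character keys, similar in shape to the Ds3/Ds4 key.
keys = ["AA", "AB", "AA", "ZZ", "AB", "AA"]

# Count how many rows each partition receives.
load = Counter(hash_partition(k) for k in keys)
print(load)
```

Every row still lands in exactly one partition, so the total volume to be sorted is unchanged. Partitioning spreads the rows across nodes but does not by itself shrink the overall scratch requirement.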

Posted: Thu Mar 19, 2009 9:23 pm
by dh_Madhu
Yes, also try splitting the job horizontally if your business rules allow it.

Job1
Ds1----
        Join 1----- Funnel -------- Sort ------- Ds5
Ds2----

Job2
Ds3----
        Join 2----- Funnel -------- Sort ------- Ds6
Ds4----

Job3
Ds5----
        Join 2----- Funnel -------- Sort ------- Ds7
Ds6----

Posted: Tue Mar 24, 2009 3:31 pm
by wahi80
Hi Madhu,

Looking at the third job in your design, you combine the datasets from the previous jobs and then sort again. The problem is that the sort in Job3 will sort all the rows again.

So the sorting in Job1 and Job2 will be rendered useless, and we will use the complete scratch space again.

Regards
Wah
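
The concern above applies when Job3 performs a full sort. To illustrate the distinction, the sketch below uses Python's `heapq.merge` to combine two inputs that are already sorted on the same key: the merge preserves order while holding only one row per input in memory, so no second full sort is needed. This is an analogy, not a DataStage recipe. Whether the real job can do something equivalent depends on Job1 and Job2 sorting on the same keys, with the same partitioning, as the final Sort, which the thread does not confirm. The sample data and the function name are hypothetical.

```python
import heapq
from typing import Iterable, Iterator


def merge_sorted(*streams: Iterable[tuple]) -> Iterator[tuple]:
    """Lazily merge already-sorted streams into one sorted stream."""
    return heapq.merge(*streams)


# Hypothetical outputs of Job1 and Job2, each already sorted on the same key.
ds5 = [("A", 1), ("C", 3), ("E", 5)]
ds6 = [("B", 2), ("D", 4)]

merged = list(merge_sorted(ds5, ds6))
print(merged)
```

If the inputs were not sorted on the same key, the merged output would not be ordered, and a full sort would indeed be required again.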