~~~Warning in job~~~

vigneshra
Participant
Posts: 86
Joined: Wed Jun 09, 2004 6:07 am
Location: Chennai

~~~Warning in job~~~

Post by vigneshra »

Hi

My job runs perfectly well with smaller volumes, in the order of 2 or 3 million records. But once the volume shoots up, my job throws the following warning continuously, though it runs to completion without aborting.

APT_ParallelSortMergeOperator,2: Unbalanced input from partition 2: 10000 records buffered

How can I avoid this warning? Any ideas welcome!

Vignesh.
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

Well, go to the Search page here and enter "Unbalanced input from partition" and check the Exact match option. You'll find it's been covered before.
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
richdhan
Premium Member
Premium Member
Posts: 364
Joined: Thu Feb 12, 2004 12:24 am

Post by richdhan »

Hi Vignesh,

This is due to incorrect partitioning. You have done HASH partitioning twice. Once you have hash-partitioned the data, use SAME partitioning for the remaining stages until you repartition again (for example with ROUND ROBIN). Instead of SAME partitioning you have hash-partitioned a second time, and that is why you get this warning.
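
To illustrate the idea outside DataStage: the plain-Python sketch below (not DataStage syntax; the column name and four-way layout are made up) shows that re-hashing rows already hash-partitioned on the same key just reproduces the layout they are in, which is exactly what SAME partitioning gives you for free.

import zlib

def hash_partition(rows, key, n_parts=4):
    # Distribute rows across n_parts partitions by hashing the key column.
    parts = [[] for _ in range(n_parts)]
    for row in rows:
        parts[zlib.crc32(str(row[key]).encode()) % n_parts].append(row)
    return parts

rows = [{"cust_id": i, "amt": i * 10} for i in range(1000)]

once = hash_partition(rows, "cust_id")                       # the first HASH partitioner
# Hashing again on the same key just rebuilds the same layout at extra cost;
# SAME partitioning would leave the rows where they already are.
again = hash_partition([r for p in once for r in p], "cust_id")
assert [len(p) for p in once] == [len(p) for p in again]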

HTH
--Rich

Pride comes before a fall
Humility comes before honour
mandyli
Premium Member
Premium Member
Posts: 898
Joined: Wed May 26, 2004 10:45 pm
Location: Chicago

Post by mandyli »

Yes Rich, you are correct: it is the incorrect partitioning that causes this warning. Check your partitioning method.


Thanks
Man
ganive
Participant
Posts: 18
Joined: Wed Sep 28, 2005 7:06 am

Post by ganive »

Can this error happen even if you're using HASH partitioning twice, but on two different key values?
richdhan wrote:Hi Vignesh,

This is due to incorrect partitioning. You have done HASH partitioning twice. Once you have hash-partitioned the data, use SAME partitioning for the remaining stages until you repartition again (for example with ROUND ROBIN). Instead of SAME partitioning you have hash-partitioned a second time, and that is why you get this warning.

HTH
--Rich

Pride comes before a fall
Humility comes before honour
--------
GaNoU
--------
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Yes, of course, particularly if you are depending on the partitioning being the same, for example to perform a lookup.
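
A rough plain-Python sketch of the failure mode (not DataStage syntax; the column names and four-way layout are invented): if the reference is hashed on one key and the stream is probed on another, matching rows land in different partitions and a partitioned lookup never sees them.

import zlib

N = 4
part = lambda v: zlib.crc32(str(v).encode()) % N

stream    = [{"cust_id": i, "region": i % 3} for i in range(100)]
reference = [{"cust_id": i, "region": i % 3, "name": "c%d" % i} for i in range(100)]

# Reference hashed on "region", but the lookup probes by "cust_id":
ref_parts = [dict() for _ in range(N)]
for r in reference:
    ref_parts[part(r["region"])][r["cust_id"]] = r

misses = sum(1 for s in stream
             if s["cust_id"] not in ref_parts[part(s["cust_id"])])
print(misses)   # many rows miss: the match exists, just in another partition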
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
ganive
Participant
Posts: 18
Joined: Wed Sep 28, 2005 7:06 am

Post by ganive »

Wow... so that means you can't use the same Dataset if you have to perform Join/Lookup/Merge operations on different key values?

I mean, each time I had to do that, I would repartition my reference Dataset by hashing it on another key, depending on the data I wanted to join with :o
ray.wurlod wrote:Yes, of course, particularly if you are depending on the partitioning being the same, for example to perform a lookup.
--------
GaNoU
--------
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

No, it doesn't mean that at all. You can use the same Data Set, but you do have to ensure that the lookup will find all the keys it needs to, and that means that the stream and reference inputs must be partitioned identically (and sorted identically for Join and Merge stages) on the keys being used for the lookup. Or, if it's a small enough Data Set, use Entire partitioning.
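
Continuing the plain-Python sketch from above (still not DataStage syntax, invented column names): hash both inputs on the lookup key and every probe lands in the partition that holds its match; Entire partitioning instead gives every partition a full copy of the reference, which also works when the reference is small.

import zlib

N = 4
part = lambda v: zlib.crc32(str(v).encode()) % N

stream    = [{"cust_id": i, "amt": i} for i in range(100)]
reference = [{"cust_id": i, "name": "c%d" % i} for i in range(100)]

# Both inputs hashed on the lookup key: a stream row and its reference row
# always land in the same partition, so each partition resolves its own lookups.
ref_parts = [dict() for _ in range(N)]
for r in reference:
    ref_parts[part(r["cust_id"])][r["cust_id"]] = r

misses = sum(1 for s in stream
             if s["cust_id"] not in ref_parts[part(s["cust_id"])])
assert misses == 0

# Entire partitioning: the same full copy of the reference on every partition.
entire = {r["cust_id"]: r for r in reference}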
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
ganive
Participant
Posts: 18
Joined: Wed Sep 28, 2005 7:06 am

Post by ganive »

OK... I thought I had used PX for 2 years without understanding anything... Saved! :wink:

So let's get back to the subject: :twisted:
Earlier, Rich said that the warning "Unbalanced Input From partition 0..." is due to hash partitioning being done twice.
Is that the real problem then? :?:
From my own usage, and from what I can read, you can re-hash whenever you want, even if it is not optimal (SAME partitioning should be used if you are keeping the same hash keys).

In my case, I have a stream input coming from a flat file. I'm hash-partitioning it on the same key as my reference Dataset (the reference input uses SAME partitioning, as it was partitioned earlier).
Every other link uses SAME... except for a write to a Dataset where I use Round Robin.

And I'm still getting the famous warning "Unbalanced Input From Partition 0...".

Any ideas? :?:
ray.wurlod wrote:No, it doesn't mean that at all. You can use the same Data Set, but you do have to ensure that the lookup will find all the keys it needs to, and that means that the stream and reference inputs must be partitioned identically (and sorted identically for Join and Merge stages) on the keys being used for the lookup. Or, if it's a small enough Data Set, use Entire partitioning.
--------
GaNoU
--------
ganive
Participant
Posts: 18
Joined: Wed Sep 28, 2005 7:06 am

Post by ganive »

Okay, just solved the problem.

It seems that PX version 7.1 automatically sorts your input data when you use the Join stage.
I usually partition my data before the Join stage and tick the Sort box as well... that's why I got the warning!

All I had to do to solve it was to uncheck the Sort box.
Maybe this will help some of you.
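
In plain-Python terms (a toy analogy, not the actual operator, with a made-up key column), the explicit Sort box was just re-doing work that the sort inserted by the Join stage does anyway:

rows = [{"cust_id": i % 7, "amt": i} for i in range(20)]

pre_sorted  = sorted(rows, key=lambda r: r["cust_id"])        # the explicit Sort box
join_sorted = sorted(pre_sorted, key=lambda r: r["cust_id"])  # the sort the stage inserts anyway
assert join_sorted == pre_sorted                              # the second sort adds nothing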
--------
GaNoU
--------
madhukar
Participant
Posts: 86
Joined: Fri May 20, 2005 4:05 pm

Post by madhukar »

Check whether the order of the keys in the partitioning and in the sort are different...
kumar_s
Charter Member
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

ganive wrote:It seems that PX version 7.1 automatically sorts your input data when you use the Join stage... All I had to do to solve it was to uncheck the Sort box.
Hi,

As far as I know, the Join stage won't sort automatically unless Auto partitioning is specified.

-Kumar