Warning message when changing lookup stage to a join stage

Posted: Fri Jul 28, 2006 4:42 am
by miwinter
Having changed a lookup stage to a join stage (for performance improvement), we now see the warning below in the log - is there any explanation for this?

MapResult: When checking operator: A sequential operator cannot preserve the partitioning
of the parallel data set on input port 0.

Posted: Fri Jul 28, 2006 4:48 am
by ray.wurlod
You've got a Sequential File stage downstream of the Join stage. For the Join stage you have had to partition the data based on key value. The message is essentially telling you that the Sequential File stage does not operate in parallel, so cannot preserve the partitioning specified for the Join stage.
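
To illustrate the idea outside DataStage, here is a minimal Python sketch (not DataStage code; the function names `hash_partition` and `collect_sequential` are made up for this example). Rows are hash-partitioned on a key, as a Join stage requires, and then funnelled into a single stream, which is roughly what happens when a sequential stage reads a partitioned data set.

```python
from collections import defaultdict

def hash_partition(rows, key, num_partitions):
    """Place each row in a partition chosen by hashing its key value."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[hash(row[key]) % num_partitions].append(row)
    return [partitions[i] for i in range(num_partitions)]

def collect_sequential(partitions):
    """Merge all partitions into one stream; the partitioning is lost here."""
    return [row for part in partitions for row in part]

rows = [{"id": i % 4, "value": i} for i in range(8)]
parts = hash_partition(rows, "id", 2)   # rows with the same id share a partition
flat = collect_sequential(parts)        # a single stream, no partitions left
```

Once the rows are in `flat` there is only one stream, so a request to keep the upstream partitioning cannot be honoured, which is what the warning reports.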

Posted: Fri Jul 28, 2006 4:56 am
by miwinter
Thank you, I'll check it out

Posted: Sat Jul 29, 2006 12:09 am
by kumar_s
As explained in the other post, choose Clear for the Preserve Partitioning option on the output.

Posted: Sun Jul 30, 2006 4:06 am
by dsusr
First of all, why did you replace the lookup stage with a join stage for performance improvement? Lookup always gives better performance than join.

dsusr

Posted: Sun Jul 30, 2006 7:35 am
by kumar_s
dsusr wrote: First of all, why did you replace the lookup stage with a join stage for performance improvement? Lookup always gives better performance than join.

dsusr
Not in all cases. As long as the lookup data is small enough to be handled in lookup memory, the lookup approach is better. Once it grows beyond a certain limit, it is advisable to switch to a join, which doesn't preload the data into lookup memory.
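
As a rough illustration of the difference (plain Python, not DataStage code; `lookup_join` and `merge_join` are hypothetical names, and the merge version assumes both inputs are already sorted on the key with unique keys on the reference side):

```python
def lookup_join(stream, reference, key):
    """Lookup style: load the whole reference set into memory first."""
    table = {row[key]: row for row in reference}  # memory grows with reference size
    for row in stream:
        match = table.get(row[key])
        if match is not None:
            yield {**match, **row}

def merge_join(stream, reference, key):
    """Join style: walk two key-sorted inputs, holding one reference row at a time."""
    ref_iter = iter(reference)
    ref = next(ref_iter, None)
    for row in stream:
        # Skip reference rows whose key is smaller than the current stream key.
        while ref is not None and ref[key] < row[key]:
            ref = next(ref_iter, None)
        if ref is not None and ref[key] == row[key]:
            yield {**ref, **row}

stream = [{"id": 1, "amt": 10}, {"id": 2, "amt": 20},
          {"id": 2, "amt": 25}, {"id": 4, "amt": 40}]
reference = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}, {"id": 3, "name": "c"}]

via_lookup = list(lookup_join(stream, reference, "id"))
via_merge = list(merge_join(stream, reference, "id"))
```

Both give the same matched rows. The lookup version pays for it in memory proportional to the reference data, while the merge version pays for it in the sort it needs beforehand.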

Posted: Mon Jul 31, 2006 2:36 am
by dsusr
kumar_s wrote: Not in all cases. As long as the lookup data is small enough to be handled in lookup memory, the lookup approach is better. Once it grows beyond a certain limit, it is advisable to switch to a join, which doesn't preload the data into lookup memory.
Yes Kumar, you are right that it depends on the data, but the performance of join will always be better, and if we use the lookup when we have a large amount of data, jobs may start aborting because of high memory consumption.

Also, I had a chat with a senior IBM engineer, and according to him we can use lookup as long as the data is less than 1 GB.