Having changed a lookup stage to a join stage (for performance improvement), we now see the warning below in the log - is there any explanation for this?
MapResult: When checking operator: A sequential operator cannot preserve the partitioning
of the parallel data set on input port 0.
Warning message when changing lookup stage to a join stage
You've got a Sequential File stage downstream of the Join stage. For the Join stage you have had to partition the data based on key value. The message is essentially telling you that the Sequential File stage does not operate in parallel, so cannot preserve the partitioning specified for the Join stage.
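To illustrate the point above, here is a minimal Python sketch (not DataStage code) of key-based partitioning followed by collection into a single sequential stream. The function names `hash_partition` and `collect`, the column `cust_id`, and the partition count are all hypothetical and chosen only for illustration.

```python
def hash_partition(rows, key, num_partitions):
    """Place each row in a partition chosen by the hash of its key value,
    so that all rows sharing a key end up in the same partition."""
    partitions = [[] for _ in range(num_partitions)]
    for row in rows:
        partitions[hash(row[key]) % num_partitions].append(row)
    return partitions


def collect(partitions):
    """Merge every partition into one stream, as a sequential stage must.
    After this step there is no partitioning left to preserve."""
    return [row for part in partitions for row in part]


# Hypothetical sample data: 20 rows spread over 5 key values.
rows = [{"cust_id": i % 5, "amount": i} for i in range(20)]
parts = hash_partition(rows, "cust_id", 4)
collected = collect(parts)
```

The Join needs the layout held in `parts`; a sequential consumer can only work with something like `collected`, which is why the engine reports that the partitioning cannot be preserved.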
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Re: Warning message when changing lookup stage to a join sta
First of all, why have you replaced the Lookup stage with a Join stage for performance improvement? A lookup always gives better performance than a join.
dsusr
Re: Warning message when changing lookup stage to a join sta
dsusr wrote: First of all, why have you replaced the Lookup stage with a Join stage for performance improvement? A lookup always gives better performance than a join.
Not in all cases. As long as the lookup data is small enough to be handled in lookup memory, the lookup approach is better. Once it grows beyond a certain limit, it is advisable to switch to a join, which doesn't preload the data into lookup memory.
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
Re: Warning message when changing lookup stage to a join sta
kumar_s wrote: Not in all cases. As long as the lookup data is small enough to be handled in lookup memory, the lookup approach is better. Once it grows beyond a certain limit, it is advisable to switch to a join, which doesn't preload the data into lookup memory.
Yes, Kumar, you are right that it depends on the data, but the performance of a join will always be better, and if we use a lookup with a large amount of data, the jobs may start aborting because of high memory consumption.
Also, I had a chat with a senior IBM engineer, and according to him we can use a lookup as long as the data is less than 1 GB.
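The trade-off discussed in this thread can be sketched roughly in Python (again, not DataStage code). The names `lookup_join` and `merge_join` and the sample columns are hypothetical, and the sketch assumes the reference data has unique keys. A real engine can sort on disk, whereas `sorted()` here works in memory, so this only illustrates the access pattern, not actual memory behaviour.

```python
def lookup_join(stream, reference, key):
    """Lookup style: load the whole reference set into memory first,
    then probe it once per stream row."""
    table = {ref[key]: ref for ref in reference}  # entire reference held at once
    for row in stream:
        match = table.get(row[key])
        if match is not None:
            yield {**row, **match}


def merge_join(stream, reference, key):
    """Join style: both inputs are sorted on the key and walked in step,
    so only the current reference row is needed at any moment."""
    left = sorted(stream, key=lambda r: r[key])
    right = iter(sorted(reference, key=lambda r: r[key]))
    ref = next(right, None)
    for row in left:
        # Advance the reference side until it catches up with the stream key.
        while ref is not None and ref[key] < row[key]:
            ref = next(right, None)
        if ref is not None and ref[key] == row[key]:
            yield {**row, **ref}


# Hypothetical sample data.
stream = [{"id": 3, "qty": 1}, {"id": 1, "qty": 2},
          {"id": 2, "qty": 5}, {"id": 1, "qty": 7}]
reference = [{"id": 1, "name": "a"}, {"id": 3, "name": "c"}]

by_lookup = list(lookup_join(stream, reference, "id"))
by_join = list(merge_join(stream, reference, "id"))
```

Both functions return the same matched rows; they differ in row order and in how much of the reference data must be held at one time, which is the point both posters are circling.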