There is a job which does a left outer join (using Join Stage) between two links :
Link1 - 30 million rows
Link2 - 10 million rows (Right Link)
When i use the explicit link sort (hash,sort) on both these links on my join keys , the job aborts after running for some time with this fatal error:
buffer(20),1: APT_BufferOperator: Add block to queue failed. This means that your buffer filesystems all ran out of file space, or that some other system error occurred. Please ensure that you have sufficient scratchdisks in either the default or "buffer" pools on all nodes in your configuration file.
But when I place an explicit Sort Stage on both the input links to the join stage, the job runs successfully to completion.
What exactly happens during a link sort that differs from a Sort Stage? Can someone please throw some light on the process.
Thanks in advance
Explicit Sort vs Link Sort
Moderators: chulett, rschirm, roy
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Sort stage is more flexible about using memory - you even have the ability to specify the per-node memory usage that the stage takes. Also, with an explicit Sort stage, it may be that your job no longer needs as much information in the inserted buffer operator (see the score) at any one time, so that there are fewer issues about adding more space to the buffer.
There are some environment variables that you can use to tune the buffering, but I believe an explicit Sort stage gives the best of all possible worlds, so advocate using it whenever sorting is required.
There are some environment variables that you can use to tune the buffering, but I believe an explicit Sort stage gives the best of all possible worlds, so advocate using it whenever sorting is required.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Does using an explicit sort stage yield better performance as compared to a Link Sort? How does the memory usage between the two vary?ray.wurlod wrote:Sort stage is more flexible about using memory - you even have the ability to specify the per-node memory usage that the stage takes. Also, with an explicit Sort stage, it may be that your job no lon ...
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Thanks for the reply Ray..ray.wurlod wrote:It depends. You can tune memory consumption in the Sort stage explicitly for that stage. And "performance" needs to be defined. ...
Performance can be defined as :
1) Better throughput from the join stage or improved throughput from upstream stages/operators
2) Job not aborting as the number of records for the join stage increase (lets say 10 million rows/week)
One more question.. how much of an impact does placing/not placing a Sort Stage have on Link Buffering?