join & lookup
Hi,
Lookup uses scratch memory, while Join uses disk (physical) memory for the sorting it performs.
Is that a correct statement to make?
No, that statement does not reflect what happens. Both stages use memory, but the Lookup stage keeps the reference data in memory, while the Join stage sorts its input streams on the join key(s) and then needs only minimal memory at runtime.
Lookup keeps reference data in memory, not on disk.
"Scratch" is temporary disk space, which is different from "temporary memory" but otherwise the definition is not wrong.
No, I never said that. Join stages work by sorting the input links (which may or may not require scratch storage or buffer storage) and then doing an efficient comparison of records from the links. Because the data is sorted, it is not necessary to use much memory, unlike the lookup stage which requires that the complete reference data is in memory.
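To make the difference concrete, here is a minimal Python sketch of the two approaches. It is not DataStage code; the function names `hash_lookup` and `merge_join` and the sample rows are made up for illustration, and the merge join assumes the reference keys are unique. The lookup has to hold the whole reference set in a dictionary, while the merge join only keeps the current row from each sorted input.

```python
from typing import Dict, Iterable, Iterator, Tuple

Row = Tuple[int, str]
Joined = Tuple[int, str, str]


def hash_lookup(stream: Iterable[Row], reference: Iterable[Row]) -> Iterator[Joined]:
    """Lookup-style match: the entire reference set is loaded into memory first."""
    table: Dict[int, str] = {}
    for key, value in reference:
        table[key] = value  # memory use grows with the size of the reference data
    for key, payload in stream:
        if key in table:
            yield (key, payload, table[key])


def merge_join(left: Iterable[Row], right: Iterable[Row]) -> Iterator[Joined]:
    """Join-style match: both inputs must already be sorted on the key.

    Only the current row of each input is held, so memory use stays small.
    Assumes keys on the right input are unique (illustrative simplification).
    """
    right_iter = iter(right)
    current = next(right_iter, None)
    for key, payload in left:
        # Advance the right input until it catches up with the left key.
        while current is not None and current[0] < key:
            current = next(right_iter, None)
        if current is not None and current[0] == key:
            yield (key, payload, current[1])


stream_rows = [(3, "c"), (1, "a"), (2, "b"), (4, "d")]
reference_rows = [(1, "one"), (3, "three"), (5, "five")]

lookup_result = list(hash_lookup(stream_rows, reference_rows))
join_result = list(merge_join(sorted(stream_rows), sorted(reference_rows)))
```

Both calls match the same rows; the cost simply moves from holding the reference data in memory (lookup) to sorting the inputs beforehand (join).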
<a href=http://www.worldcommunitygrid.org/team/ ... TZ9H4CGVP1 target="WCGWin">
</a>
</a>
The reference data set for a Lookup stage must be able to reside in physical memory (other than for a sparse lookup).
Other stage types that use memory, such as Sort, Aggregator and Join, will use up to the amount of memory allocated to them. Only if they need more memory than that will they spill to scratchdisk.
Disk pools may get involved. For example the Sort stage will first spill to scratchdisk resources identified as being in the "sort" disk pool. If these fill, or if the disk pool does not exist, it will use the default disk pool (""). If this fills it will use the directory identified by the TMPDIR environment variable. If this fills it will use /tmp. If this fills you're dead.
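The fallback order described above can be modelled in a few lines of Python. This is only a sketch of the described order, not how the engine is implemented; `spill_candidates`, `pick_spill_dir`, the directory paths and the free-space figures are all hypothetical.

```python
from typing import Dict, List, Optional


def spill_candidates(scratch_pools: Dict[str, List[str]],
                     tmpdir: Optional[str] = None) -> List[str]:
    """Return spill locations in the order described for the Sort stage.

    scratch_pools maps a disk pool name to its scratchdisk directories;
    the default pool is named "" as in the post above.
    """
    ordered: List[str] = []
    ordered.extend(scratch_pools.get("sort", []))  # 1. the "sort" disk pool
    ordered.extend(scratch_pools.get("", []))      # 2. the default disk pool
    if tmpdir:
        ordered.append(tmpdir)                     # 3. the TMPDIR directory
    ordered.append("/tmp")                         # 4. last resort
    # Drop repeats while keeping the first occurrence of each directory.
    seen = set()
    unique: List[str] = []
    for directory in ordered:
        if directory not in seen:
            seen.add(directory)
            unique.append(directory)
    return unique


def pick_spill_dir(candidates: List[str],
                   free_bytes: Dict[str, int],
                   needed: int) -> Optional[str]:
    """Pick the first candidate with enough free space, or None if all are full."""
    for directory in candidates:
        if free_bytes.get(directory, 0) >= needed:
            return directory
    return None


pools = {"sort": ["/scratch/sort"], "": ["/scratch/a", "/scratch/b"]}
candidates = spill_candidates(pools, tmpdir="/var/tmp")
free = {"/scratch/sort": 0, "/scratch/a": 10, "/scratch/b": 50, "/var/tmp": 5, "/tmp": 5}
chosen = pick_spill_dir(candidates, free, needed=8)
```

In a real job the TMPDIR value would come from the environment, for example `os.environ.get("TMPDIR")`; it is passed in explicitly here so the sketch stays deterministic. A result of `None` corresponds to the "you're dead" case.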
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.