join & lookup

Post questions here related to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

dsa
Participant
Posts: 37
Joined: Sun Oct 10, 2010 7:52 am


Post by dsa »

Hi,

The Lookup stage uses scratch memory, while the Join stage uses disk (physical) memory for the sorting it performs.

Is that a correct statement?
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

No, that statement does not reflect what happens. Both methods use memory, but the Lookup stage keeps the reference data in memory, while the Join stage sorts the input streams on the join key(s) and then needs only minimal memory at runtime.
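To make the Lookup side of this concrete, here is a minimal Python sketch of an in-memory (hash) lookup. It is only an illustration of the idea, not how DataStage implements the Lookup stage; the function name `hash_lookup` and the dict-based row format are hypothetical. The whole reference set is loaded into a dict first, so memory grows with the size of the reference data, while the stream rows are matched one at a time and do not need to be sorted.

```python
from typing import Dict, Iterable, Iterator, Optional, Tuple

Row = Dict[str, object]  # hypothetical row format: column name -> value


def hash_lookup(stream: Iterable[Row], reference: Iterable[Row],
                key: str) -> Iterator[Tuple[Row, Optional[Row]]]:
    """Match each stream row against a reference set held entirely in memory."""
    # Build phase: the COMPLETE reference data is loaded into a dict.
    # Memory use is proportional to the size of the reference set.
    table: Dict[object, Row] = {}
    for ref in reference:
        table[ref[key]] = ref  # in this sketch a later duplicate key overwrites an earlier one

    # Probe phase: one dict access per stream row; the stream can arrive in any order.
    for row in stream:
        yield row, table.get(row[key])  # None when there is no match
```

For example, looking up `[{"cust": 2}, {"cust": 1}, {"cust": 3}]` against `[{"cust": 1, "name": "A"}, {"cust": 2, "name": "B"}]` on `"cust"` pairs the first two rows with B and A and the third with `None`.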
dsa
Participant
Posts: 37
Joined: Sun Oct 10, 2010 7:52 am

Post by dsa »

Sorry, what I meant was that Lookup keeps the reference data in scratch.

Is that right now?
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

Lookup keeps reference data in memory, not on disk.
dsa
Participant
Posts: 37
Joined: Sun Oct 10, 2010 7:52 am

Post by dsa »

My understanding is this: scratch is temporary memory, and when we say resource disk, it means permanent memory, i.e. disk.

Please correct me if I am wrong.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

"Scratch" is temporary disk space, which is different from "temporary memory" but otherwise the definition is not wrong.
dsa
Participant
Posts: 37
Joined: Sun Oct 10, 2010 7:52 am

Post by dsa »

Oh.

So Join uses permanent memory, which is also not resource disk?
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

No, I never said that. Join stages work by sorting the input links (which may or may not require scratch storage or buffer storage) and then doing an efficient comparison of records from the links. Because the data is sorted, it is not necessary to use much memory, unlike the lookup stage which requires that the complete reference data is in memory.
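The "efficient comparison of records" on sorted inputs is essentially a merge join. Below is a minimal Python sketch, assuming both inputs are already sorted ascending on the key (the sort step, which is where scratch or buffer storage may come in, is left out). The name `merge_join` and the dict row format are hypothetical, and this is not DataStage's actual Join implementation. The point is that only the current key group from the right input is buffered, never a whole input.

```python
from itertools import groupby
from typing import Dict, Iterable, Iterator, Tuple

Row = Dict[str, object]  # hypothetical row format: column name -> value


def merge_join(left_sorted: Iterable[Row], right_sorted: Iterable[Row],
               key: str) -> Iterator[Tuple[Row, Row]]:
    """Inner join of two inputs that are ALREADY sorted ascending on `key`."""
    def by_key(r: Row):
        return r[key]

    # Walk both sorted streams one key group at a time.
    lgroups = groupby(left_sorted, key=by_key)
    rgroups = groupby(right_sorted, key=by_key)
    lk, lg = next(lgroups, (None, None))
    rk, rg = next(rgroups, (None, None))

    while lg is not None and rg is not None:
        if lk < rk:
            # Left key is behind: skip this left group (no match on the right).
            lk, lg = next(lgroups, (None, None))
        elif lk > rk:
            # Right key is behind: skip this right group.
            rk, rg = next(rgroups, (None, None))
        else:
            # Keys match: buffer only this one right-hand key group in memory.
            right_rows = list(rg)
            for lrow in lg:
                for rrow in right_rows:
                    yield lrow, rrow
            lk, lg = next(lgroups, (None, None))
            rk, rg = next(rgroups, (None, None))
```

Because both inputs are sorted, each row is read once and a key that is behind can simply be skipped; compare this with the Lookup case, where the whole reference set must be resident before the first stream row is processed.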
dsa
Participant
Posts: 37
Joined: Sun Oct 10, 2010 7:52 am

Post by dsa »

Thanks for clearing my doubts !!!
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

The reference data set for a Lookup stage must be able to reside in physical memory (other than for a sparse lookup).

Any other stage type that uses memory, such as Sort, Aggregator or Join, will use the amount of memory allocated to it. Only if it needs more memory than that will it spill to scratchdisk.
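The "use the allocated memory, spill only when it is exceeded" behaviour is the classic external sort. Here is a minimal Python sketch of that idea, assuming a memory allocation expressed as a row count (`max_rows_in_memory`, a hypothetical parameter) and integer rows for simplicity. It is not the Sort stage's real algorithm. Sorted runs are written to temporary files in a scratch directory only when the in-memory buffer fills, and are then merged back.

```python
import heapq
import os
import tempfile
from typing import Iterable, Iterator, List


def _write_run(sorted_rows: List[int], scratch_dir: str) -> str:
    # Spill one sorted run to a temporary file in the scratch directory.
    fd, path = tempfile.mkstemp(dir=scratch_dir, suffix=".run")
    with os.fdopen(fd, "w") as f:
        for v in sorted_rows:
            f.write(f"{v}\n")
    return path


def _read_run(path: str) -> Iterator[int]:
    # Stream a spilled run back from disk one row at a time.
    with open(path) as f:
        for line in f:
            yield int(line)


def sort_with_spill(rows: Iterable[int], max_rows_in_memory: int,
                    scratch_dir: str) -> List[int]:
    buffer: List[int] = []
    run_paths: List[str] = []
    for v in rows:
        buffer.append(v)
        if len(buffer) >= max_rows_in_memory:
            # Allocation exceeded: sort what we have and spill it to scratch.
            buffer.sort()
            run_paths.append(_write_run(buffer, scratch_dir))
            buffer = []
    buffer.sort()
    if not run_paths:
        return buffer  # everything fit in memory: no scratch I/O at all
    try:
        # Merge the in-memory remainder with the spilled runs.
        # (Collected into a list here only to keep the example simple.)
        return list(heapq.merge(buffer, *(_read_run(p) for p in run_paths)))
    finally:
        for p in run_paths:
            os.remove(p)  # scratch files are temporary
```

With `max_rows_in_memory=2`, sorting `[5, 3, 9, 1, 7]` spills two runs and returns `[1, 3, 5, 7, 9]`; with a larger allocation the same call never touches the scratch directory.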

Disk pools may get involved. For example, the Sort stage will first spill to scratchdisk resources identified as being in the "sort" disk pool. If these fill, or if that disk pool does not exist, it will use the default disk pool (""). If that fills, it will use the directory identified by the TMPDIR environment variable. If that fills, it will use /tmp. If that fills, you're dead.
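The fallback order described above can be written down as a short Python sketch. Everything here is a hypothetical model, not DataStage code: the pools are a dict from pool name to scratch directories, standing in for the scratchdisk entries of the configuration file, and `has_room` stands in for whatever free-space check the engine really performs. For simplicity the sketch tries directories strictly in order, which may differ from how the engine spreads work across disks within a pool.

```python
from typing import Callable, Dict, List, Mapping, Optional


def candidate_scratch_dirs(pools: Dict[str, List[str]],
                           env: Mapping[str, str]) -> List[str]:
    """Scratch locations in the fallback order described for the Sort stage."""
    dirs: List[str] = []
    dirs += pools.get("sort", [])  # 1. scratchdisks in the "sort" pool, if it exists
    dirs += pools.get("", [])      # 2. scratchdisks in the default pool ""
    tmpdir = env.get("TMPDIR")
    if tmpdir:
        dirs.append(tmpdir)        # 3. the directory named by TMPDIR
    dirs.append("/tmp")            # 4. /tmp as the last resort
    return dirs


def pick_scratch_dir(pools: Dict[str, List[str]], env: Mapping[str, str],
                     has_room: Callable[[str], bool]) -> Optional[str]:
    # Return the first location that still has space.
    for d in candidate_scratch_dirs(pools, env):
        if has_room(d):
            return d
    return None  # everything is full: the spill (and the job) fails
```

Passing `os.environ` as `env` and a check built on `shutil.disk_usage` as `has_room` would make this run against a real machine. The test-style stubs simply keep the ordering easy to see.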
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.