Need help in understanding Look-up Functionality

zulfi123786
Premium Member
Posts: 730
Joined: Tue Nov 04, 2008 10:14 am
Location: Bangalore

Need help in understanding Look-up Functionality

Post by zulfi123786 »

While using the Lookup stage, I have noticed that files are being created.

As far as I know (please correct me if I'm wrong), the Lookup stage operates in primary memory (RAM). So why are files created on the scratch disk? Is it because the data is too large to fit in RAM?

Thanks.
priyadarshikunal
Premium Member
Posts: 1735
Joined: Thu Mar 01, 2007 5:44 am
Location: Troy, MI

Re: Need help in understanding Look-up Functionality

Post by priyadarshikunal »

Do you want to keep all the reference records in RAM at all times and slow down every other process? It works the same way as any operating system: it keeps RAM as clear as possible without degrading performance.

It does use RAM, but memory also has to be managed. At the time the actual lookup happens, the data is in memory.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

There are tunable limits on how much memory is used for virtual Data Sets and for buffers. (It will come as no surprise that they are tuned by setting environment variable values.) When a limit is reached, DataStage uses scratch disk.
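
To make the idea concrete, here is a minimal Python sketch of a buffer that keeps rows in memory up to a configurable limit and writes the overflow to a temporary file. It is only an illustration of the general spill-to-disk pattern, not DataStage's actual implementation; the class name `SpillingBuffer` and the parameter `max_memory_bytes` are hypothetical, and the temporary file merely stands in for scratch disk.

```python
import pickle
import tempfile


class SpillingBuffer:
    """Toy buffer: rows stay in memory up to a limit, then spill to disk.

    Illustration only; names and behaviour are assumptions, not DataStage's.
    """

    def __init__(self, max_memory_bytes):
        self.max_memory_bytes = max_memory_bytes  # hypothetical tunable limit
        self.used_bytes = 0
        self.in_memory = []
        self.spill_file = None  # created lazily on the first overflow
        self.spilled_count = 0

    def append(self, row):
        payload = pickle.dumps(row)
        fits = self.used_bytes + len(payload) <= self.max_memory_bytes
        # Once spilling has started, keep spilling so row order is preserved.
        if self.spill_file is None and fits:
            self.in_memory.append(row)
            self.used_bytes += len(payload)
            return
        if self.spill_file is None:
            # Stand-in for a file on scratch disk.
            self.spill_file = tempfile.TemporaryFile()
        self.spill_file.write(payload)
        self.spilled_count += 1

    def rows(self):
        """Yield all rows in insertion order: memory first, then disk."""
        yield from self.in_memory
        if self.spill_file is not None:
            self.spill_file.seek(0)
            for _ in range(self.spilled_count):
                yield pickle.load(self.spill_file)
            self.spill_file.seek(0, 2)  # back to the end for later appends


buf = SpillingBuffer(max_memory_bytes=200)
for i in range(100):
    buf.append((i, "row"))
print(len(buf.in_memory), "rows in memory,", buf.spilled_count, "rows on disk")
```

With a small limit, only the first few rows stay in memory and the rest end up in the file, which is the behaviour the original question describes: files appear even though the stage works in memory.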
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne

Post by vmcburney »

I ran a test on this a few years ago: http://it.toolbox.com/blogs/infosphere/datastage-tip-for-beginners-parallel-lookup-types-7183. For under 1000 rows, lookups were faster than joins, and it didn't matter what the lookup source was because the data fitted into RAM with ease. I also ran some tests on 3 million lookup rows: a Lookup stage with the reference data in a Lookup Fileset was fastest at 42 seconds, a Join stage took just over a minute, and a Lookup with the reference data in a database, sequential file or dataset took over 2 minutes.

So if you want performance improvements on large lookups, think about Lookup Filesets or Join stages.
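
For readers unsure why the two approaches behave differently as volumes grow, the following Python sketch contrasts the two general techniques. It is a simplified illustration under stated assumptions, not how DataStage implements either stage: the function names are hypothetical, reference keys are assumed to be unique, and both functions return only matching rows. A lookup loads the whole reference set into an in-memory table before probing it, while a sort-merge join only needs both inputs in key order and then walks them side by side.

```python
def hash_lookup(stream_rows, reference_rows, key):
    """Lookup style: hold the whole reference set in memory, probe per row."""
    table = {ref[key]: ref for ref in reference_rows}  # must fit in memory
    for row in stream_rows:
        match = table.get(row[key])
        if match is not None:
            yield {**row, **match}


def sort_merge_join(left_rows, right_rows, key):
    """Join style: sort both inputs on the key, then merge in one pass.

    sorted() is in-memory here for brevity; a real engine can sort
    externally, so neither input has to fit in memory at once.
    Assumes keys on the right (reference) side are unique.
    """
    left = sorted(left_rows, key=lambda r: r[key])
    right = sorted(right_rows, key=lambda r: r[key])
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i][key] < right[j][key]:
            i += 1
        elif left[i][key] > right[j][key]:
            j += 1
        else:
            yield {**left[i], **right[j]}
            i += 1  # stay on the same reference row for repeated left keys


stream = [
    {"id": 3, "amt": 10},
    {"id": 1, "amt": 20},
    {"id": 9, "amt": 5},
    {"id": 3, "amt": 7},
]
reference = [{"id": 1, "name": "a"}, {"id": 3, "name": "c"}]

looked_up = list(hash_lookup(stream, reference, "id"))
joined = list(sort_merge_join(stream, reference, "id"))
```

Both produce the same matched rows; the lookup keeps the stream order, while the join emits rows in key order. The trade-off is memory for the reference table versus the cost of sorting.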