Need help in understanding Look-up Functionality

Posted: Wed Dec 23, 2009 4:11 am
by zulfi123786
While using the Lookup stage, I have observed that files are being created.

As far as I know (please correct me if I am wrong), the Lookup stage operates in primary memory (RAM). Why, then, are files created on the scratch disk? Is it because the data is too large to fit in RAM?

Thanks.

Re: Need help in understanding Look-up Functionality

Posted: Wed Dec 23, 2009 5:00 am
by priyadarshikunal
Do you want to keep all the reference records in RAM at all times and slow down every other process? It works the same way as any operating system: it keeps RAM as free as possible without degrading performance.

It does use RAM, but memory also has to be managed. At the time the actual lookup happens, the data is in memory.

Posted: Wed Dec 23, 2009 4:12 pm
by ray.wurlod
There are tunable limits on how much memory is used for virtual Data Sets and for buffers. (It will come as no surprise that they are tuned by setting environment variable values.) When a limit is reached, DataStage starts using scratch disk.
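
To illustrate the idea of "use memory up to a limit, then spill to scratch disk", here is a minimal Python sketch. It is not how DataStage implements its buffers; the class name `SpillingBuffer`, the `max_bytes` limit and the `scratch_dir` parameter are all hypothetical and only mirror the behaviour described above.

```python
import pickle
import sys
import tempfile


class SpillingBuffer:
    """Hypothetical buffer: keeps rows in memory up to a byte limit,
    then writes the in-memory batch to a temporary "scratch" file."""

    def __init__(self, max_bytes, scratch_dir=None):
        self.max_bytes = max_bytes      # assumed tunable memory limit
        self.scratch_dir = scratch_dir  # assumed scratch location (None = system temp)
        self.rows = []                  # rows currently held in memory
        self.used = 0                   # rough estimate of memory in use
        self.spill_files = []           # one temp file per spilled batch

    def add(self, row):
        self.rows.append(row)
        # sys.getsizeof is only a shallow estimate; good enough for a sketch.
        self.used += sys.getsizeof(row)
        if self.used >= self.max_bytes:
            self._spill()

    def _spill(self):
        # Write the current batch to disk and free the in-memory copy.
        f = tempfile.TemporaryFile(dir=self.scratch_dir)
        pickle.dump(self.rows, f)
        self.spill_files.append(f)
        self.rows = []
        self.used = 0

    def __iter__(self):
        # Read spilled batches back in the order they were written,
        # then yield whatever is still in memory.
        for f in self.spill_files:
            f.seek(0)
            yield from pickle.load(f)
        yield from self.rows
```

With a small limit, for example `SpillingBuffer(max_bytes=200)`, adding a few hundred rows creates several temporary files, which is the same kind of effect the original poster noticed on the scratch disk.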

Posted: Wed Dec 23, 2009 9:58 pm
by vmcburney
I ran a test on this a few years ago: http://it.toolbox.com/blogs/infosphere/datastage-tip-for-beginners-parallel-lookup-types-7183. For under 1000 rows, lookups were faster than joins, and the lookup source did not matter because the data fitted into RAM with ease. With 3 million lookup rows, a Lookup stage with the reference data in a Lookup Fileset was fastest at 42 seconds, a Join stage took just over a minute, and a Lookup with its source data in a database, sequential file or dataset took over 2 minutes.

So if you want performance improvements on large lookups, think about Lookup Filesets or Join stages.
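
For readers who want to see why the two approaches differ in memory use, here is a rough Python sketch of the general techniques. It is an assumption-laden illustration, not DataStage code: `hash_lookup` and `merge_join` are hypothetical names, rows are plain dicts, and the merge join assumes both inputs are already sorted on the key with unique keys on the reference side.

```python
def hash_lookup(stream, reference, key):
    """Lookup style: load the whole reference input into an in-memory table,
    then probe it once per stream row."""
    table = {ref[key]: ref for ref in reference}  # entire reference held in memory
    for row in stream:
        match = table.get(row[key])
        if match is not None:
            yield {**row, **match}


def merge_join(left, right, key):
    """Join style: walk two key-sorted inputs side by side, holding only
    the current row of each input in memory (inner join)."""
    right_iter = iter(right)
    r = next(right_iter, None)
    for l in left:
        # Advance the right input until its key catches up with the left key.
        while r is not None and r[key] < l[key]:
            r = next(right_iter, None)
        if r is not None and r[key] == l[key]:
            yield {**l, **r}
```

The lookup needs memory proportional to the size of the reference data, while the merge join needs the inputs sorted but only a constant amount of memory, which is one reason a join can be the better choice once the reference data no longer fits comfortably in RAM.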