Need help in understanding Look-up Functionality

zulfi123786 · Post by **zulfi123786** » Wed Dec 23, 2009 4:11 am

While using the Lookup stage, I have observed that files are being created.

As far as I know (please correct me if I am wrong), the Lookup stage operates in primary memory (RAM). Why, then, are files created on the scratch disk? Is it because the data is too large to fit in RAM?

Thanks.

priyadarshikunal · Post by **priyadarshikunal** » Wed Dec 23, 2009 5:00 am

zulfi123786 wrote: While using the Lookup stage, I have observed that files are being created.

As far as I know (please correct me if I am wrong), the Lookup stage operates in primary memory (RAM). Why, then, are files created on the scratch disk? Is it because the data is too large to fit in RAM?

Thanks.

Do you want to keep all the reference records in RAM the whole time and slow down every other process? It works the same way as any operating system: it keeps RAM as clear as possible without degrading performance.

It does use RAM, but managing memory is also necessary. At the time the actual lookup happens, the data is in memory.

ray.wurlod · Post by **ray.wurlod** » Wed Dec 23, 2009 4:12 pm

There are tunable limits on how much memory is used for virtual Data Sets and for buffers. (It will come as no surprise that they are tuned by setting environment variable values.) When a limit is reached, DataStage uses scratch disk.
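
To illustrate the general idea in the post above (a memory limit read from an environment variable, with overflow going to disk), here is a minimal Python sketch. It is not DataStage's implementation: the class `SpillBuffer`, the variable name `DEMO_BUFFER_MAX_BYTES` and the default limit are all made up for this example, and a temporary file stands in for scratch disk.

```python
import os
import pickle
import tempfile


class SpillBuffer:
    """Keep rows in memory up to a byte limit, then spill later rows to a temp file."""

    def __init__(self, max_bytes=None, scratch_dir=None):
        # DEMO_BUFFER_MAX_BYTES is a hypothetical variable used only in this sketch.
        if max_bytes is None:
            max_bytes = int(os.environ.get("DEMO_BUFFER_MAX_BYTES", "1024"))
        self.max_bytes = max_bytes
        self.used_bytes = 0
        self.in_memory = []
        self.spilled_rows = 0
        # Stand-in for a file on scratch disk; removed automatically when closed.
        self.scratch = tempfile.TemporaryFile(dir=scratch_dir)

    def append(self, row):
        size = len(pickle.dumps(row))
        # Once spilling has started, every later row goes to disk so order is kept.
        if self.spilled_rows == 0 and self.used_bytes + size <= self.max_bytes:
            self.in_memory.append(row)
            self.used_bytes += size
        else:
            pickle.dump(row, self.scratch)
            self.spilled_rows += 1

    def __iter__(self):
        # Rows held in memory come first, then the spilled rows in write order.
        yield from self.in_memory
        self.scratch.seek(0)
        for _ in range(self.spilled_rows):
            yield pickle.load(self.scratch)
        # Return to the end of the file so further appends are not overwritten.
        self.scratch.seek(0, os.SEEK_END)


buf = SpillBuffer(max_bytes=64)
for i in range(100):
    buf.append((i, "x" * 10))
print(len(buf.in_memory), "rows in memory,", buf.spilled_rows, "rows on disk")
```

Raising the limit keeps more rows in memory and writes fewer to disk, which is the trade-off the tunable limits control.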

vmcburney · Post by **vmcburney** » Wed Dec 23, 2009 9:58 pm

I ran a test on this a few years ago (http://it.toolbox.com/blogs/infosphere/datastage-tip-for-beginners-parallel-lookup-types-7183). For under 1,000 rows, lookups were faster than joins, and it didn't matter what the lookup source was, as the data fit into RAM with ease. With 3 million lookup rows, a Lookup stage with the reference data in a Lookup Fileset was fastest at 42 seconds, a Join stage took just over a minute, and a Lookup with source data in a database, sequential file or dataset took over 2 minutes.

So if you want performance improvements on large lookups, think about Lookup Filesets or Join stages.
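
The difference between the two approaches compared above can be sketched in plain Python. This is only a rough illustration under stated assumptions, not how DataStage implements either stage: `hash_lookup` and `merge_join` are hypothetical helper names, rows are plain dicts, reference keys are assumed unique, and unmatched stream rows are dropped. A lookup builds the whole reference table in memory and probes it once per row; a join sorts both inputs on the key and walks them together, so it never needs the full reference table in a single in-memory structure.

```python
def hash_lookup(stream, reference, key):
    """Lookup style: load all reference rows into a dict, then probe per stream row."""
    table = {ref[key]: ref for ref in reference}
    for row in stream:
        match = table.get(row[key])
        if match is not None:
            yield {**match, **row}


def merge_join(stream, reference, key):
    """Join style: sort both inputs on the key and advance through them in step."""
    # sorted() is in-memory here for brevity; a real engine could sort on disk.
    left = sorted(stream, key=lambda r: r[key])
    right = sorted(reference, key=lambda r: r[key])
    j = 0
    for row in left:
        # Skip reference rows whose key is smaller than the current stream key.
        while j < len(right) and right[j][key] < row[key]:
            j += 1
        if j < len(right) and right[j][key] == row[key]:
            yield {**right[j], **row}


orders = [{"cust": 2, "amount": 10}, {"cust": 1, "amount": 5}, {"cust": 9, "amount": 7}]
customers = [{"cust": 1, "name": "Ann"}, {"cust": 2, "name": "Bob"}]

print(list(hash_lookup(orders, customers, "cust")))
print(list(merge_join(orders, customers, "cust")))
```

Both functions return the same matched rows; the lookup keeps the stream order, while the join returns rows in key order.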

DSXchange
