Reg: Reference data size for lookup stage.

css.raghu · Post by **css.raghu** » Tue Aug 03, 2010 8:03 am

Hi,

I have a simple scenario as below.

one source sequential file. (20042 rows)
one oracle table as referance. (4067814 rows)
i have used lookup stage to lookup the data.
the lookup stage property is auto partitioned.
client is complaining that the target data is not correct for few records.
what i feel is the size problem of lookup stage.
Please let me know whether the lookup can handle these many rows as referance.

Regards,
Raghu Ghantasala.

kris007 · Post by **kris007** » Tue Aug 03, 2010 8:24 am

Can you post the exact error message? The LookUp Stage can definitely handle 4 million records.

priyadarshikunal · Post by **priyadarshikunal** » Tue Aug 03, 2010 8:33 am

Number of records doesn't matter if your jobs is running without error.

lookup will partion reference link data to entire unless "inserting partitioning automatically" is disabled. If data is not correct for few records, i would first check data and try to find a link between wrong values.

Seems to me like Data/Design problem rather than limitation of Datastage / hardware.

arun_im4u · Post by **arun_im4u** » Tue Aug 03, 2010 8:35 am

You have mentioned the lookup stage property is auto-partioned.

Instead, can you set the partioning in the reference link to Entire partioning. The input link be auto-partioned. Please review if you get the right results.

css.raghu · Post by **css.raghu** » Tue Aug 03, 2010 9:02 am

hi.
yes the job ran successfully without errors.
the data source data is correct.
please let me know that where i can get that option "inserting partitioning automatically"

priyadarshikunal · Post by **priyadarshikunal** » Tue Aug 03, 2010 9:11 am

The environment variable is called APT_NO_PART_INSERTION, if its value is true then datastage won't insert any partitioning scheme automatically.

Was there any warning in the logs. are there duplicates in reference link? too many things to ask.

Saying every thing is correct except output means nothing is correct unless you identify the problem.

css.raghu · Post by **css.raghu** » Wed Aug 04, 2010 3:13 am

Hi Priyadarshan,
Please see below explanasion on two queries asked by you.

1) I have checked the job parameters in designer and Environment varibles in the Data stage Administrator, it is confirmed that the we have not used APT_NO_PART_INSERTION variable. Now the job is in production.

2) There are no duplicate records in the Referance, But there is duplication in Source itself.

Please let me know if this information is enough to analyse.

Regards,
Raghu Ghantasala.

priyadarshikunal · Post by **priyadarshikunal** » Thu Aug 05, 2010 7:30 am

Duplicates in stream link will not create any problem, it only matters when duplicates are in reference link.

I would start checking the keys specified for lookup and data of reference link.