Page 1 of 1

Reg: Reference data size for lookup stage.

Posted: Tue Aug 03, 2010 8:03 am
by css.raghu
Hi,

I have a simple scenario as below.

one source sequential file. (20042 rows)
one oracle table as referance. (4067814 rows)
i have used lookup stage to lookup the data.
the lookup stage property is auto partitioned.
client is complaining that the target data is not correct for few records.
what i feel is the size problem of lookup stage.
Please let me know whether the lookup can handle these many rows as referance.

Regards,
Raghu Ghantasala.

Posted: Tue Aug 03, 2010 8:24 am
by kris007
Can you post the exact error message? The LookUp Stage can definitely handle 4 million records.

Posted: Tue Aug 03, 2010 8:33 am
by priyadarshikunal
Number of records doesn't matter if your jobs is running without error.

lookup will partion reference link data to entire unless "inserting partitioning automatically" is disabled. If data is not correct for few records, i would first check data and try to find a link between wrong values.

Seems to me like Data/Design problem rather than limitation of Datastage / hardware.

Posted: Tue Aug 03, 2010 8:35 am
by arun_im4u
You have mentioned the lookup stage property is auto-partioned.

Instead, can you set the partioning in the reference link to Entire partioning. The input link be auto-partioned. Please review if you get the right results.

Posted: Tue Aug 03, 2010 9:02 am
by css.raghu
hi.
yes the job ran successfully without errors.
the data source data is correct.
please let me know that where i can get that option "inserting partitioning automatically"

Posted: Tue Aug 03, 2010 9:11 am
by priyadarshikunal
The environment variable is called APT_NO_PART_INSERTION, if its value is true then datastage won't insert any partitioning scheme automatically.

Was there any warning in the logs. are there duplicates in reference link? too many things to ask.

Saying every thing is correct except output means nothing is correct unless you identify the problem.

Posted: Wed Aug 04, 2010 3:13 am
by css.raghu
Hi Priyadarshan,
Please see below explanasion on two queries asked by you.

1) I have checked the job parameters in designer and Environment varibles in the Data stage Administrator, it is confirmed that the we have not used APT_NO_PART_INSERTION variable. Now the job is in production.

2) There are no duplicate records in the Referance, But there is duplication in Source itself.

Please let me know if this information is enough to analyse.

Regards,
Raghu Ghantasala.

Posted: Thu Aug 05, 2010 7:30 am
by priyadarshikunal
Duplicates in stream link will not create any problem, it only matters when duplicates are in reference link.

I would start checking the keys specified for lookup and data of reference link.