Reg: Reference data size for lookup stage.

css.raghu
Participant
Posts: 18
Joined: Thu Jan 28, 2010 9:34 pm

Post by css.raghu »

Hi,

I have a simple scenario, as follows:

One source sequential file (20042 rows).
One Oracle table as the reference (4067814 rows).
I have used a Lookup stage to look up the data.
The Lookup stage partitioning is set to Auto.
The client is complaining that the target data is incorrect for a few records.
I suspect the problem is the size of the reference data for the Lookup stage.
Please let me know whether the Lookup stage can handle this many rows as reference.

Regards,
Raghu Ghantasala.
kris007
Charter Member
Posts: 1102
Joined: Tue Jan 24, 2006 5:38 pm
Location: Riverside, RI

Post by kris007 »

Can you post the exact error message? The Lookup stage can definitely handle 4 million records.
Kris

Where's the "Any" key?-Homer Simpson
priyadarshikunal
Premium Member
Posts: 1735
Joined: Thu Mar 01, 2007 5:44 am
Location: Troy, MI

Post by priyadarshikunal »

The number of records doesn't matter if your job is running without errors.

The Lookup stage will partition the reference link data using Entire unless "inserting partitioning automatically" is disabled. If the data is incorrect for a few records, I would first check the data and try to find a pattern linking the wrong values.

This seems to me like a data or design problem rather than a limitation of DataStage or the hardware.
Priyadarshi Kunal

Genius may have its limitations, but stupidity is not thus handicapped. :wink:
arun_im4u
Premium Member
Posts: 104
Joined: Mon Nov 08, 2004 8:42 am

Post by arun_im4u »

You mentioned that the Lookup stage partitioning is set to Auto.

Instead, can you set the partitioning on the reference link to Entire and leave the input link on Auto? Please check whether you get the right results.
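
To see why Entire on the reference link matters, here is a toy simulation in Python. It is not DataStage code: the two-partition setup, the round-robin split, and the sample rows are all made up for illustration. It only models the idea that each partition looks up its stream rows against whatever reference rows landed in that same partition.

```python
# Illustrative simulation only; this is not DataStage code.
# The partition count, the sample rows, and both helpers are hypothetical.

def round_robin(rows, n):
    """Split rows across n partitions in round-robin order."""
    parts = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        parts[i % n].append(row)
    return parts


def lookup(stream_parts, ref_parts):
    """Look up each stream key only in the reference rows of its own partition."""
    out = []
    for stream, ref in zip(stream_parts, ref_parts):
        table = dict(ref)
        for key in stream:
            out.append((key, table.get(key)))  # None marks a failed lookup
    return sorted(out, key=lambda pair: pair[0])


N = 2
reference = [(1, "A"), (2, "B"), (3, "C"), (4, "D")]
stream = [2, 1, 4, 3]

stream_parts = round_robin(stream, N)

# Reference rows split across partitions: a stream key can land in a
# partition that does not hold its reference row.
split_result = lookup(stream_parts, round_robin(reference, N))

# Entire: every partition sees a full copy of the reference data.
entire_result = lookup(stream_parts, [reference] * N)

print("split :", split_result)
print("entire:", entire_result)
```

With this particular sample every lookup fails in the split case and every lookup succeeds in the Entire case. A real job would normally show only some rows affected, which is consistent with "wrong for a few records" but does not prove partitioning is the cause here.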
css.raghu
Participant
Posts: 18
Joined: Thu Jan 28, 2010 9:34 pm

Post by css.raghu »

Hi,
Yes, the job ran successfully without errors.
The source data is correct.
Please let me know where I can find the option "inserting partitioning automatically".
priyadarshikunal
Premium Member
Posts: 1735
Joined: Thu Mar 01, 2007 5:44 am
Location: Troy, MI

Post by priyadarshikunal »

The environment variable is called APT_NO_PART_INSERTION; if its value is true, DataStage won't insert any partitioning scheme automatically.

Were there any warnings in the logs? Are there duplicates in the reference link? There are too many things to ask.

Saying everything is correct except the output means nothing is correct until you identify the problem.
Priyadarshi Kunal

css.raghu
Participant
Posts: 18
Joined: Thu Jan 28, 2010 9:34 pm

Post by css.raghu »

Hi Priyadarshi,
Please see below my answers to the two questions you asked.

1) I have checked the job parameters in Designer and the environment variables in the DataStage Administrator, and confirmed that we have not used the APT_NO_PART_INSERTION variable. The job is now in production.

2) There are no duplicate records in the reference, but there are duplicates in the source itself.

Please let me know if this information is enough for analysis.

Regards,
Raghu Ghantasala.
priyadarshikunal
Premium Member
Posts: 1735
Joined: Thu Mar 01, 2007 5:44 am
Location: Troy, MI

Post by priyadarshikunal »

Duplicates in the stream link will not cause any problem; it only matters when there are duplicates in the reference link.

I would start by checking the keys specified for the lookup and the data on the reference link.
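
One way to run those checks outside the job is on a small extract of both links. The sketch below is plain Python, not anything DataStage provides; the column name `cust_id`, the sample rows, and both helper functions are hypothetical. Trailing spaces and case differences are only assumed suspects here, since nothing in this thread confirms them as the cause.

```python
from collections import Counter

# Hypothetical helpers and sample data; adapt the key column to your job.

def duplicate_keys(ref_rows, key):
    """Return reference key values that occur more than once."""
    counts = Counter(row[key] for row in ref_rows)
    return sorted(k for k, c in counts.items() if c > 1)


def unmatched_keys(stream_rows, ref_rows, key, normalize=lambda v: v):
    """Return stream key values with no match in the reference after normalizing."""
    ref_keys = {normalize(row[key]) for row in ref_rows}
    return [row[key] for row in stream_rows if normalize(row[key]) not in ref_keys]


reference = [{"cust_id": "A100"}, {"cust_id": "B200"}, {"cust_id": "B200"}]
stream = [{"cust_id": "A100 "}, {"cust_id": "b200"}, {"cust_id": "C300"}]

# Duplicates on the reference side are the ones that matter for a lookup.
dups = duplicate_keys(reference, "cust_id")

# Exact comparison, as a lookup on raw key values would do it.
raw_misses = unmatched_keys(stream, reference, "cust_id")

# Same comparison after trimming whitespace and folding case.
clean_misses = unmatched_keys(
    stream, reference, "cust_id", normalize=lambda v: v.strip().upper()
)

print("duplicate reference keys:", dups)
print("misses on raw keys      :", raw_misses)
print("misses on cleaned keys  :", clean_misses)
```

Keys that miss on the raw comparison but match after cleaning point to a key formatting mismatch between the file and the table rather than to a size limit in the Lookup stage.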
Priyadarshi Kunal
