Design Issue Lookup Vs Join

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
dsinfosys13
Participant
Posts: 9
Joined: Thu Oct 30, 2003 2:58 am

Design Issue Lookup Vs Join

Post by dsinfosys13 »

Hi All,

I have 1 million records in Txn File and 500 million records in a second table T2 in udb database. I need to do a join between Txn and T2.

If I do a join won't it not pull 500 million from the database to ascential memory and do the processing ?.

I also saw some features like 'sparse lookup' in db2 enterprise stage, where we can avoid an "IN MEMORY" lookup for 500 million.

I feel sparse look up may be a better approach for the design. Kindly guide me on this

Thanks and Rgds,
CP
WoMaWil
Participant
Posts: 482
Joined: Thu Mar 13, 2003 7:17 am
Location: Amsterdam

Post by WoMaWil »

Hi CP,

the best performance can never be forseen in advance, mostly there is an try and error process and a time frame. You start with anything. I would say if the time frame is okay I leave it. If I need a higher Performance I have to build the next type and try it.

The question now is what kind of join you want to make?

Inner Join or Lookup you want to make?

If you look after one fitting row in your db-Table for each line of your txt-file, may be the most performant way is to use the db2-table as Lookup.

A lookup in an hashfile is quicker but you have to take into account the time for loading the hashfile.

Wolfgang
neena
Participant
Posts: 90
Joined: Mon Mar 31, 2003 4:32 pm

Re: Design Issue Lookup Vs Join

Post by neena »

Hi
Try to use a user defined Sql in the input stage (Oracle) and tryto use a join commnad over there. This makes ur job more easier..
Hope this works..
Tnks,
Neena
dsinfosys13 wrote:Hi All,

I have 1 million records in Txn File and 500 million records in a second table T2 in udb database. I need to do a join between Txn and T2.

If I do a join won't it not pull 500 million from the database to ascential memory and do the processing ?.

I also saw some features like 'sparse lookup' in db2 enterprise stage, where we can avoid an "IN MEMORY" lookup for 500 million.

I feel sparse look up may be a better approach for the design. Kindly guide me on this

Thanks and Rgds,
CP
Teej
Participant
Posts: 677
Joined: Fri Aug 08, 2003 9:26 am
Location: USA

Re: Design Issue Lookup Vs Join

Post by Teej »

dsinfosys13 wrote:I have 1 million records in Txn File and 500 million records in a second table T2 in udb database. I need to do a join between Txn and T2.
Inner join?
If I do a join won't it not pull 500 million from the database to ascential memory and do the processing ?.
No. It will put the data onto an internal dataset file each link, and then do the join based on the data provided.
I also saw some features like 'sparse lookup' in db2 enterprise stage, where we can avoid an "IN MEMORY" lookup for 500 million.
It depends on whether this may be benefical. Is only 1 million rows on the T2 table matching up with the 1 million rows for input?

Lookup stages are vastly improved for performance on 7.0.1, especially for data your (and my) size.

To be honest with you, it all depends on the data you're using. As the other poster said, it's a trial and error process based on previous knowledge of how things are like. The only way we can really help you is by being able to view the type of data both tables have, what you are trying to do, and why you have to do it.

-T.J.
Developer of DataStage Parallel Engine (Orchestrate).
Post Reply