Performing Lookups on Large tables

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
Das
Participant
Posts: 87
Joined: Tue Oct 24, 2006 9:58 pm
Location: india

Performing Lookups on Large tables

Post by Das »

Hi ,

I wanted to do a self join with a table which it is having data around 10 crores of rows. ,Which is the best method to do that.

Please suggest..
Thanks in advance..
ajay.prakash03
Participant
Posts: 54
Joined: Mon May 28, 2007 12:41 am
Location: Chennai

hi

Post by ajay.prakash03 »

I think u sud wirte a user defined sql query.
caltog
Premium Member
Premium Member
Posts: 39
Joined: Thu May 22, 2003 7:32 am

Re: Performing Lookups on Large tables

Post by caltog »

Das wrote:Hi ,

I wanted to do a self join with a table which it is having data around 10 crores of rows. ,Which is the best method to do that.

Please suggest..
Thanks in advance..
Sorry, but could you please specify what are 10 crores ? 10 millions rows ? In that case I would check if this is really usefull to read 10mio rows.
Thanks in advance
Abburi
Participant
Posts: 31
Joined: Tue May 29, 2007 12:38 pm

Re: Performing Lookups on Large tables

Post by Abburi »

caltog wrote:
Das wrote:Hi ,

I wanted to do a self join with a table which it is having data around 10 crores of rows. ,Which is the best method to do that.

Please suggest..
Thanks in advance..
Sorry, but could you please specify what are 10 crores ? 10 millions rows ? In that case I would check if this is really usefull to read 10mio rows.
Thanks in advance
10 crores = 100,000,000 rows, before self join make sure that you have indexes on this table or make partition and use the table then only performance will be good.
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne
Contact:

Post by vmcburney »

You could use a database input stage and a lookup stage to join the table to itself - but you would be reading the 10 crores of data in twice. The user-defined SQL in a database input stage lets you join the table on the database engine and bring the data to your ETL server just once. It performs a single intensive SQL statement rather than millions of small SQL statements which should overall be a lighter load on your database - as long as you have the right indexing.
Post Reply