Page 1 of 1

Perfomance handling

Posted: Wed Jul 21, 2010 5:41 am
by balu536
Hi All,
I need to start coding for a job where it has 1 source and the source data needs to be looked up against 4 tables (say A,B,C,D).

The source record count is 10 million+ records
Lookup 1 (A) - 13000+ records
Lookup 2 (B) - 15 million+ records
Lookup 3 (C) - 500000+ records
Lookup 4 (D) - 36 million+ records.

I'm not allowed to create any indexes on tables.

Please help me on how to handle such a huge data with out affecting the perfomance.

Posted: Wed Jul 21, 2010 5:53 am
by chulett
Have you coded anything? Determined if you actually have a 'peformance' problem yet? Worry about indexes later when you have the ability to properly make a case.

Posted: Wed Jul 21, 2010 5:58 am
by ray.wurlod
Most database tables have an index on the primary key. If this is the case then sparse lookups will perform reasonably well if the database server is on the same machine as the DataStage server.

Posted: Wed Jul 21, 2010 6:05 am
by rumu
ray.wurlod wrote:Most database tables have an index on the primary key. If this is the case then sparse lookups will perform reasonably well if the database server is on the same machine as the DataStage server. ...
Hi Ray,
Please correct if I am wrong...

Is sparse lookup applicable in this case ,theritically a sparse lookup is applicable if source : lookup is 1:100 typically ,ie if source is reasonably low,in this example two look up contains few thousands records while source is 10 m.

is not join stage a option here ?

Posted: Wed Jul 21, 2010 6:06 am
by balu536
Hi Chulett,
I haven't started the coding yet. Just wanted to makesure that i implement all the best standards during the coding phase inorder to avoid perfomance issues while testing the object.

So i posted here for expert suggestions.

Posted: Wed Jul 21, 2010 6:07 am
by ray.wurlod
Everything is an option. Don't be tied down by what are only guidelines. Be prepared to think about the actual mechanism to make these decisions. For example, if the OP has vast amounts of memory available, then a normal lookup may even be possible. 15 million short rows is not such a great burden.

Posted: Wed Jul 21, 2010 6:11 am
by rumu
ray.wurlod wrote:Everything is an option. Don't be tied down by what are only guidelines. Be prepared to think about the actual mechanism to make these decisions. For example, if the OP has vast amounts of memory av ...
Still,Sparse lookup is expensive since it executes the SQL qry for each incoming row....

Posted: Wed Jul 21, 2010 6:12 am
by rumu
ray.wurlod wrote:Everything is an option. Don't be tied down by what are only guidelines. Be prepared to think about the actual mechanism to make these decisions. For example, if the OP has vast amounts of memory av ...
Still,Sparse lookup is expensive since it executes the SQL qry for each incoming row....

Posted: Wed Jul 21, 2010 6:17 am
by Sainath.Srinivasan
If everything is on the same db or schema, I will write an SQL to join and extract only the results. That is assuming that all joins are on indexes.

For non-indexed columns, it will be based on the extraction factors.

Posted: Wed Jul 21, 2010 3:11 pm
by ray.wurlod
rumu wrote:Still,Sparse lookup is expensive since it executes the SQL qry for each incoming row....
That depends on your Array Size setting.