Perfomance handling

balu536 · Post by **balu536** » Wed Jul 21, 2010 5:41 am

Hi All,
I need to start coding for a job where it has 1 source and the source data needs to be looked up against 4 tables (say A,B,C,D).

The source record count is 10 million+ records
Lookup 1 (A) - 13000+ records
Lookup 2 (B) - 15 million+ records
Lookup 3 (C) - 500000+ records
Lookup 4 (D) - 36 million+ records.

I'm not allowed to create any indexes on tables.

Please help me on how to handle such a huge data with out affecting the perfomance.

chulett · Post by **chulett** » Wed Jul 21, 2010 5:53 am

Have you coded anything? Determined if you actually have a 'peformance' problem yet? Worry about indexes later when you have the ability to properly make a case.

ray.wurlod · Post by **ray.wurlod** » Wed Jul 21, 2010 5:58 am

Most database tables have an index on the primary key. If this is the case then sparse lookups will perform reasonably well if the database server is on the same machine as the DataStage server.

rumu · Post by **rumu** » Wed Jul 21, 2010 6:05 am

ray.wurlod wrote:Most database tables have an index on the primary key. If this is the case then sparse lookups will perform reasonably well if the database server is on the same machine as the DataStage server. ...

Hi Ray,
Please correct if I am wrong...

Is sparse lookup applicable in this case ,theritically a sparse lookup is applicable if source : lookup is 1:100 typically ,ie if source is reasonably low,in this example two look up contains few thousands records while source is 10 m.

is not join stage a option here ?

balu536 · Post by **balu536** » Wed Jul 21, 2010 6:06 am

Hi Chulett,
I haven't started the coding yet. Just wanted to makesure that i implement all the best standards during the coding phase inorder to avoid perfomance issues while testing the object.

So i posted here for expert suggestions.

ray.wurlod · Post by **ray.wurlod** » Wed Jul 21, 2010 6:07 am

Everything is an option. Don't be tied down by what are only guidelines. Be prepared to think about the actual mechanism to make these decisions. For example, if the OP has vast amounts of memory available, then a normal lookup may even be possible. 15 million short rows is not such a great burden.

rumu · Post by **rumu** » Wed Jul 21, 2010 6:11 am

ray.wurlod wrote:Everything is an option. Don't be tied down by what are only guidelines. Be prepared to think about the actual mechanism to make these decisions. For example, if the OP has vast amounts of memory av ...

Still,Sparse lookup is expensive since it executes the SQL qry for each incoming row....

rumu · Post by **rumu** » Wed Jul 21, 2010 6:12 am

ray.wurlod wrote:Everything is an option. Don't be tied down by what are only guidelines. Be prepared to think about the actual mechanism to make these decisions. For example, if the OP has vast amounts of memory av ...

Still,Sparse lookup is expensive since it executes the SQL qry for each incoming row....

Sainath.Srinivasan · Post by **Sainath.Srinivasan** » Wed Jul 21, 2010 6:17 am

If everything is on the same db or schema, I will write an SQL to join and extract only the results. That is assuming that all joins are on indexes.

For non-indexed columns, it will be based on the extraction factors.

ray.wurlod · Post by **ray.wurlod** » Wed Jul 21, 2010 3:11 pm

rumu wrote:Still,Sparse lookup is expensive since it executes the SQL qry for each incoming row....

That depends on your Array Size setting.