Perfomance handling

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
balu536
Premium Member
Premium Member
Posts: 103
Joined: Tue Dec 02, 2008 5:01 am

Perfomance handling

Post by balu536 »

Hi All,
I need to start coding for a job where it has 1 source and the source data needs to be looked up against 4 tables (say A,B,C,D).

The source record count is 10 million+ records
Lookup 1 (A) - 13000+ records
Lookup 2 (B) - 15 million+ records
Lookup 3 (C) - 500000+ records
Lookup 4 (D) - 36 million+ records.

I'm not allowed to create any indexes on tables.

Please help me on how to handle such a huge data with out affecting the perfomance.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Have you coded anything? Determined if you actually have a 'peformance' problem yet? Worry about indexes later when you have the ability to properly make a case.
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Most database tables have an index on the primary key. If this is the case then sparse lookups will perform reasonably well if the database server is on the same machine as the DataStage server.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
rumu
Participant
Posts: 286
Joined: Mon Jun 06, 2005 4:07 am

Post by rumu »

ray.wurlod wrote:Most database tables have an index on the primary key. If this is the case then sparse lookups will perform reasonably well if the database server is on the same machine as the DataStage server. ...
Hi Ray,
Please correct if I am wrong...

Is sparse lookup applicable in this case ,theritically a sparse lookup is applicable if source : lookup is 1:100 typically ,ie if source is reasonably low,in this example two look up contains few thousands records while source is 10 m.

is not join stage a option here ?
Rumu
IT Consultant
balu536
Premium Member
Premium Member
Posts: 103
Joined: Tue Dec 02, 2008 5:01 am

Post by balu536 »

Hi Chulett,
I haven't started the coding yet. Just wanted to makesure that i implement all the best standards during the coding phase inorder to avoid perfomance issues while testing the object.

So i posted here for expert suggestions.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Everything is an option. Don't be tied down by what are only guidelines. Be prepared to think about the actual mechanism to make these decisions. For example, if the OP has vast amounts of memory available, then a normal lookup may even be possible. 15 million short rows is not such a great burden.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
rumu
Participant
Posts: 286
Joined: Mon Jun 06, 2005 4:07 am

Post by rumu »

ray.wurlod wrote:Everything is an option. Don't be tied down by what are only guidelines. Be prepared to think about the actual mechanism to make these decisions. For example, if the OP has vast amounts of memory av ...
Still,Sparse lookup is expensive since it executes the SQL qry for each incoming row....
Rumu
IT Consultant
rumu
Participant
Posts: 286
Joined: Mon Jun 06, 2005 4:07 am

Post by rumu »

ray.wurlod wrote:Everything is an option. Don't be tied down by what are only guidelines. Be prepared to think about the actual mechanism to make these decisions. For example, if the OP has vast amounts of memory av ...
Still,Sparse lookup is expensive since it executes the SQL qry for each incoming row....
Last edited by rumu on Wed Jul 21, 2010 6:21 am, edited 2 times in total.
Rumu
IT Consultant
Sainath.Srinivasan
Participant
Posts: 3337
Joined: Mon Jan 17, 2005 4:49 am
Location: United Kingdom

Post by Sainath.Srinivasan »

If everything is on the same db or schema, I will write an SQL to join and extract only the results. That is assuming that all joins are on indexes.

For non-indexed columns, it will be based on the extraction factors.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

rumu wrote:Still,Sparse lookup is expensive since it executes the SQL qry for each incoming row....
That depends on your Array Size setting.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Post Reply