Db2 sparse lookup and DPF

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
Chuah
Participant
Posts: 46
Joined: Thu May 18, 2006 9:13 pm
Location: Melbourne

Db2 sparse lookup and DPF

Post by Chuah »

Hi ,

We have a Sparse DB lookup in our PX job but the table will potentially grow to 30 billion records. Currently we are in development mode.
This table is being looked up in all our jobs.

In dev, the db2 instance does not have DPF installed. Question is will the performance of the lookup actually benefit from DPF or is it enough to just have an index on the table ?

Please advise as I don't have much experience in DB2.

Rgds,
Chin
aakashahuja
Premium Member
Premium Member
Posts: 210
Joined: Wed Feb 16, 2005 7:17 am

Post by aakashahuja »

Another thing to keep in mind before considering indexes (only ) is the fact that the indexes are only used if the number of qualifying rows for a query are a small percentage (i think 20%) of the total cardinality of the table. Otherwise DB2 ends up doing a table scan any ways.

So please take into account this factor as well.
Nageshsunkoji
Participant
Posts: 222
Joined: Tue Aug 30, 2005 2:07 am
Location: pune
Contact:

Post by Nageshsunkoji »

Hi,

As per Parallel Job Advance guid, it is suggested that, don't go for sparse lookup if your reference data is more. In your case, its 30 billion records. There will be a effect on performance if you use a sparse lookup.
NageshSunkoji

If you know anything SHARE it.............
If you Don't know anything LEARN it...............
thebird
Participant
Posts: 254
Joined: Thu Jan 06, 2005 12:11 am
Location: India
Contact:

Post by thebird »

You can use the sparse look up provided your main stream record count is very less compared to the Reference.

In a Sparse look up, an SQL query is generated for each mainstream input row to find a match in the reference. So if you have 30 million records in the reference, but only 10 records coming from the mainstream - it makes better sense to use the sparse look up.

But if the mainstream record count is considerable, then it would be better to proceed with a Join stage or a Merge stage considering the number of reference records in this case.

The Bird.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

aakashahuja wrote:Another thing to keep in mind before considering indexes (only ) is the fact that the indexes are only used if the number of qualifying rows for a query are a small percentage (i think 20%) of the total cardinality of the table. Otherwise DB2 ends up doing a table scan any ways.

So please take into account this factor as well.
Good thought, but in a sparse lookup situation the number of qualifying rows ought to be one (for a key-based lookup), and "not many" (for a lookup based on a secondary key). Otherwise you're getting into Cartesian product territory, and better hope you're being paid by the hour!
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Post Reply