Db2 sparse lookup and DPF

Chuah · Post by **Chuah** » Mon Nov 27, 2006 12:01 am

Hi ,

We have a Sparse DB lookup in our PX job but the table will potentially grow to 30 billion records. Currently we are in development mode.
This table is being looked up in all our jobs.

In dev, the db2 instance does not have DPF installed. Question is will the performance of the lookup actually benefit from DPF or is it enough to just have an index on the table ?

Please advise as I don't have much experience in DB2.

Rgds,
Chin

aakashahuja · Post by **aakashahuja** » Mon Nov 27, 2006 2:05 am

Another thing to keep in mind before considering indexes (only ) is the fact that the indexes are only used if the number of qualifying rows for a query are a small percentage (i think 20%) of the total cardinality of the table. Otherwise DB2 ends up doing a table scan any ways.

So please take into account this factor as well.

Nageshsunkoji · Post by **Nageshsunkoji** » Mon Nov 27, 2006 3:34 am

Hi,

As per Parallel Job Advance guid, it is suggested that, don't go for sparse lookup if your reference data is more. In your case, its 30 billion records. There will be a effect on performance if you use a sparse lookup.

thebird · Post by **thebird** » Mon Nov 27, 2006 10:38 am

You can use the sparse look up provided your main stream record count is very less compared to the Reference.

In a Sparse look up, an SQL query is generated for each mainstream input row to find a match in the reference. So if you have 30 million records in the reference, but only 10 records coming from the mainstream - it makes better sense to use the sparse look up.

But if the mainstream record count is considerable, then it would be better to proceed with a Join stage or a Merge stage considering the number of reference records in this case.

The Bird.

ray.wurlod · Post by **ray.wurlod** » Mon Nov 27, 2006 2:29 pm

aakashahuja wrote:Another thing to keep in mind before considering indexes (only ) is the fact that the indexes are only used if the number of qualifying rows for a query are a small percentage (i think 20%) of the total cardinality of the table. Otherwise DB2 ends up doing a table scan any ways.

So please take into account this factor as well.

Good thought, but in a sparse lookup situation the number of qualifying rows ought to be one (for a key-based lookup), and "not many" (for a lookup based on a secondary key). Otherwise you're getting into Cartesian product territory, and better hope you're being paid by the hour!