Page 1 of 1

Sparse Lookup against DB2 ignores WHERE clause?

Posted: Wed Nov 02, 2005 6:34 am
by SettValleyConsulting
I am using a Sparse lookup to return some values from a Customer History table in DB2. The key is a Customer Id, and there is one row on the table per Customer Id, per month.

I use Customer Id as my lookup key, set read type to TABLE and constrain the month using a WHERE clause ....

Instead of returning one row per Customer, I get as many rows as there are months eg if there are entries for a Customer for 6 months every input row generates 6 output rows :(

As an experiment I tried putting the Customer Id in the WHERE clause ie WHERE Customer_ID = 480397, expecting to get just 6 rows returned. Instead, I got exactly the same number of rows output as if there was no WHERE clause.

So it looks to me as if the DB2 EE plugin ignores the WHERE clause in a table read if the lookup type is Sparse.

Incidentally, I have tried User-Defined SQL as the Read Type for Sparse lookup and the job generally hangs ...

Anyone else got a 'conditional' Sparse lookup to work against DB2?

Posted: Wed Nov 02, 2005 6:50 am
by sanjay
Hi

with oracle its working fine .

Thanks
Sanjay

Posted: Wed Nov 02, 2005 7:00 am
by SettValleyConsulting
What did you put in your WHERE clause ... do you include the WHERE statement or just the condition itself

eg do you enter "WHERE HIST_DT > '2005-01-01'"

or just "HIST_DT > '2005-01-01'"

Posted: Mon Nov 07, 2005 8:11 am
by sudarshan
Well,

why don't u try with putting drop option for lookup failure ?
I believe it is set to continue now.

Sudarshan

Posted: Mon Nov 07, 2005 5:38 pm
by vmcburney
Sparse lookups behave quite differently and can be trickier to setup. In your lookup stage you will notice you no longer need to drag the primary key fields onto the lookup key fields, there is no linking displayed, with a sparse lookup the linking is done automatically. It links (where clause joins) according to the key fields with matching names. Make sure Customer Id is named identically in both the input and reference links and is set to be the only key field in both and your lookup should work.

If you are filtering on month you may find it runs much faster as a normal lookup, the job will load into the lookup dataset just those customers for the filtered month in one select statement instead of running a select for each individual customer that you get with the sparse lookup.

Posted: Tue Nov 08, 2005 4:17 am
by ray.wurlod
What Vincent said also applies to performing a lookup against a Lookup File Set stage. I acknowledge that this is not relevant to the original post, but it's a "gotcha" in both situations.

Posted: Tue Nov 08, 2005 9:45 am
by SettValleyConsulting
Make sure Customer Id is named identically in both the input and reference links and is set to be the only key field in both and your lookup should work.


I can confirm that all columns are named consistently and the lookup 'works', in fact it works too well, returning more rows than we want. (How do you switch off multi-line results?)'.

What is this about only key field? Does Sparse not support multiple key columns? I did not mention in my original post that my Key is in fact Cluster ID + Customer Id, as I did not think it relevant. Shoulda known better :roll:

I now have user-defined SQL working with Sparse, the point, wierldy enough, seems to be to include the key in your WHERE clause, eg ..

SELECT (some columns)
FROM CDM_IP_HIST
WHERE (IP_CLUS_ID = ORCHESTRATE.IP_CLUS_ID AND IP_ID = ORCHESTRATE.IP_ID)
AND HIST_DT <= '#PROCESS_DT_STR_PARM#' And HIST_END_DT >= '#PROCESS_DT_STR_PARM#'

Ascential Support have confirmed that thay have 'reproduced my issue' and then the message went on to solve an entirely different problem. I may post their answer later, if only for its comedy value ..... :wink:

Posted: Tue Nov 08, 2005 5:20 pm
by vmcburney
At a guess I would say that when you don't include your own WHERE clause then DataStage builds it for you using the key columns. If you do include a WHERE clause you effectively overwrite whatever the sparse join was going to generate. So your custom WHERE clause that included both the filter and the key fields worked and you clause that included just the filter did not.

Posted: Wed Nov 09, 2005 3:15 am
by SettValleyConsulting
If you do include a WHERE clause you effectively overwrite whatever the sparse join was going to generate

That's basically the conclusion I have come to, further proof that whoever designed datastage does not inhabit the same planet as the rest of us.

Seems that if you want to constrain a sparse db2 lookup on a non-key column (eg a date range as in this case) you choices are either to use User-defined SQL and code the key lookup yourself or as you say, use a constrained Normal lookup to preprocess the reference data to only include the rows you require.

The second option still leaves us with a vastly 1-many relationship in this case - usually a few thou rows in the input stream vs a lookup of several million customers :cry: I thought this was precisely the situation where use of Sparse Lookup was recommended?

Now who at Ascential do I bill for the time spent correcting this undocumented feature?











-

Posted: Wed Nov 09, 2005 3:39 am
by ray.wurlod
Go for the top. Sue the company. Oh, they don't exist any more. Shame.

Posted: Wed Nov 09, 2005 4:28 am
by SettValleyConsulting
Cunning.

Mind you if everyone were to sue IBM for all the errors and omissions in Datatage documentation, the would probably be on course for the biggest corporate loss in US History.

Again.

Posted: Wed Nov 09, 2005 4:53 am
by vmcburney
Go to the Vegas conference and heckle during the opening session.