Sparse Lookup against DB2 ignores WHERE clause?

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
SettValleyConsulting
Premium Member
Premium Member
Posts: 72
Joined: Thu Sep 04, 2003 5:01 am
Location: UK & Europe

Sparse Lookup against DB2 ignores WHERE clause?

Post by SettValleyConsulting »

I am using a Sparse lookup to return some values from a Customer History table in DB2. The key is a Customer Id, and there is one row on the table per Customer Id, per month.

I use Customer Id as my lookup key, set read type to TABLE and constrain the month using a WHERE clause ....

Instead of returning one row per Customer, I get as many rows as there are months eg if there are entries for a Customer for 6 months every input row generates 6 output rows :(

As an experiment I tried putting the Customer Id in the WHERE clause ie WHERE Customer_ID = 480397, expecting to get just 6 rows returned. Instead, I got exactly the same number of rows output as if there was no WHERE clause.

So it looks to me as if the DB2 EE plugin ignores the WHERE clause in a table read if the lookup type is Sparse.

Incidentally, I have tried User-Defined SQL as the Read Type for Sparse lookup and the job generally hangs ...

Anyone else got a 'conditional' Sparse lookup to work against DB2?
sanjay
Premium Member
Premium Member
Posts: 203
Joined: Fri Apr 23, 2004 2:22 am

Post by sanjay »

Hi

with oracle its working fine .

Thanks
Sanjay
SettValleyConsulting
Premium Member
Premium Member
Posts: 72
Joined: Thu Sep 04, 2003 5:01 am
Location: UK & Europe

Post by SettValleyConsulting »

What did you put in your WHERE clause ... do you include the WHERE statement or just the condition itself

eg do you enter "WHERE HIST_DT > '2005-01-01'"

or just "HIST_DT > '2005-01-01'"
sudarshan
Participant
Posts: 11
Joined: Fri Jun 17, 2005 7:41 am

Post by sudarshan »

Well,

why don't u try with putting drop option for lookup failure ?
I believe it is set to continue now.

Sudarshan
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne
Contact:

Post by vmcburney »

Sparse lookups behave quite differently and can be trickier to setup. In your lookup stage you will notice you no longer need to drag the primary key fields onto the lookup key fields, there is no linking displayed, with a sparse lookup the linking is done automatically. It links (where clause joins) according to the key fields with matching names. Make sure Customer Id is named identically in both the input and reference links and is set to be the only key field in both and your lookup should work.

If you are filtering on month you may find it runs much faster as a normal lookup, the job will load into the lookup dataset just those customers for the filtered month in one select statement instead of running a select for each individual customer that you get with the sparse lookup.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

What Vincent said also applies to performing a lookup against a Lookup File Set stage. I acknowledge that this is not relevant to the original post, but it's a "gotcha" in both situations.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
SettValleyConsulting
Premium Member
Premium Member
Posts: 72
Joined: Thu Sep 04, 2003 5:01 am
Location: UK & Europe

Post by SettValleyConsulting »

Make sure Customer Id is named identically in both the input and reference links and is set to be the only key field in both and your lookup should work.


I can confirm that all columns are named consistently and the lookup 'works', in fact it works too well, returning more rows than we want. (How do you switch off multi-line results?)'.

What is this about only key field? Does Sparse not support multiple key columns? I did not mention in my original post that my Key is in fact Cluster ID + Customer Id, as I did not think it relevant. Shoulda known better :roll:

I now have user-defined SQL working with Sparse, the point, wierldy enough, seems to be to include the key in your WHERE clause, eg ..

SELECT (some columns)
FROM CDM_IP_HIST
WHERE (IP_CLUS_ID = ORCHESTRATE.IP_CLUS_ID AND IP_ID = ORCHESTRATE.IP_ID)
AND HIST_DT <= '#PROCESS_DT_STR_PARM#' And HIST_END_DT >= '#PROCESS_DT_STR_PARM#'

Ascential Support have confirmed that thay have 'reproduced my issue' and then the message went on to solve an entirely different problem. I may post their answer later, if only for its comedy value ..... :wink:
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne
Contact:

Post by vmcburney »

At a guess I would say that when you don't include your own WHERE clause then DataStage builds it for you using the key columns. If you do include a WHERE clause you effectively overwrite whatever the sparse join was going to generate. So your custom WHERE clause that included both the filter and the key fields worked and you clause that included just the filter did not.
SettValleyConsulting
Premium Member
Premium Member
Posts: 72
Joined: Thu Sep 04, 2003 5:01 am
Location: UK & Europe

Post by SettValleyConsulting »

If you do include a WHERE clause you effectively overwrite whatever the sparse join was going to generate

That's basically the conclusion I have come to, further proof that whoever designed datastage does not inhabit the same planet as the rest of us.

Seems that if you want to constrain a sparse db2 lookup on a non-key column (eg a date range as in this case) you choices are either to use User-defined SQL and code the key lookup yourself or as you say, use a constrained Normal lookup to preprocess the reference data to only include the rows you require.

The second option still leaves us with a vastly 1-many relationship in this case - usually a few thou rows in the input stream vs a lookup of several million customers :cry: I thought this was precisely the situation where use of Sparse Lookup was recommended?

Now who at Ascential do I bill for the time spent correcting this undocumented feature?











-
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Go for the top. Sue the company. Oh, they don't exist any more. Shame.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
SettValleyConsulting
Premium Member
Premium Member
Posts: 72
Joined: Thu Sep 04, 2003 5:01 am
Location: UK & Europe

Post by SettValleyConsulting »

Cunning.

Mind you if everyone were to sue IBM for all the errors and omissions in Datatage documentation, the would probably be on course for the biggest corporate loss in US History.

Again.
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne
Contact:

Post by vmcburney »

Go to the Vegas conference and heckle during the opening session.
Post Reply