join conditions on target columns

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
rcil
Charter Member
Charter Member
Posts: 70
Joined: Sat Jun 05, 2004 1:37 am

join conditions on target columns

Post by rcil »

I am doing an extract job in datastage which extracts data from a database and loads it into a flat file, the logic invovled on two of the new columns on the target is

select
p.s1,p.s2
from product p,item t
where p.s3=t.t1 and t.date1>='2000-10-10'

if rowsReturned= 1 then s1,s2
if rowsReturned>1 then order by s1,s2 and take the first row

if rowsReturned=0
then

select
p1.s1,p2.s2
from product1 p1, item t
where p1.r1=t.t1 and t.date1>='2000-10-10'

(where Product1 is a different table but got the same column names and item is the same table above)

if rowsReturned= 1 then s1,s2 (of product1)
if rowsReturned>1 then order by s1,s2 and take the first row
if rowsReturned=0 then

select x1,x2 from product3,item t where t.date1>='2000-10-10'

Those two columns in my extract should be populated with these two value. In the main extract I deal with different table (not p1,p2,p3). Is it possible to do it Datastage. HOpe you guys will help me with detail information.

thanks in advance
rcil
cnguyen
Premium Member
Premium Member
Posts: 31
Joined: Fri May 14, 2004 9:27 am

Post by cnguyen »

One option is to use a stored-procedure in an ODBC stage for this complex query.
rcil
Charter Member
Charter Member
Posts: 70
Joined: Sat Jun 05, 2004 1:37 am

Post by rcil »

Is it possible to do it DataStage either by using hash lookups or some other way to solve this kind of problem. My input is TD Stage.

thanks
rcil
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

I would think you should be able to build three hash files and check them in order to get the data you need. Both of your conditions, where you either get one row or more than one row, can be satisfied by the hash by taking advantage of its 'destructive overwrite' behaviour - bring the keys in in descending order so that the 'first' row in the group is written to the hash file last and is thus the sole survivor.

Then simply hook to all three hashes and check them in the order you've indicated.
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

A couple of quick thoughts.

Could this not be done with a UNION?

To get the first in an ordered by, you can use MIN function in the SQL.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne
Contact:

Post by vmcburney »

MIN and MAX functions have a way of creating surprising results when used against two fields. If two records show:
Item S1 S2
xx 3 8
xx 5 2

Then the MIN function on an aggregation creates the fictional row xx, 3, 2 when it should be using xx, 3, 8. This makes it difficult to find the matching product1 and product3 rows in a SQL statement. If someone knows a way around this please reply.

I think it's a good idea to put product1 and product3 into aggregated hash files.

Put a filter on the input stage or a constraint on the transformer for t.date1>='2000-10-10'.

From memory DataStage aggregation stage does not have a FIRST ROW function so you may be better off using an order by stage to order it by S1, S2 DESC. In a transformer use Stage Variables to save the current S1 and S2 values and a count field for that S1, S2 combination. Reset the rowcount to 1 for a new S1, S2 combination.

Output S1, S2 and rowcount to the hash file where S1 and S2 are the key. Rows with the same key combination will overwrite and update the rowcount as they go.

Now select your Product table:
select
p.s1,p.s2, count(*) rowcount
from product p,item t
where p.s3=t.t1 and t.date1>='2000-10-10'
group by p.s1, p.s2

In a transformer join product hash file to product1 hash file and the product3 hash file.
If product.rowcount is 1 then output product fields.
If product.rowcount is greater then 1 then
If product1.rowcount is 0 or 1 then output product1 fields
If product1.rowcount is greater then 1 then
Output Product3 fields
If no product3 fields are found then
Output a reject record.

You can use stage variables to keep track of flags such as product1.rowcount>1 for use in multiple constraints and derivations.
Post Reply