join conditions on target columns

rcil · Post by **rcil** » Tue Jun 15, 2004 10:22 pm

I am doing an extract job in datastage which extracts data from a database and loads it into a flat file, the logic invovled on two of the new columns on the target is

select
p.s1,p.s2
from product p,item t
where p.s3=t.t1 and t.date1>='2000-10-10'

if rowsReturned= 1 then s1,s2
if rowsReturned>1 then order by s1,s2 and take the first row

if rowsReturned=0
then

select
p1.s1,p2.s2
from product1 p1, item t
where p1.r1=t.t1 and t.date1>='2000-10-10'

(where Product1 is a different table but got the same column names and item is the same table above)

if rowsReturned= 1 then s1,s2 (of product1)
if rowsReturned>1 then order by s1,s2 and take the first row
if rowsReturned=0 then

select x1,x2 from product3,item t where t.date1>='2000-10-10'

Those two columns in my extract should be populated with these two value. In the main extract I deal with different table (not p1,p2,p3). Is it possible to do it Datastage. HOpe you guys will help me with detail information.

thanks in advance
rcil

cnguyen · Post by **cnguyen** » Tue Jun 15, 2004 10:43 pm

One option is to use a stored-procedure in an ODBC stage for this complex query.

rcil · Post by **rcil** » Tue Jun 15, 2004 10:56 pm

Is it possible to do it DataStage either by using hash lookups or some other way to solve this kind of problem. My input is TD Stage.

thanks
rcil

chulett · Post by **chulett** » Wed Jun 16, 2004 6:51 am

I would think you should be able to build three hash files and check them in order to get the data you need. Both of your conditions, where you either get one row or more than one row, can be satisfied by the hash by taking advantage of its 'destructive overwrite' behaviour - bring the keys in in descending order so that the 'first' row in the group is written to the hash file last and is thus the sole survivor.

Then simply hook to all three hashes and check them in the order you've indicated.

ray.wurlod · Post by **ray.wurlod** » Wed Jun 16, 2004 4:01 pm

A couple of quick thoughts.

Could this not be done with a UNION?

To get the first in an ordered by, you can use MIN function in the SQL.

vmcburney · Post by **vmcburney** » Wed Jun 16, 2004 6:35 pm

MIN and MAX functions have a way of creating surprising results when used against two fields. If two records show:
Item S1 S2
xx 3 8
xx 5 2

Then the MIN function on an aggregation creates the fictional row xx, 3, 2 when it should be using xx, 3, 8. This makes it difficult to find the matching product1 and product3 rows in a SQL statement. If someone knows a way around this please reply.

I think it's a good idea to put product1 and product3 into aggregated hash files.

Put a filter on the input stage or a constraint on the transformer for t.date1>='2000-10-10'.

From memory DataStage aggregation stage does not have a FIRST ROW function so you may be better off using an order by stage to order it by S1, S2 DESC. In a transformer use Stage Variables to save the current S1 and S2 values and a count field for that S1, S2 combination. Reset the rowcount to 1 for a new S1, S2 combination.

Output S1, S2 and rowcount to the hash file where S1 and S2 are the key. Rows with the same key combination will overwrite and update the rowcount as they go.

Now select your Product table:
select
p.s1,p.s2, count(*) rowcount
from product p,item t
where p.s3=t.t1 and t.date1>='2000-10-10'
group by p.s1, p.s2

In a transformer join product hash file to product1 hash file and the product3 hash file.
If product.rowcount is 1 then output product fields.
If product.rowcount is greater then 1 then
If product1.rowcount is 0 or 1 then output product1 fields
If product1.rowcount is greater then 1 then
Output Product3 fields
If no product3 fields are found then
Output a reject record.

You can use stage variables to keep track of flags such as product1.rowcount>1 for use in multiple constraints and derivations.