Page 1 of 1

Job design for two way range join

Posted: Mon Dec 21, 2015 5:51 am
by stiantok
Hi!

I am trying to implement a join between two tables with history in Datastage. In sql I would do something like:

Code: Select all

select *
from tableA a 
left outer tableB b 
on a.key = b.key
and (a.valid_from_date between b.valid_from_date and b.valid_to_date
or b.valid_from_date between a.valid_from_date and b.valid_to_date)
and b.valid_from_date >= a.valid_from_date
Is there any neat way of doing this with Datastage components? I also want to make it generic by creating a shared container with RCP.

Thanks!

Re: Job design for two way range join

Posted: Mon Dec 21, 2015 10:30 pm
by naveenkumar.ssn
Hi
REF(TABLE b)
|
|
SRC(TABLE a) ----- Join Stage(Left Join) --- Transformer ----- Target

In the transformer
--- StageVariable1 --- if a.valid_frm_date >= b.valid_frm_date AND a.valid_frm_date < b.valid_frm_date then @True else @False

---- StageVarialbe 2 -- Similarly for the b.valid_frm_date(same condition as above)

---- StageVariable 3 -- if (b.valid_from_date >= a.valid_from_date) then @True else @False

In the constraint of the transformer ..you have to allow only those records which matches the above ..meaning (StageVariable1 = @True OR StageVarialbe2 = @True) AND StageVariable3 = @TRUE

Regards
Naveen

Posted: Tue Jan 12, 2016 5:51 am
by stiantok
Thanks for the input Naveen!

Any thoughts on how this design will perform when there is large amounts of data? Won't the first potentially produce a huge amount of data that has to be processed by the Transformer? Say table A has 3 mill rows, and that each of these hits 100 rows in table B, that will leave us with 300 mill rows to be processed by the Transformer...?

Maybe there is a way of constraining the amount of data in the Join processing also?

br