Job design for two way range join

stiantok · Post by **stiantok** » Mon Dec 21, 2015 5:51 am

Hi!

I am trying to implement a join between two tables with history in Datastage. In sql I would do something like:

select *
from tableA a 
left outer tableB b 
on a.key = b.key
and (a.valid_from_date between b.valid_from_date and b.valid_to_date
or b.valid_from_date between a.valid_from_date and b.valid_to_date)
and b.valid_from_date >= a.valid_from_date

Is there any neat way of doing this with Datastage components? I also want to make it generic by creating a shared container with RCP.

Thanks!

naveenkumar.ssn · Post by **naveenkumar.ssn** » Mon Dec 21, 2015 10:30 pm

Hi
REF(TABLE b)
|
|
SRC(TABLE a) ----- Join Stage(Left Join) --- Transformer ----- Target

In the transformer
--- StageVariable1 --- if a.valid_frm_date >= b.valid_frm_date AND a.valid_frm_date < b.valid_frm_date then @True else @False

---- StageVarialbe 2 -- Similarly for the b.valid_frm_date(same condition as above)

---- StageVariable 3 -- if (b.valid_from_date >= a.valid_from_date) then @True else @False

In the constraint of the transformer ..you have to allow only those records which matches the above ..meaning (StageVariable1 = @True OR StageVarialbe2 = @True) AND StageVariable3 = @TRUE

Regards
Naveen

stiantok · Post by **stiantok** » Tue Jan 12, 2016 5:51 am

Thanks for the input Naveen!

Any thoughts on how this design will perform when there is large amounts of data? Won't the first potentially produce a huge amount of data that has to be processed by the Transformer? Say table A has 3 mill rows, and that each of these hits 100 rows in table B, that will leave us with 300 mill rows to be processed by the Transformer...?

Maybe there is a way of constraining the amount of data in the Join processing also?

br

DSXchange

Job design for two way range join

Job design for two way range join

Re: Job design for two way range join