Job design for two way range join

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
stiantok
Participant
Posts: 8
Joined: Tue Sep 09, 2014 8:55 am

Job design for two way range join

Post by stiantok »

Hi!

I am trying to implement a join between two tables with history in Datastage. In sql I would do something like:

Code: Select all

select *
from tableA a 
left outer tableB b 
on a.key = b.key
and (a.valid_from_date between b.valid_from_date and b.valid_to_date
or b.valid_from_date between a.valid_from_date and b.valid_to_date)
and b.valid_from_date >= a.valid_from_date
Is there any neat way of doing this with Datastage components? I also want to make it generic by creating a shared container with RCP.

Thanks!
naveenkumar.ssn
Participant
Posts: 36
Joined: Thu Dec 03, 2009 9:11 pm
Location: Malaysia

Re: Job design for two way range join

Post by naveenkumar.ssn »

Hi
REF(TABLE b)
|
|
SRC(TABLE a) ----- Join Stage(Left Join) --- Transformer ----- Target

In the transformer
--- StageVariable1 --- if a.valid_frm_date >= b.valid_frm_date AND a.valid_frm_date < b.valid_frm_date then @True else @False

---- StageVarialbe 2 -- Similarly for the b.valid_frm_date(same condition as above)

---- StageVariable 3 -- if (b.valid_from_date >= a.valid_from_date) then @True else @False

In the constraint of the transformer ..you have to allow only those records which matches the above ..meaning (StageVariable1 = @True OR StageVarialbe2 = @True) AND StageVariable3 = @TRUE

Regards
Naveen
Naveen Kumar
Datastage Consultant
stiantok
Participant
Posts: 8
Joined: Tue Sep 09, 2014 8:55 am

Post by stiantok »

Thanks for the input Naveen!

Any thoughts on how this design will perform when there is large amounts of data? Won't the first potentially produce a huge amount of data that has to be processed by the Transformer? Say table A has 3 mill rows, and that each of these hits 100 rows in table B, that will leave us with 300 mill rows to be processed by the Transformer...?

Maybe there is a way of constraining the amount of data in the Join processing also?

br
Post Reply