Page 1 of 1

Joining more than 2 tables in a server job

Posted: Mon Nov 01, 2010 2:59 am
by arpitchopra
Hi everyone,
I am a complete novice (introduced to the world of DataStage, rather ETL just a couple of months ago). I am stuck on a problem and will be glad if any of you could help me out with it.

I have 4 tables and I want to perform a join on them in the following manner

select a.<column list>, b.<column list> from
(select P.x, Q.y from P,Q where P.x = Q.y) a,
(select R.z, S.w from R,S where R.z = S.w) b
where a.x = b.z


I was trying to implement this as

Code: Select all

Orcl ---->Transformer1 --------------->
Orcl----->

                                       TRANSFORMER 3--------> Orcl

Orcl----->Transformer2---------------->
Orcl----->
But I found out (much to my dismay and complete PANIC) that this doesn't work. Since the 2nd link into transformer 3 will be a reference link and reference link can't be an output from transformer 2.

Can anyone help me out ?

PS: The query that I require is not as simple as above. It involves much more joins. But I figured I'll be able to get it done once I get my concepts right.

PPS: In case the diagram is not clear: The o/p of Transformer 1 and 2 goes in as i/p to Transformer 3.


Regards

Posted: Mon Nov 01, 2010 7:17 am
by chulett
OK, let's start with a simple solution and then work from there. Why not continue to do the join in SQL, all at once in a single stage? While you can bring in each table individually and then let the job do all the work, there's no reason to go to all that trouble and complexity unless you're being asked to do exactly that for some reason. Especially if what you posted is a simplified version of what you'd actually end up needing to do. :?

So, can you not put all of the columns resulting from the query into the stage and then use your original SQL as 'user-defined SQL'?

Posted: Mon Nov 01, 2010 8:09 am
by ds_george
Store the outputs of the transformer1 and transformer2 in hashed files and join them in the transformer.

Posted: Mon Nov 01, 2010 8:07 pm
by arpitchopra
Chulett :lol:
Utmost respect for you, first of all !
Your intelligence, depth of knowledge, rhetoric attitude and complete lack of modesty makes this an extremely fun forum :wink:
Total respect for what you do, though :)

Anyways, I did finally do the thing that you have suggested. However, I'm concerned about the system's future maintenance. What if someone else comes to work/change this mapping (Considering the fact that I'll hand it over to the client in a couple of weeks) ? Won't it be simpler for him to understand the mapping instead of going through a long query ?

ds_george: I thought of that solution as well. But what if I can't afford the hashed files (size of data being too large or if I'm already storing a lot of hashed files in my project) ? What is the limit by the way ? In terms of space that can be used up by such temp files in a project ?

Posted: Mon Nov 01, 2010 8:14 pm
by arpitchopra
Oh and another question !
I was going to put a separate topic for it but then realized that this question falls under the same subject.

Consider this join

select A.*, B.*, C.*, D.*
from A,B,C,D
where
A.p = B.q and
B.q = C.r and
C.s = D.t


How do I perform this join using a transformer ? Since there can only be 1 stream link, it will only allow me to link column B.q and C.r with A.p
What do I do with D.t ??

@chulett: Again the same thing about maintenance and query not being as simple as it looks.

Posted: Mon Nov 01, 2010 10:10 pm
by chulett
You load B, C and D into individual hashed files, stream in A and then join to each via a reference lookup one after the other. You can technically do this all in one transformer but beginners should stick with three transformers with one lookup apiece.

Add the columns you need from each lookup into the stream as you go and constrain each transformer to only pass records where the reference lookup succeeds.

Posted: Tue Nov 02, 2010 12:00 am
by arpitchopra
Thanks !!
:)

Posted: Mon Feb 28, 2011 8:53 am
by thurmy34
Hi All
I'm sorry to jump in thi topic but i'would like to know how you do that:
You can technically do this all in one transformer but beginners should stick with three transformers with one lookup apiece.
I'm not a beginner but i use severals transformers to add the missing keys.

Thank you

Posted: Mon Feb 28, 2011 10:22 am
by chulett
First, you make sure the lookups happen in the proper order. Secondly, you pass the output of one lookup to the input key(s) of the next one by typing their names in manually as there is no way to do this via the GUI. Thirdly, you live with the fact that those keys will show in red as if there was something wrong with them when in fact (assuming you've spelled everything correctly) it should work fine. Lastly, I would suggest annotating what exactly you are doing so no-one is suprised down the line.

Posted: Tue Mar 01, 2011 1:27 am
by thurmy34
Thank You
I'll try that.

Posted: Tue Mar 01, 2011 4:43 am
by major
Wonderful chulett , I never knew this approach.