
how to perform self join

Posted: Tue May 11, 2010 10:59 pm
by ksk
Hi,

How can I perform a self join using the Join stage in a parallel job?
Is it possible to perform a self join at all?

Thanks

Re: how to perform self join

Posted: Wed May 12, 2010 12:00 am
by Magesh_bala
Hi,

I think you need to use the same file as both the left and the right input. Then you can achieve the self join in DataStage.

Re: how to perform self join

Posted: Wed May 12, 2010 12:43 am
by ksk
Magesh_bala wrote:Hi,

I think you need to use the same file as both the left and the right input. Then you can achieve the self join in DataStage.

Is it possible to do this with a single input file?

Posted: Wed May 12, 2010 12:46 am
by ray.wurlod
Yes, both the Join stage inputs read data from the same file.

Posted: Wed May 12, 2010 1:00 am
by ksk
ray.wurlod wrote:Yes, both the Join stage inputs read data from the same file. ...
I want to use only one input stage, not two identical input stages.

Posted: Wed May 12, 2010 3:08 am
by Sainath.Srinivasan
You can always include something like a Copy stage and split the stream into two.

Posted: Wed May 12, 2010 10:26 pm
by truenorth
ksk wrote: I want to use only one input stage, not two identical input stages.
But a join requires two input streams.

Posted: Wed May 12, 2010 11:11 pm
by ray.wurlod
Yes. And your point is?


                        +----------------+
                        |                V
      SeqFile  ---->  Copy              Join  -------> 
                        |                ^
                        +----------------+
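The diagram above reads one file once, fans it out through a Copy stage, and feeds both copies into the Join. As a rough illustration of the same data flow outside DataStage, here is a plain-Python sketch. It is only an analogy: the sample rows and the `emp_id`/`mgr_id` columns are hypothetical, `copy_stage` and `inner_join` are made-up helper names, and the join uses an in-memory hash index, whereas the DataStage Join stage works on inputs partitioned and sorted on the key.

```python
from collections import defaultdict

# Hypothetical rows standing in for the single sequential file.
rows = [
    {"emp_id": 1, "name": "Ann", "mgr_id": None},
    {"emp_id": 2, "name": "Bob", "mgr_id": 1},
    {"emp_id": 3, "name": "Cid", "mgr_id": 1},
]

def copy_stage(stream):
    """Read the input once and emit two identical output links (like a Copy stage)."""
    data = list(stream)
    return list(data), list(data)

def inner_join(left, right, left_key, right_key):
    """Inner join: index the right link by key, then probe it with each left row."""
    index = defaultdict(list)
    for r in right:
        index[r[right_key]].append(r)
    out = []
    for l in left:
        # Rows whose key has no match (e.g. mgr_id None) produce no output.
        for r in index.get(l[left_key], []):
            out.append((l, r))
    return out

# One physical input, two logical links into the join.
left_link, right_link = copy_stage(rows)
pairs = inner_join(left_link, right_link, "mgr_id", "emp_id")
result = [(l["name"], r["name"]) for l, r in pairs]
print(result)  # [('Bob', 'Ann'), ('Cid', 'Ann')]
```

The point of the sketch is that the source is read only once; the "two inputs" the Join needs are simply two links carrying the same rows.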

Posted: Thu May 13, 2010 11:24 pm
by truenorth
I understand the OP to mean he only wants one input into a Join stage, which requires two. So to do a self join, I'm with you: two inputs, but both come from the one physical input.

Posted: Fri May 14, 2010 6:30 am
by chulett
No, they stated "one input stage", not one input into the Join (which, as noted, is not possible). And that solution has been posted. More than once.

Posted: Fri May 14, 2010 6:45 am
by battaliou
You're probably better off landing your data in a data set and then, depending on volumes, using a Lookup or joining the data in a second job. Of course, if you insist on using only one input stream, you could copy it into two Sort stages on the join keys and merge your data back together. This will give rubbish results if your join key is not unique.
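The warning about non-unique keys can be illustrated in generic terms. The sketch below is plain inner-join arithmetic, not a model of the DataStage Merge stage itself: when one stream is joined to a copy of itself on the same key, each key value that occurs k times yields k × k output rows. The function name `self_join_on_key` and the sample data are hypothetical.

```python
def self_join_on_key(rows, key):
    """Inner self join of a stream with a copy of itself on the same key (nested loop)."""
    left, right = list(rows), list(rows)
    return [(l, r) for l in left for r in right if l[key] == r[key]]

unique = [{"k": 1}, {"k": 2}, {"k": 3}]
dupes = [{"k": 1}, {"k": 1}, {"k": 1}, {"k": 2}]

# Unique keys: each row matches only itself, so output size equals input size.
print(len(self_join_on_key(unique, "k")))  # 3
# Key 1 occurs 3 times -> 3 * 3 = 9 rows, key 2 once -> 1 row, total 10.
print(len(self_join_on_key(dupes, "k")))   # 10
```

With a unique key the output row count matches the input; with duplicates it grows quadratically per key, which is usually not what was intended.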