how to perform self join
Posted: Tue May 11, 2010 10:59 pm
by ksk
Hi,
How do I perform a self join using the Join stage in a parallel job?
Is it possible to perform a self join at all?
Thanks
Re: how to perform self join
Posted: Wed May 12, 2010 12:00 am
by Magesh_bala
Hi,
I think you need to use the same file as both the left and the right input. Then you can achieve the self join in DataStage.
Re: how to perform self join
Posted: Wed May 12, 2010 12:43 am
by ksk
Magesh_bala wrote: Hi,
I think you need to use the same file as both the left and the right input. Then you can achieve the self join in DataStage.
Is it possible to do this with a single input file?
Posted: Wed May 12, 2010 12:46 am
by ray.wurlod
Yes, both the Join stage inputs read data from the same file.
Posted: Wed May 12, 2010 1:00 am
by ksk
ray.wurlod wrote: Yes, both the Join stage inputs read data from the same file. ...
I want to use only one input stage, not (for example) two copies of the same input stage.
Posted: Wed May 12, 2010 3:08 am
by Sainath.Srinivasan
You can always add something like a Copy stage and split the stream into two.
Posted: Wed May 12, 2010 10:26 pm
by truenorth
ksk wrote:
I want to use only one input stage, not (for example) two copies of the same input stage.
But a join requires two input streams.
Posted: Wed May 12, 2010 11:11 pm
by ray.wurlod
Yes. And your point is?
               +----------------+
               |                V
SeqFile ----> Copy             Join ------->
               |                ^
               +----------------+
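For anyone who finds the picture easier to follow as code, here is a rough Python sketch of the same idea: one source, duplicated into two links, joined back to itself. It is only an illustration of the data flow, not DataStage code. The function name `self_join`, the sample `employees` rows and the `emp_id`/`mgr_id`/`name` columns are all made up for the example.

```python
from collections import defaultdict

def self_join(rows, left_key, right_key):
    """Inner self join: the same rows feed both join inputs."""
    left = list(rows)      # first output link of the copy
    right = list(left)     # second output link of the copy (same data)

    # Index the right link by its join key.
    index = defaultdict(list)
    for r in right:
        index[r[right_key]].append(r)

    # Pair every left row with each matching right row.
    return [(l, r) for l in left for r in index.get(l[left_key], [])]

# Hypothetical sample data: each employee points at a manager in the same table.
employees = [
    {"emp_id": 1, "name": "Ann", "mgr_id": None},
    {"emp_id": 2, "name": "Bob", "mgr_id": 1},
    {"emp_id": 3, "name": "Cy", "mgr_id": 1},
]

pairs = [(e["name"], m["name"])
         for e, m in self_join(employees, "mgr_id", "emp_id")]
print(pairs)
```

The source is read once; only the in-memory stream is duplicated, which is what the Copy stage does in the diagram.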
Posted: Thu May 13, 2010 11:24 pm
by truenorth
I understand the OP to mean he only wants one input into a Join stage, which requires two. So to do a self-join, I'm with you: two inputs, but both fed from the one physical input.
Posted: Fri May 14, 2010 6:30 am
by chulett
No, they stated "one input stage", not "one input into the Join" (which, as noted, is not possible). And that solution has been posted. More than once.
Posted: Fri May 14, 2010 6:45 am
by battaliou
You're probably better off landing your data into a Data Set and, depending on volumes, using a Lookup or joining the data in a second job. Of course, if you insist on using only one input stream, you could copy it into two Sorts on the join keys and then merge the two streams together. This will give rubbish results if your join key is not unique.
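One way to see why a non-unique key hurts, sketched in Python rather than DataStage: in an inner self join on the same key, a key value that appears n times matches itself n × n times, so the output can grow well beyond the input. Whether that is exactly the "rubbish" meant above is an assumption; the helper name `self_join_row_count` is made up for the example.

```python
from collections import Counter

def self_join_row_count(keys):
    """Rows produced by an inner self join on the given key values."""
    counts = Counter(keys)
    # Each key value occurring n times pairs with itself n * n times.
    return sum(n * n for n in counts.values())

print(self_join_row_count([1, 2, 3]))   # unique keys
print(self_join_row_count([1, 1, 2]))   # key 1 is duplicated
```

With unique keys the output has as many rows as the input; with duplicates it grows quadratically per key value.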