how to perform self join

ksk
Participant
Posts: 21
Joined: Fri Apr 30, 2010 4:55 am

how to perform self join

Post by ksk »

Hi,

How can I perform a self join using the Join stage in a parallel job? Is a self join possible at all?

Thanks
Magesh_bala
Participant
Posts: 86
Joined: Mon Nov 27, 2006 3:42 am
Location: Wilmington

Re: how to perform self join

Post by Magesh_bala »

Hi,

I think you need to use the same file as both the left and the right input. Then you can achieve the self join in DataStage.
ksk
Participant
Posts: 21
Joined: Fri Apr 30, 2010 4:55 am

Re: how to perform self join

Post by ksk »

Magesh_bala wrote:Hi,

I think you need to use the same file as both the left and the right input. Then you can achieve the self join in DataStage.

Is it possible to do this with a single input file?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Yes, both the Join stage inputs read data from the same file.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
ksk
Participant
Posts: 21
Joined: Fri Apr 30, 2010 4:55 am

Post by ksk »

ray.wurlod wrote:Yes, both the Join stage inputs read data from the same file. ...
I want to use only one input stage, not two identical input stages.
Sainath.Srinivasan
Participant
Posts: 3337
Joined: Mon Jan 17, 2005 4:49 am
Location: United Kingdom

Post by Sainath.Srinivasan »

You can always include something like a Copy stage and split the stream into two.
truenorth
Participant
Posts: 139
Joined: Mon Jan 18, 2010 4:59 pm
Location: San Antonio

Post by truenorth »

ksk wrote: I want to use only one input stage, not two identical input stages.
But a join requires two input streams.
Todd Ramirez
Sr Consultant, Data Quality
San Antonio TX
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Yes. And your point is?

                        +----------------+
                        |                V
      SeqFile  ---->  Copy              Join  -------> 
                        |                ^
                        +----------------+
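
For readers who want to see the logic of the diagram outside DataStage, here is a minimal Python sketch. It is not DataStage code; the function name, the sample employee/manager rows and the "r_" column prefix are all made up for illustration. The single input is read once, handed to both sides of the join (the Copy step), and matched on the join keys.

```python
from collections import defaultdict

def self_join(rows, left_key, right_key):
    """Inner-join a row set to itself where left[left_key] == right[right_key]."""
    # "Copy" step: one physical input feeds both join links.
    left_rows = list(rows)
    right_rows = left_rows

    # Index the right link by its join key (a hash join, chosen for brevity).
    index = defaultdict(list)
    for r in right_rows:
        index[r[right_key]].append(r)

    out = []
    for l in left_rows:
        for r in index.get(l[left_key], []):
            merged = dict(l)
            # Prefix right-side columns so they do not overwrite the left side.
            merged.update({"r_" + k: v for k, v in r.items()})
            out.append(merged)
    return out

# Hypothetical sample data: each employee row points at its manager's id.
employees = [
    {"id": 1, "name": "Ann", "mgr": None},
    {"id": 2, "name": "Bob", "mgr": 1},
    {"id": 3, "name": "Cy", "mgr": 1},
]
pairs = [(row["name"], row["r_name"])
         for row in self_join(employees, "mgr", "id")]
print(pairs)
```

The point is only that the join still sees two inputs, even though the data is read from one source stage.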
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
truenorth
Participant
Posts: 139
Joined: Mon Jan 18, 2010 4:59 pm
Location: San Antonio

Post by truenorth »

I understand the OP to mean he only wants one input into a Join stage, which requires two. So to do a self-join, I'm with you...two inputs, but both are the one physical input.
Todd Ramirez
Sr Consultant, Data Quality
San Antonio TX
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

No, they stated "one input stage" not one input into the Join, which (as noted) is not possible. And that solution has been posted. More than once.
-craig

"You can never have too many knives" -- Logan Nine Fingers
battaliou
Participant
Posts: 155
Joined: Mon Feb 24, 2003 7:28 am
Location: London

Post by battaliou »

You're probably better off landing your data into a dataset and, depending on volumes, using a lookup or joining the data in a second job. Of course, if you insist on using only one input stream, you could copy it into two sorts on the join keys and merge the data together. This will give rubbish results if your join key is not unique.
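
A rough Python sketch of why a non-unique join key matters, under stated assumptions: the one-to-one merge below is a generic sorted merge written for illustration, not a model of how the DataStage Merge stage behaves internally, and the key values are invented. A true self join pairs every row in a key group with every row in the same group, while a merge that assumes unique keys pairs rows off one at a time.

```python
from collections import Counter

def self_join_row_count(keys):
    """Rows a true self join produces: n * n for each key group of size n."""
    return sum(n * n for n in Counter(keys).values())

def one_to_one_merge(sorted_left, sorted_right):
    """Merge two sorted key lists, assuming each key occurs once per side."""
    i = j = 0
    matches = []
    while i < len(sorted_left) and j < len(sorted_right):
        if sorted_left[i] == sorted_right[j]:
            matches.append(sorted_left[i])
            i += 1  # both sides advance, so a duplicate key is paired only once
            j += 1
        elif sorted_left[i] < sorted_right[j]:
            i += 1
        else:
            j += 1
    return matches

keys = sorted(["A", "A", "B"])          # "A" is not unique
full = self_join_row_count(keys)        # 2*2 + 1*1 pairings
merged = one_to_one_merge(keys, keys)   # one pairing per row
print(full, len(merged))
```

With these keys the two approaches disagree on the row count, which is the kind of mismatch to watch for before choosing sort-and-merge over a join.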
3NF: Every non-key attribute must provide a fact about the key, the whole key, and nothing but the key. So help me Codd.