Problem in joining 2 tables

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
Amarpreet
Participant
Posts: 20
Joined: Tue Apr 04, 2006 11:45 pm
Location: India
Contact:

Problem in joining 2 tables

Post by Amarpreet »

I have 2 tables to join, in which the column names are not same and the condition is not simple. I have a condition like this:
where table1.a>table2.b and table1.a<=table2.c(in between condition). How can I do this in parallel job.
nick.bond
Charter Member
Charter Member
Posts: 230
Joined: Thu Jan 15, 2004 12:00 pm
Location: London

Post by nick.bond »

Are the two tables in the same database? If so join in the database first.

If they are not it is more difficult. Are those the only two key fields or are there more key fields that you could join on but which would return multiple records for the same keys? If so you could join on the other keys passing through table1.a, table2.b and table3.c, which will give you X-product and then use a filter stage to restrict output to the records where your condition is true.
Regards,

Nick.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Rename the columns in an upstream Copy or Modify stage so that the key column names do match.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
seanc217
Premium Member
Premium Member
Posts: 188
Joined: Thu Sep 15, 2005 9:22 am

Post by seanc217 »

You could also use column aliases to rename the column utilizing custom sql. In an Oracle DB you would do it as such select col1 as column1....

HTH
sud
Premium Member
Premium Member
Posts: 366
Joined: Fri Dec 02, 2005 5:00 am
Location: Here I Am

Re: Problem in joining 2 tables

Post by sud »

Amarpreet wrote:table1.a>table2.b and table1.a<=table2.c(in between condition)
You should not need a join condition at all and hence you won't need renaming columns. Achieve a cross join of the two tables and put a filter where you can specify the condition you mentioned.
It took me fifteen years to discover I had no talent for ETL, but I couldn't give it up because by that time I was too famous.
kumar_s
Charter Member
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

Or do a lookup to the table2 for a dummy column with a value say '1'. You can check the lookup condition as give by you table1.a>table2.b and table1.a<=table2.c.
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
sud
Premium Member
Premium Member
Posts: 366
Joined: Fri Dec 02, 2005 5:00 am
Location: Here I Am

Post by sud »

kumar_s wrote:Or do a lookup to the table2 for a dummy column with a value say '1'. You can check the lookup condition as give by you table1.a>table2.b and table1.a<=table2.c.
In this case if there is a possibility of more than one record satisfying the condition, please do not forget to set the link name from which multiple records can come (u can set this property by double clicking the lookup condition)
It took me fifteen years to discover I had no talent for ETL, but I couldn't give it up because by that time I was too famous.
kumar_s
Charter Member
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

Which is very similar ot Cartesian product by using Join stage.
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

The Cartesian product "solution" proposed by a couple of posters will be very long-running if both tables have large row counts.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
nick.bond
Charter Member
Charter Member
Posts: 230
Joined: Thu Jan 15, 2004 12:00 pm
Location: London

Post by nick.bond »

If the tables are not in the same database load them both into temp staging tables in the same database and perform join in there.
Regards,

Nick.
kumar_s
Charter Member
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

bondcaliwood wrote:If the tables are not in the same database load them both into temp staging tables in the same database and perform join in there.
Still it will run for long if row are large in numbers.
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
teapczynski
Premium Member
Premium Member
Posts: 1
Joined: Thu Jun 07, 2007 12:40 pm

Post by teapczynski »

sud wrote:
kumar_s wrote:Or do a lookup to the table2 for a dummy column with a value say '1'. You can check the lookup condition as give by you table1.a>table2.b and table1.a<=table2.c.
In this case if there is a possibility of more than one record satisfying the condition, please do not forget to set the link name from which multiple records can come (u can set this property by double clicking the lookup condition)
How exactly would you specify the lookup condition (I am new to DataStage). I need to generate a undetermined number of output rows per input. This thread seems to get me the closest but I am having issues reducing the output to the rows I am looking for.
sud
Premium Member
Premium Member
Posts: 366
Joined: Fri Dec 02, 2005 5:00 am
Location: Here I Am

Post by sud »

Well, you can check out the lookup stage and use dummy columns in both primary and secondary lookup inputs to achieve a cross join and set lookup condition in the lookup stage.

But, what do you mean by "generate a undetermined number of output rows per input" :?:
It took me fifteen years to discover I had no talent for ETL, but I couldn't give it up because by that time I was too famous.
Post Reply