join datasets
Moderators: chulett, rschirm, roy
join datasets
Hi all,
I have 2 datasets that I want to combine as follows:
Dataset 1:
col1
1
2
Dataset 2:
col2
1
2
3
result set:
col1 col2
1 1
1 2
1 3
2 1
2 2
2 3
I think the join/lookup stage is not suitable because you need to specify which keys in both sets you want to link.
How to do this?
I have 2 datasets that I want to combine as follows:
Dataset 1:
col1
1
2
Dataset 2:
col2
1
2
3
result set:
col1 col2
1 1
1 2
1 3
2 1
2 2
2 3
I think the join/lookup stage is not suitable because you need to specify which keys in both sets you want to link.
How to do this?
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
You can use a Join stage, specify a full outer join. You may need to rename some columns so that the "key" column names match on the left and right inputs - use upstream Copy or Modify stage(s) for this.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
It is possible in a 'join' stage.
For the first dataset you add one more dummy column with a dummy value say char(1) with value 'A' using 'column generator'. this new dummy column should have the same value for all the records from the first dataset.
Now repeat the same thing for the other dataset. For the second dataset also, the dummy column name and the value should be same as that of the first dataset.
You have now got the common field to perform 'join'. in 'join' do an 'inner' join by the dummy column. This join should give you the cartesian product of the two input datasets. And I suppose, that is what you need also. Try this and let me know.
For the first dataset you add one more dummy column with a dummy value say char(1) with value 'A' using 'column generator'. this new dummy column should have the same value for all the records from the first dataset.
Now repeat the same thing for the other dataset. For the second dataset also, the dummy column name and the value should be same as that of the first dataset.
You have now got the common field to perform 'join'. in 'join' do an 'inner' join by the dummy column. This join should give you the cartesian product of the two input datasets. And I suppose, that is what you need also. Try this and let me know.
hi all,
thanks for your advices. i create a testjob with a dataset containing 110 records and a dataset containing 5 records. I joined both datasets on a dummycolumn (both filled with 1) that i added to each dataset using transformers. The result: 550 records !
However, when I implemented the same solution to my job, the join only generated a cartesian output by taking 2 instead of 5 records of dataset2. So, instead of joining 4000 * 5 records, only 8000 records were actually joined??
The log gives me no clue why the last 3 records of dataset 2 were not taken...
Any explanation for this?
There was no difference between inner join and full outer join
Peter
thanks for your advices. i create a testjob with a dataset containing 110 records and a dataset containing 5 records. I joined both datasets on a dummycolumn (both filled with 1) that i added to each dataset using transformers. The result: 550 records !
However, when I implemented the same solution to my job, the join only generated a cartesian output by taking 2 instead of 5 records of dataset2. So, instead of joining 4000 * 5 records, only 8000 records were actually joined??
The log gives me no clue why the last 3 records of dataset 2 were not taken...
Any explanation for this?
There was no difference between inner join and full outer join
Peter
-
- Participant
- Posts: 222
- Joined: Tue Aug 30, 2005 2:07 am
- Location: pune
- Contact:
Hi Peter,
Rather than creating dummy column and perofrm inner join. You, can do one thing, which is already suggested by ray. Rename the column name that should be same in the both the datastes. Use copy stage to rename the columns. For example, you have a key with the name 'Amount' in one dataset and 'Amount_1' is the column name in the other dataset. rename the column name of second dataset as 'Amount'. But, remember that datatype should be the same in both columns and perform Full outer join and you will get the expected result. For this approach you will get duplicates also, perform Remove duplicates to eliminate those.
Let us know your output.
Rather than creating dummy column and perofrm inner join. You, can do one thing, which is already suggested by ray. Rename the column name that should be same in the both the datastes. Use copy stage to rename the columns. For example, you have a key with the name 'Amount' in one dataset and 'Amount_1' is the column name in the other dataset. rename the column name of second dataset as 'Amount'. But, remember that datatype should be the same in both columns and perform Full outer join and you will get the expected result. For this approach you will get duplicates also, perform Remove duplicates to eliminate those.
Let us know your output.
NageshSunkoji
If you know anything SHARE it.............
If you Don't know anything LEARN it...............
If you know anything SHARE it.............
If you Don't know anything LEARN it...............
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact: