I have a join that performs a Cartesian product between
two files that share a common column called DUMMY, which is defaulted to 1.
Source File - 2000 records
Reference File - 342,066 records
I want to know how I can speed up the process, and whether the Join stage can handle this operation given the heavy volume of data, or whether I need to use some other stage.
Join Stage
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
You didn't state how you were doing the join - presumably a Join stage.
No matter what method you use, you're going to have to process a large number of rows (684132000 based on the figures you supplied).
There's no reason that DataStage would not be able to process this volume, provided you have the resources to support it.
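The row count quoted above follows directly from the two file sizes. As a rough illustration only (plain Python, not DataStage code; the function name and the `src_id`/`ref_id` columns are made up for the example), an inner join on a constant DUMMY key pairs every source row with every reference row:

```python
from itertools import product

def cartesian_via_dummy(source_rows, reference_rows):
    """Inner-join two row lists on a constant DUMMY key (a Cartesian product)."""
    # Every row carries DUMMY = 1, so each source row matches each reference row.
    return [
        {**s, **r}
        for s, r in product(source_rows, reference_rows)
        if s["DUMMY"] == r["DUMMY"]
    ]

# Tiny stand-ins for the real files (hypothetical columns).
small_source = [{"DUMMY": 1, "src_id": i} for i in range(3)]
small_reference = [{"DUMMY": 1, "ref_id": j} for j in range(4)]

joined = cartesian_via_dummy(small_source, small_reference)
print(len(joined))        # 3 source rows x 4 reference rows

# Output volume for the file sizes given in the question.
expected_rows = 2000 * 342066
print(expected_rows)
```

The output size is the product of the two input sizes whichever stage does the work, so the cost is dominated by writing the result rather than by matching keys.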
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
-
- Participant
- Posts: 48
- Joined: Thu Mar 11, 2004 10:32 pm
I used an Inner Join, as it is based on the dummy column created for the purpose of the Cartesian product.
ray.wurlod wrote: You didn't state how you were doing the join - presumably a Join stage.
No matter what method you use, you're going to have to process a large number of rows (684132000 based on the figures you supplied).
There's no reason that DataStage would not be able to process this volume, provided you have the resources to support it.
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
Ray is correct. In fact, you are probably better off with a Lookup stage. Heavy volume? Hah. Maybe for output, but not for a lookup. A reference file of roughly 300k rows that does not have to be sorted? Do a lookup.
The Join stage sorts within the framework (unless you have already sorted the data beforehand). That adds time.
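To make the lookup idea concrete, here is a conceptual sketch in plain Python (not DataStage code; `build_lookup`, `lookup_join` and the sample columns are hypothetical names chosen for illustration). The reference rows are loaded once into an in-memory table keyed on DUMMY, and the source rows are streamed past it, so neither input needs to be sorted:

```python
from collections import defaultdict

def build_lookup(reference_rows, key="DUMMY"):
    """Load the reference rows into memory, grouped by key. No sorting needed."""
    table = defaultdict(list)
    for row in reference_rows:
        table[row[key]].append(row)
    return table

def lookup_join(source_rows, table, key="DUMMY"):
    """Stream source rows and emit one output row per matching reference row."""
    for s in source_rows:
        for r in table.get(s[key], []):
            yield {**s, **r}

# Tiny stand-ins for the real files (hypothetical columns).
ref_rows = [{"DUMMY": 1, "ref_id": j} for j in range(5)]
src_rows = [{"DUMMY": 1, "src_id": i} for i in range(2)]

ref_table = build_lookup(ref_rows)
result = list(lookup_join(src_rows, ref_table))
print(len(result))        # 2 source rows x 5 reference rows
```

Because `lookup_join` is a generator, output rows are produced as the source is read rather than being held in memory all at once; only the reference side has to fit in memory.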