Join Stage

varshanswamy · Post by **varshanswamy** » Thu Feb 24, 2005 11:45 pm

I have a join that performs a Cartesian product between
2 files having a common column called DUMMY, which is defaulted to 1.
Source File - 2000 records
Reference File - 342,066 records

I want to know how I can speed up the process, and whether the Join stage can handle this operation given the heavy volume of data, or whether I need to use some other stage.

ray.wurlod · Post by **ray.wurlod** » Fri Feb 25, 2005 12:26 am

You didn't state how you were doing the join - presumably a Join stage.
No matter what method you use, you're going to have to process a large number of rows (684132000 based on the figures you supplied).

There's no reason that DataStage would not be able to process this volume, provided you have the resources to support it.
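
The row count quoted above can be checked with quick arithmetic. This is only a sketch: it reads the reference file size as 342,066 records (the figure consistent with the total given), and the variable names are made up for illustration.

```python
# Row counts from the original post (reference count read as 342,066).
source_rows = 2_000
reference_rows = 342_066

# A Cartesian product emits every reference row for every source row.
output_rows = source_rows * reference_rows
print(output_rows)
```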

varshanswamy · Post by **varshanswamy** » Fri Feb 25, 2005 1:23 am

I used an Inner Join, as it is based on the dummy column created for the purpose of the Cartesian product.

ray.wurlod · Post by **ray.wurlod** » Fri Feb 25, 2005 2:54 pm

Either you or I misunderstand the term Cartesian product, then. As I understand the term, it's all rows from table B for each row in table A.
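
To make the definition concrete, here is a small Python sketch (not DataStage code) using invented rows. Under those assumptions, it shows that an inner join on a constant DUMMY column returns the same pairs as a plain Cartesian product, because every row matches every other row.

```python
from itertools import product

# Hypothetical miniature inputs; every row carries DUMMY = 1.
table_a = [{"DUMMY": 1, "a": "a1"}, {"DUMMY": 1, "a": "a2"}]
table_b = [
    {"DUMMY": 1, "b": "b1"},
    {"DUMMY": 1, "b": "b2"},
    {"DUMMY": 1, "b": "b3"},
]

# Cartesian product: all rows from table B for each row in table A.
cartesian = [(ra["a"], rb["b"]) for ra, rb in product(table_a, table_b)]

# Inner join on DUMMY: the key is constant, so every pair matches.
inner_join = [
    (ra["a"], rb["b"])
    for ra in table_a
    for rb in table_b
    if ra["DUMMY"] == rb["DUMMY"]
]

print(len(cartesian))           # 2 * 3 = 6 pairs
print(cartesian == inner_join)  # True
```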

T42 · Post by **T42** » Tue Mar 01, 2005 3:19 pm

Ray is correct. In fact, you probably are better off with a Lookup stage. Heavy volume? Hah. Maybe for output, but not for a lookup. 300k of reference file that does not have to be sorted? Do a lookup.

The Join stage sorts within the framework (unless you have already sorted the data beforehand). That adds time.
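
The difference between the two approaches can be sketched in plain Python. This illustrates the general techniques only, not DataStage internals: `lookup_join` and `sort_merge_join` are hypothetical helper names and the rows are invented. The lookup version loads the reference rows into memory once and streams the source past them; the sort-based version has to sort both inputs on the key before it can match anything.

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter


def lookup_join(source, reference, key):
    """Lookup style: index the reference in memory; no sorting required."""
    index = defaultdict(list)
    for ref_row in reference:
        index[ref_row[key]].append(ref_row)
    for src_row in source:
        for ref_row in index.get(src_row[key], []):
            yield {**ref_row, **src_row}


def sort_merge_join(source, reference, key):
    """Join style: sort both inputs on the key, then merge matching groups."""
    get_key = itemgetter(key)
    src = [(k, list(g)) for k, g in groupby(sorted(source, key=get_key), key=get_key)]
    ref = [(k, list(g)) for k, g in groupby(sorted(reference, key=get_key), key=get_key)]
    i = j = 0
    while i < len(src) and j < len(ref):
        if src[i][0] < ref[j][0]:
            i += 1
        elif src[i][0] > ref[j][0]:
            j += 1
        else:
            # Keys match: emit every combination within the two groups.
            for src_row in src[i][1]:
                for ref_row in ref[j][1]:
                    yield {**ref_row, **src_row}
            i += 1
            j += 1


# Invented sample data: 2 source rows, 3 reference rows, all with DUMMY = 1.
source = [{"DUMMY": 1, "s": n} for n in range(2)]
reference = [{"DUMMY": 1, "r": n} for n in range(3)]

via_lookup = list(lookup_join(source, reference, "DUMMY"))
via_sort_merge = list(sort_merge_join(source, reference, "DUMMY"))
print(len(via_lookup), via_lookup == via_sort_merge)
```

Both functions produce the same rows here; the sort-based one simply does extra work up front, which is the cost mentioned above.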