Hash partitioning in Oracle Connector

dougcl · Post by **dougcl** » Fri Jul 16, 2010 5:51 pm

Hi folks, this might be along the lines of thinking out loud, but it seems to me that I will be using hash partitioning (in general) within DataStage jobs to support (in general) joins. At the same time, it is nice to use ORDER BY in the SQL select statements to reduce the memory burden in the DataStage server. The problem is that ordered data coming into the DataStage system gets scrambled when repartitioned (eg. by hash) within DataStage, requiring another sort anyway. Two possible answers, 1) impose hash partitioning in the Oracle connector, but this does not seem to be an option for the general case in which a hash is required on an arbitrary column. Or? 2) Perhaps there is some way to repartition in DS without scrambling the sort (I doubt this, but....) If this is possible, I would like to know.

Thanks,
Doug

chulett · Post by **chulett** » Fri Jul 16, 2010 9:26 pm

Hmmm... something you could accomplish with an explicit Sort stage set to "Don't sort, already sorted" perhaps?

dougcl · Post by **dougcl** » Mon Jul 19, 2010 3:34 pm

kwwilliams wrote:the table partition algorithm and the DataStage partition algorithm are not the same.

yep, seems to be the limiter. Looks like my "best practice" is to pull the data in parallel without sorting in the db, then hash partition and sort on the hash key in DataStage. My aim is to land all extracted data sorted and hashed by the primary key, ready for downstream consumption. I hate to do this though because my extract jobs are now much more heavyweight.

Thanks,
Doug

dougcl · Post by **dougcl** » Mon Nov 29, 2010 7:12 pm

Hi folks, I am told that the modulus partitioning strategy is the same as hash if the hash key happens to be an integer. So in this special case, perhaps specifying modulus partitioned reads in the Oracle Connector will map directly to a hash partition in DataStage. I have not checked it.