Hash partitioning in Oracle Connector

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
dougcl
Premium Member
Premium Member
Posts: 137
Joined: Thu Jun 24, 2010 4:28 pm

Hash partitioning in Oracle Connector

Post by dougcl »

Hi folks, this might be along the lines of thinking out loud, but it seems to me that I will be using hash partitioning (in general) within DataStage jobs to support (in general) joins. At the same time, it is nice to use ORDER BY in the SQL select statements to reduce the memory burden in the DataStage server. The problem is that ordered data coming into the DataStage system gets scrambled when repartitioned (eg. by hash) within DataStage, requiring another sort anyway. Two possible answers, 1) impose hash partitioning in the Oracle connector, but this does not seem to be an option for the general case in which a hash is required on an arbitrary column. Or? 2) Perhaps there is some way to repartition in DS without scrambling the sort (I doubt this, but....) If this is possible, I would like to know.

Thanks,
Doug
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Hmmm... something you could accomplish with an explicit Sort stage set to "Don't sort, already sorted" perhaps?
-craig

"You can never have too many knives" -- Logan Nine Fingers
dougcl
Premium Member
Premium Member
Posts: 137
Joined: Thu Jun 24, 2010 4:28 pm

Post by dougcl »

kwwilliams wrote:the table partition algorithm and the DataStage partition algorithm are not the same.
yep, seems to be the limiter. Looks like my "best practice" is to pull the data in parallel without sorting in the db, then hash partition and sort on the hash key in DataStage. My aim is to land all extracted data sorted and hashed by the primary key, ready for downstream consumption. I hate to do this though because my extract jobs are now much more heavyweight.

Thanks,
Doug
dougcl
Premium Member
Premium Member
Posts: 137
Joined: Thu Jun 24, 2010 4:28 pm

Post by dougcl »

Hi folks, I am told that the modulus partitioning strategy is the same as hash if the hash key happens to be an integer. So in this special case, perhaps specifying modulus partitioned reads in the Oracle Connector will map directly to a hash partition in DataStage. I have not checked it.
Post Reply