Hi folks, this might be along the lines of thinking out loud, but it seems to me that I will be using hash partitioning (in general) within DataStage jobs to support (in general) joins. At the same time, it is nice to use ORDER BY in the SQL select statements to reduce the memory burden in the DataStage server. The problem is that ordered data coming into the DataStage system gets scrambled when repartitioned (eg. by hash) within DataStage, requiring another sort anyway. Two possible answers, 1) impose hash partitioning in the Oracle connector, but this does not seem to be an option for the general case in which a hash is required on an arbitrary column. Or? 2) Perhaps there is some way to repartition in DS without scrambling the sort (I doubt this, but....) If this is possible, I would like to know.
Thanks,
Doug
Hash partitioning in Oracle Connector
Moderators: chulett, rschirm, roy
yep, seems to be the limiter. Looks like my "best practice" is to pull the data in parallel without sorting in the db, then hash partition and sort on the hash key in DataStage. My aim is to land all extracted data sorted and hashed by the primary key, ready for downstream consumption. I hate to do this though because my extract jobs are now much more heavyweight.kwwilliams wrote:the table partition algorithm and the DataStage partition algorithm are not the same.
Thanks,
Doug