Partitioning for Different stages

pavankvk
Participant
Posts: 202
Joined: Thu Dec 04, 2003 7:54 am

Post by pavankvk »

Hi,

Is there a rule of thumb for which stages need their input data partitioned and sorted to work properly, and which do not?

As I understand it, stages such as Join, Merge, Aggregator, and Remove Duplicates need their data partitioned and sorted to produce the expected results, and simply leaving the partitioning on Auto for these stages will not give correct results. Is this true?

Also, suppose you have a 4-node configuration file in which the resource disks of all the nodes point to the same directory. Will Auto partitioning then work for all stages, including those where partitioning and sorting are mandatory? Would that be because the nodes share one physical location, so the records are read into only one partition?
sajarman
Participant
Posts: 41
Joined: Mon Nov 28, 2005 6:29 am

Post by sajarman »

Here goes my five cents:

I think explicit partitioning on a stage (or on a link, to be more precise) is required when you have to match data between links (Join, Merge, Lookup, etc.) or compare rows within a stage (Aggregator, Remove Duplicates, etc.). It helps both accuracy and performance.
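To illustrate the "compare rows within a stage" case, here is a minimal Python sketch, not DataStage code. It assumes each partition removes duplicates independently and sees only its own rows. The names `round_robin`, `hash_partition`, `dedup_per_partition`, and the `cust_id` column are hypothetical, and `crc32` merely stands in for whatever hash function the engine really uses.

```python
from zlib import crc32

def round_robin(rows, n):
    """Deal rows to n partitions in turn, ignoring their key values."""
    parts = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        parts[i % n].append(row)
    return parts

def hash_partition(rows, n, key):
    """Send every row with the same key value to the same partition."""
    parts = [[] for _ in range(n)]
    for row in rows:
        parts[crc32(str(row[key]).encode()) % n].append(row)
    return parts

def dedup_per_partition(parts, key):
    """Sort and de-duplicate each partition on its own, then collect."""
    out = []
    for part in parts:
        seen = set()
        for row in sorted(part, key=lambda r: r[key]):
            if row[key] not in seen:
                seen.add(row[key])
                out.append(row)
    return out

rows = [{"cust_id": c} for c in ["A", "B", "A", "C", "B", "A", "C", "D"]]

rr = dedup_per_partition(round_robin(rows, 4), "cust_id")
hp = dedup_per_partition(hash_partition(rows, 4, "cust_id"), "cust_id")

print(len(rr))  # 8: duplicates landed in different partitions and all survived
print(len(hp))  # 4: one row per distinct cust_id
```

With key-agnostic partitioning, copies of the same key end up in different partitions and no single partition ever sees them together, so the duplicates survive. Hash partitioning on the key keeps them together, which is the accuracy point made above.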

As for Auto partitioning, I also saw incorrect results in my early experience, but I think that is an old story now. In any case, I don't leave decisions to DataStage when I can make them myself. It gives me better control over, and awareness of, what is going on.
pavankvk
Participant
Posts: 202
Joined: Thu Dec 04, 2003 7:54 am

Post by pavankvk »

sajarman wrote: Here goes my five cents:

I think explicit partitioning on a stage (or on a link, to be more precise) is required when you have to match data between links (Join, Merge, Lookup, etc.) or compare rows within a stage (Aggregator, Remove Duplicates, etc.). It helps both accuracy and performance.

As for Auto partitioning, I also saw incorrect results in my early experience, but I think that is an old story now. In any case, I don't leave decisions to DataStage when I can make them myself. It gives me better control over, and awareness of, what is going on.

When you say it is an old story now, do you mean it was a bug that has since been fixed?
sajarman
Participant
Posts: 41
Joined: Mon Nov 28, 2005 6:29 am

Post by sajarman »

I do not know whether it was a bug or not. I did not see the issue when I experimented with Auto partitioning some time back. These days I use Hash partitioning and the like as a best practice, and I do not leave Auto where it matters.
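As a rough illustration of hash partitioning for the "match data between links" case, the Python sketch below partitions two inputs on the same key, sorts and joins each pair of partitions separately, and compares the result with a single serial join. It is not DataStage code. The names `hash_partition`, `join_partition`, `orders`, and `customers` are hypothetical, and `crc32` is an assumed stand-in for the real hash function.

```python
from zlib import crc32

def hash_partition(rows, n, key):
    """Send every row with the same key value to the same partition."""
    parts = [[] for _ in range(n)]
    for row in rows:
        parts[crc32(str(row[key]).encode()) % n].append(row)
    return parts

def join_partition(left, right, key):
    """Inner merge join of two row lists, sorting both on the key first."""
    left = sorted(left, key=lambda r: r[key])
    right = sorted(right, key=lambda r: r[key])
    out, j = [], 0
    for l in left:
        # Skip right rows whose key is smaller than the current left key.
        while j < len(right) and right[j][key] < l[key]:
            j += 1
        # Emit one joined row per matching right row.
        k = j
        while k < len(right) and right[k][key] == l[key]:
            out.append({**l, **right[k]})
            k += 1
    return out

orders = [{"cust_id": c, "order_id": i}
          for i, c in enumerate(["A", "B", "A", "C", "E"], start=1)]
customers = [{"cust_id": c, "name": "name_" + c} for c in ["A", "B", "C", "D"]]

n = 4
left_parts = hash_partition(orders, n, "cust_id")
right_parts = hash_partition(customers, n, "cust_id")

# Each "node" joins only its own pair of partitions.
parallel = [row for p in range(n)
            for row in join_partition(left_parts[p], right_parts[p], "cust_id")]
serial = join_partition(orders, customers, "cust_id")

by_order = lambda r: r["order_id"]
print(sorted(parallel, key=by_order) == sorted(serial, key=by_order))
```

Because both inputs use the same hash on the same key, matching rows always meet in the same partition, so the partitioned join returns the same rows as the serial one.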