Posted: Wed Jul 03, 2013 12:08 am
xinhuang:
1) I've personally not heard of this issue from support, but I'll defer to them and engineering. It sounds as if it's something specific to your situation, as I have not encountered it before.
2-4) The requirement for key partitioning DOES NOT mean that the developer has to explicitly specify a key partitioner instead of Auto partitioning in their job design. In case you missed it at the very top of my previous post, I explained what the Auto Partition option does in simple terms.
2) Join and RemDup require that data has been key partitioned in order to work as designed (that is, to produce the results they are designed to produce). If data has not been properly key partitioned, you can miss matches in Join or duplicates in RemDup because rows that should match or are duplicates can end up in different partitions.
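To make the mechanics concrete: this is not DataStage code, just a minimal Python sketch of why a parallel Join misses matches when data is not key partitioned. Each partition joins only against its own counterpart partition, so two rows with the same key that land in different partitions can never match. The partitioner names and row layout here are my own illustration, not DataStage internals.

```python
# Minimal sketch (plain Python, NOT DataStage) of partition-wise joining.
# Hypothetical helpers for illustration only.

def hash_partition(rows, key, n):
    """Key (hash) partitioning: rows with the same key value always
    land in the same partition."""
    parts = [[] for _ in range(n)]
    for row in rows:
        parts[hash(row[key]) % n].append(row)
    return parts

def round_robin_partition(rows, n):
    """Round-robin: spreads rows evenly but ignores key values."""
    parts = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        parts[i % n].append(row)
    return parts

def partitioned_join(left_parts, right_parts, key):
    """Each partition joins only against its own counterpart, as a
    parallel Join stage does -- it never sees other partitions' rows."""
    matches = []
    for lp, rp in zip(left_parts, right_parts):
        right_keys = {r[key] for r in rp}
        matches += [l for l in lp if l[key] in right_keys]
    return matches

left = [{"id": i} for i in range(8)]
right = [{"id": i} for i in range(7, -1, -1)]  # same keys, different order

# Hash partitioned on the join key: all 8 matches are found,
# because both sides send id=k to the same partition.
ok = partitioned_join(hash_partition(left, "id", 4),
                      hash_partition(right, "id", 4), "id")

# Round-robin partitioned: matching rows land in different
# partitions and the join silently loses matches.
bad = partitioned_join(round_robin_partition(left, 4),
                       round_robin_partition(right, 4), "id")
```

The round-robin run does not fail or warn; it simply returns fewer matches. That silence is exactly why improper partitioning bugs are easy to miss.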
Aggregator is a little different, but key partitioning is the best choice 99% of the time, whether you choose it yourself or Auto partitioning chooses it for you. For the other 1%: only an advanced DataStage developer with a clear understanding of how the Aggregator works would ever need to choose something else.
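A similar sketch, again in plain Python rather than DataStage, shows why the Aggregator normally wants key partitioning: each partition aggregates only its own rows, so if a group's rows are spread across partitions, the stage emits several partial results for that group instead of one. The data and helper here are hypothetical illustrations.

```python
# Minimal sketch (plain Python, NOT DataStage) of per-partition
# aggregation. Hypothetical helper for illustration only.
from collections import Counter

def aggregate_partition(rows, key):
    """Count rows per key within ONE partition, the way a parallel
    Aggregator does: it only ever sees its own partition's data."""
    counts = Counter(r[key] for r in rows)
    return [{"key": k, "count": c} for k, c in counts.items()]

rows = [{"dept": d} for d in ["A", "A", "B", "B", "A", "B"]]

# Round-robin-style split over 2 partitions: both partitions hold
# some "A"s and some "B"s, so each emits its own partial count and
# the combined output has two rows per group instead of one.
parts = [rows[0::2], rows[1::2]]
partial = [g for p in parts for g in aggregate_partition(p, "dept")]

# Hash partitioned on "dept": every group lives entirely in one
# partition, so the combined output has exactly one row per group.
parts = [[r for r in rows if hash(r["dept"]) % 2 == 0],
         [r for r in rows if hash(r["dept"]) % 2 == 1]]
correct = [g for p in parts for g in aggregate_partition(p, "dept")]
```

Note that the partial results are not wrong data, just incomplete groups; that is why Aggregator is "a little different" from Join, where mispartitioning loses matches outright.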
3) No one ever said EXPLICIT key partitioning was mandatory; if they did, they are mistaken. I think this is where you are having an issue: you seem to believe that someone has said developers must EXPLICITLY choose a key partitioning option instead of Auto. That is not the case.
Regarding Chandra's jobs: I did mention in my reply to Chandra that Auto partition was likely choosing Hash partitioning (which is a key partitioner) for his jobs. Did you miss this? Because Auto is choosing a key partitioner for him, his jobs are working without problems. Also, no one stated that choosing Auto partitioning would automatically cause problems, just that it MAY not always produce the most efficient jobs. In Chandra's case, as in most situations, Auto partitioning works great and his jobs are apparently meeting their performance expectations.
4) No, it's not wrong. Why would it be?
I don't know your actual experience level with DataStage. My impression is that you don't have a clear enough understanding of how partition parallelism works in DataStage for all of this discussion to make sense to you. It can be a difficult topic to understand, especially when trying to piece it together from forum posts.
This topic in the Information Server documentation, Parallel Processing in Information Server, may be of help to you. I also recommend that you read this RedBook, especially Chapter 6. The Parallel Job Developer's Guide, available at this link, also describes partition parallelism, which is at the heart of this whole discussion.
Also, consider enrolling in the Advanced DataStage training class if you've not already taken it. Perhaps your employer will pay for you to take it.
Regards,