Partitioning Issue with Dataset
Hi All,
I created a dataset, jb1010DS1.ds, in job-1 with HASH partitioning on key-1. In job-2 I read the same dataset (jb1010DS1.ds) and feed it to a Join stage with SAME partitioning.
Because the join in job-2 is on the same key, I don't want to repartition the dataset, so I am using SAME partitioning.
However, job-2 throws this warning: Operator of type "APT_TSortOperator": will partition despite the preserve-partitioning flag on the data set on input port 0.
I used this method in version 7.x without getting this warning, but in 8.x the same scenario produces it. If I repartition with HASH in the second job, the warning does not appear.
My question is: has anything changed in how datasets are created between 7.x and 8.x?
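As background on why SAME looks sufficient here: a hash partitioner sends each row to partition hash(key) mod N, so all rows with the same key value end up in the same partition. A minimal Python sketch of that idea, assuming `zlib.crc32` as the hash and hypothetical helper names (`partition_of`, `hash_partition`); APT_HashPartitioner uses its own hash function, so its actual partition numbers will differ:

```python
import zlib
from typing import Any, Dict, List

Row = Dict[str, Any]

def partition_of(key: str, num_partitions: int) -> int:
    """Map a key value to a partition index with a deterministic hash."""
    # Python's built-in hash() for str is randomized per process, so crc32 is
    # used instead to get the same assignment in every run, as two separate
    # jobs would need.
    return zlib.crc32(key.encode("utf-8")) % num_partitions

def hash_partition(rows: List[Row], key: str, num_partitions: int) -> List[List[Row]]:
    """Split rows into num_partitions lists by hashing row[key]."""
    partitions: List[List[Row]] = [[] for _ in range(num_partitions)]
    for row in rows:
        partitions[partition_of(str(row[key]), num_partitions)].append(row)
    return partitions

if __name__ == "__main__":
    rows = [{"key1": k, "val": i} for i, k in enumerate("ABCABDCA")]
    for idx, part in enumerate(hash_partition(rows, "key1", 4)):
        # Every occurrence of a given key1 value shows up in one partition only.
        print(idx, [r["key1"] for r in part])
```

Because the assignment depends only on the key value and N, a second job that reads the partitions back in place already has each key's rows together, provided N has not changed.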
--
Swathi Ch
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
Ray,
I checked the descriptor file; it shows:
---------------------------------------------------------
Preserve Partitioning: true
Partitioning Method: APT_HashPartitioner
----------------------------------------------------------
Any idea why the partitioning is not preserved when I read the dataset in the second job?
Do I need to repartition the data every time I read from a dataset in 8.x?
--
Swathi Ch
Mike,
My data is already sorted and partitioned in job-1, when the dataset itself is created.
In job-2 I don't want to repartition or re-sort the data. Ideally the dataset should keep both its sort order and its partitioning, so that job-2 won't insert any sort operator.
I also tried your suggestions: I added a Sort stage set to "don't sort, data already sorted" with SAME partitioning, and I also tried actually sorting the data while keeping SAME partitioning. Either way I get the same warning.
Unless I repartition the data, I keep getting the warning.
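The expectation in this post is that job-2 can trust two properties of the dataset: every key value lives in exactly one partition, and rows within each partition are already in key order. A minimal Python sketch of checking both properties, modelling a partitioned dataset as a list of lists of dicts (the function names and data layout are hypothetical; this is not how DataStage validates its inputs):

```python
from typing import Any, Dict, List

Row = Dict[str, Any]

def is_sorted_within_partitions(partitions: List[List[Row]], key: str) -> bool:
    """True if every partition is in non-decreasing order of row[key]."""
    return all(
        all(p[i][key] <= p[i + 1][key] for i in range(len(p) - 1))
        for p in partitions
    )

def keys_are_colocated(partitions: List[List[Row]], key: str) -> bool:
    """True if no key value appears in more than one partition."""
    owner: Dict[Any, int] = {}
    for idx, p in enumerate(partitions):
        for row in p:
            # setdefault records the first partition seen for this key value.
            if owner.setdefault(row[key], idx) != idx:
                return False
    return True
```

Both checks need a full pass over the data; they only spell out the properties the job is relying on, not something the engine is known to run.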
--
Swathi Ch
For some reason DataStage thinks it HAS to repartition to accomplish something in job-2. Are you running both jobs with the same APT config file? You might want to post the relevant parts of the job-2 score so we can try to spot why it thinks it needs to sort.
Also, do you have $APT_DISABLE_COMBINATION set to TRUE, so that you know which stage is actually inserting the sort? When operators are combined, it is sometimes something "downstream" that needs the tsort inserted.
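The question about the APT config file matters because a hash partition number depends on the partition count as well as on the key. A minimal Python sketch, reusing `zlib.crc32` as a stand-in hash (an assumption; it does not reproduce APT_HashPartitioner, and it does not model how DataStage maps a dataset's partitions onto a different node count), showing that some keys land in a different partition when the count changes:

```python
import zlib
from typing import Iterable, List

def partition_of(key: str, num_partitions: int) -> int:
    """Deterministic stand-in for a hash partitioner (crc32 is an assumption)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions

def moved_keys(keys: Iterable[str], nodes_job1: int, nodes_job2: int) -> List[str]:
    """Keys whose partition index differs between two partition counts."""
    return [k for k in keys
            if partition_of(k, nodes_job1) != partition_of(k, nodes_job2)]

if __name__ == "__main__":
    keys = [f"CUST{i:03d}" for i in range(20)]
    # Same partition count in both jobs: nothing moves, prints [].
    print(moved_keys(keys, 4, 4))
    # Dataset hashed on 4 partitions, other join input hashed on 3.
    print(moved_keys(keys, 4, 3))
```

If the two inputs of a join were hashed with different partition counts, matching keys could sit in different partitions, so partition-preserving reads would no longer be safe on their own.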
Andy,
I checked the score. It inserts a tsort operator on the same key that I had already sorted and partitioned on.
Another interesting point: I added a Sort stage in job-1 to sort and partition on key-1 before creating the dataset, and I read that dataset in job-2 with SAME partitioning. The Join in job-2 still automatically adds a sort operator and generates the same warning.
Can anyone working on 8.1 or later confirm that a dataset created in job-1 can be used in job-2 with Join stages, without repartitioning the data, when the key column is the same?
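What this setup expects from job-2 can be sketched as a merge join run independently in each partition, with no repartitioning and no sort. A minimal Python sketch, assuming both inputs are already hash-partitioned on the key with the same partition count and sorted within each partition (the names `merge_join` and `partitioned_join` are hypothetical; this is the general sort-merge join technique, not the Join stage's implementation):

```python
from typing import Any, Dict, List

Row = Dict[str, Any]

def merge_join(left: List[Row], right: List[Row], key: str) -> List[Row]:
    """Inner sort-merge join of two lists that are already sorted on key."""
    out: List[Row] = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of equal keys on the right, then pair it with
            # every left row that has the same key.
            j_end = j
            while j_end < len(right) and right[j_end][key] == lk:
                j_end += 1
            while i < len(left) and left[i][key] == lk:
                for r in right[j:j_end]:
                    out.append({**left[i], **r})
                i += 1
            j = j_end
    return out

def partitioned_join(left_parts: List[List[Row]], right_parts: List[List[Row]],
                     key: str) -> List[List[Row]]:
    """Join partition i of left with partition i of right, SAME-style."""
    if len(left_parts) != len(right_parts):
        raise ValueError("inputs must have the same number of partitions")
    return [merge_join(l, r, key) for l, r in zip(left_parts, right_parts)]
```

If either partition is not actually sorted on the key, this loop silently misses matches, which is why a merge-style join that cannot rely on upstream sort order has to sort first.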
--
Swathi Ch
- Premium Member
- Posts: 1735
- Joined: Thu Mar 01, 2007 5:44 am
- Location: Troy, MI