Partitioning Issue with Dataset

Posted: Wed Dec 11, 2013 10:14 pm
by SwathiCh
Hi All,

I created a dataset, jb1010DS1.ds, in job-1 with HASH partitioning on key-1. In job-2 I read the same dataset (jb1010DS1.ds) and feed it to a Join stage with SAME partitioning.

Because the join in job-2 is on the same key, I don't want to repartition the dataset, so I am using SAME partitioning.
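The reasoning behind SAME partitioning can be shown with a small model: if both join inputs are hash-partitioned on the same key with the same number of partitions, every key value lands in the same partition number on both sides, so each partition can be joined on its own without moving any rows. This is a minimal sketch only. `zlib.crc32` is an assumed stand-in for DataStage's internal hash function, and `hash_partition` and `join_same_partitioning` are hypothetical helpers, not DataStage APIs.

```python
import zlib
from collections import defaultdict

def hash_partition(rows, key, num_partitions):
    """Assign each row to a partition by hashing its key value.

    zlib.crc32 is an assumed stand-in for the engine's hash function; the only
    property that matters here is that equal keys always get the same partition.
    """
    parts = defaultdict(list)
    for row in rows:
        p = zlib.crc32(str(row[key]).encode("utf-8")) % num_partitions
        parts[p].append(row)
    return parts

def join_same_partitioning(left_parts, right_parts, key):
    """Inner join partition by partition, never moving rows (like SAME)."""
    out = []
    for p, left_rows in left_parts.items():
        index = defaultdict(list)
        for r in right_parts.get(p, []):
            index[r[key]].append(r)
        for l in left_rows:
            for r in index.get(l[key], []):
                out.append({**l, **r})
    return out

left = [{"key1": k, "a": k * 10} for k in range(8)]
right = [{"key1": k, "b": k * 100} for k in range(0, 8, 2)]
n = 4  # same partition count on both sides, i.e. the same config file
joined = join_same_partitioning(
    hash_partition(left, "key1", n), hash_partition(right, "key1", n), "key1"
)
print(sorted(r["key1"] for r in joined))  # [0, 2, 4, 6]
```

Because both sides use the same hash and the same partition count, the per-partition join finds every match that a single global join would.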

But job-2 throws this warning: Operator of type "APT_TSortOperator": will partition despite the preserve-partitioning flag on the data set on input port 0.

I used this method in 7.x without this kind of warning, but in 8.x the same scenario throws the warning above. If I repartition on HASH in the second job, the warning does not appear.

So the question is: has dataset creation changed in 8.x compared with 7.x?

Posted: Thu Dec 12, 2013 12:13 am
by ray.wurlod
Not in Data Set creation.

But more alerts are generated now, for conditions that were basically ignored in version 7.1.

Posted: Thu Dec 12, 2013 12:27 pm
by SwathiCh
Ray,

I checked the descriptor file; it shows:
---------------------------------------------------------
Preserve Partitioning: true
Partitioning Method: APT_HashPartitioner
----------------------------------------------------------

So, any idea why my dataset is not preserving its partitioning when I read it in the second job?

Do I need to repartition the data every time I read from a dataset in 8.x?

Posted: Thu Dec 12, 2013 12:49 pm
by Mike
It's the inserted sort operator that is telling you it will repartition; it has nothing to do with your dataset.

Put in an explicit Sort stage set to use SAME partitioning.

Mike

Posted: Thu Dec 12, 2013 1:16 pm
by SwathiCh
Mike,

My data is already sorted and partitioned in job-1, when the dataset itself is created.

In job-2 I don't want to repartition or re-sort the data. Ideally the dataset should keep both its sort order and its partitioning, so that job-2 won't insert any sort operator.

I also applied your suggestion, both ways: adding a Sort stage set to "Don't sort, already sorted" with SAME partitioning, and adding a Sort stage that actually sorts the data while keeping SAME partitioning. Either way I get the same warning.

Unless I repartition the data, I keep getting the same warning.

Posted: Thu Dec 12, 2013 1:24 pm
by asorrell
For some reason DataStage thinks it HAS to repartition to accomplish something in job2. Are you running both jobs with the same APT config file? You might want to post the relevant parts of the score for job2 so we can spot why it thinks it needs to sort.

Also, do you have $APT_DISABLE_COMBINATION set to TRUE, to make sure you know which stage is actually inserting the sort? Sometimes, if operators are combined, it is something "downstream" that needs the tsort inserted.
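The config-file question matters because hash partitioning is only aligned when both sides use the same number of partitions. The simplified model below (not DataStage's actual behavior) partitions a key set as if job-1 wrote it under a 4-node config, then pairs it with input partitioned under 4 and under a hypothetical 2-node config. With matching counts every key finds its partner; with different counts only keys whose partition numbers happen to coincide do, which is why an engine has to repartition rather than honor SAME. `zlib.crc32` and all helper names are assumptions for illustration.

```python
import zlib
from collections import defaultdict

def partition_of(value, num_partitions):
    # Hypothetical stand-in for the engine's hash function.
    return zlib.crc32(str(value).encode("utf-8")) % num_partitions

def hash_partition(keys, num_partitions):
    parts = defaultdict(list)
    for k in keys:
        parts[partition_of(k, num_partitions)].append(k)
    return parts

def per_partition_matches(left_parts, right_parts):
    """Match keys only within the same partition number (no repartitioning)."""
    matches = []
    for p, left_keys in left_parts.items():
        right_keys = set(right_parts.get(p, []))
        matches.extend(k for k in left_keys if k in right_keys)
    return sorted(matches)

keys = list(range(20))
# Data set as written by job-1 under an assumed 4-node config file.
stored = hash_partition(keys, 4)
# Other join input partitioned under the same 4-node config: every key matches.
same_config = per_partition_matches(stored, hash_partition(keys, 4))
# Other join input partitioned under a hypothetical 2-node config: only keys whose
# partition numbers coincide still meet, so honoring SAME would lose matches.
diff_config = per_partition_matches(stored, hash_partition(keys, 2))

print(len(same_config), len(diff_config))
```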

Posted: Thu Dec 12, 2013 1:57 pm
by SwathiCh
Andy,

I checked the score. It is inserting a tsort operator on the same key that I already sorted and partitioned on.

One more interesting point: I added another Sort stage in job-1 to sort and partition on key-1 before creating the dataset. I use that dataset in job-2 with SAME partitioning, and the Join in job-2 still automatically adds a sort operator and generates the same warning.

Can anyone working on 8.1 or later confirm that a dataset created in job-1 can be used in job-2 with Join stages without repartitioning the data, when the key column is the same?

Posted: Fri Dec 13, 2013 6:31 am
by priyadarshikunal
Did you use same partitioning?

Posted: Fri Dec 13, 2013 1:58 pm
by SwathiCh
That is exactly the problem (keeping SAME partitioning on the dataset in job-2).

Can we use datasets created in job-1 in job-2 with SAME partitioning in DS 8.1 and later versions?

Posted: Fri Dec 13, 2013 3:12 pm
by pavi
I have mimicked your scenario but didn't get any warning. I am using v8.5.

Job1:

Row Gen ---> Copy ---> Data Set (sort and hash partitioning on the key column applied at the Data Set input)

Job2:

Row Gen ---(sort on key)---> Join
Data Set ---(SAME partitioning)---> Join ---> Peek

No warning in either Job1 or Job2.

Posted: Fri Dec 13, 2013 3:54 pm
by SwathiCh
Thank you, Pavi. I appreciate your effort.

That might be a problem in my DS environment. For now I don't see any way other than repartitioning the data in job-2.

Thank you all for your responses.

Posted: Mon Dec 16, 2013 6:11 am
by priyadarshikunal
SwathiCh wrote: Can we use datasets created in job-1 in job-2 with SAME partition in DS8.1 later versions?
Give it a try.

Posted: Mon Dec 16, 2013 1:48 pm
by ray.wurlod
Try disabling tsort operator insertion, either by using an explicit Sort stage (set to "Don't sort, already sorted") or through the environment variable.
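A "Don't sort, already sorted" Sort stage (and, as commonly described, the `APT_SORT_INSERTION_CHECK_ONLY` environment variable, as opposed to `APT_NO_SORT_INSERTION`, which skips insertion entirely; verify both names against your version's documentation) makes the engine verify the existing order instead of re-sorting. The sketch below models that check-only idea: it walks each partition, confirms the key never decreases, and passes the data through untouched, with no sort and no repartitioning. `check_partition_sorted` and `check_only` are hypothetical names, not DataStage APIs.

```python
from typing import Any, Dict, List, Sequence

Row = Dict[str, Any]

def check_partition_sorted(rows: Sequence[Row], key: str) -> None:
    """Raise ValueError at the first row whose key is smaller than the previous one."""
    for i in range(1, len(rows)):
        if rows[i][key] < rows[i - 1][key]:
            raise ValueError(
                f"row {i}: key {rows[i][key]!r} < previous {rows[i - 1][key]!r}"
            )

def check_only(partitions: Dict[int, List[Row]], key: str) -> Dict[int, List[Row]]:
    """Verify every partition is already sorted on key, then return it unchanged."""
    for p, rows in partitions.items():
        try:
            check_partition_sorted(rows, key)
        except ValueError as exc:
            raise ValueError(f"partition {p} is not sorted on {key!r}: {exc}") from exc
    return partitions  # same object: nothing was sorted or moved

# Each partition is sorted on its own; no global order across partitions is needed.
good = {0: [{"key1": 1}, {"key1": 3}], 1: [{"key1": 2}, {"key1": 2}, {"key1": 5}]}
bad = {0: [{"key1": 4}, {"key1": 1}]}
result = check_only(good, "key1")
```

Note that the check is per partition: partition 1 starting at 2 while partition 0 ends at 3 is fine, which matches how a partitioned join consumes its inputs.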