Partitioning Issue with Dataset

SwathiCh · Post by **SwathiCh** » Wed Dec 11, 2013 10:14 pm

Hi All,

I created a dataset, jb1010DS1.ds, in job-1 with HASH partitioning on key-1. I am reading the same dataset (jb1010DS1.ds) in job-2 and feeding it to the Join stage with SAME partitioning.

Since the join in job-2 is on the same key, I don't want to repartition the dataset, so I am using SAME partitioning.
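
The reasoning above, that two inputs hash-partitioned on the same key can be joined partition by partition without repartitioning, can be illustrated with a minimal Python sketch. This is not DataStage code: `hash_partition`, `partition_rows`, and `join_partitionwise` are hypothetical helpers, CRC32 is only a stand-in for whatever hash the engine uses, and a dictionary lookup stands in for the Join stage's own matching logic.

```python
import zlib


def hash_partition(key, num_partitions):
    """Map a key to a partition index with a deterministic hash (CRC32 is an assumption)."""
    return zlib.crc32(str(key).encode("utf-8")) % num_partitions


def partition_rows(rows, key_field, num_partitions):
    """Distribute rows into partitions by hashing the key field."""
    parts = [[] for _ in range(num_partitions)]
    for row in rows:
        parts[hash_partition(row[key_field], num_partitions)].append(row)
    return parts


def join_partitionwise(left_parts, right_parts, key_field):
    """Join partition i of the left input only with partition i of the right input."""
    joined = []
    for left, right in zip(left_parts, right_parts):
        lookup = {}
        for r in right:
            lookup.setdefault(r[key_field], []).append(r)
        for l in left:
            for r in lookup.get(l[key_field], []):
                joined.append({**l, **r})
    return joined


# Made-up rows: both inputs use the same key, hash, and partition count.
left = [{"key1": k, "a": k * 10} for k in range(8)]
right = [{"key1": k, "b": k * 100} for k in range(8)]
left_parts = partition_rows(left, "key1", 4)
right_parts = partition_rows(right, "key1", 4)

result = join_partitionwise(left_parts, right_parts, "key1")
print(len(result))  # every key finds its match without moving any row
```

Because both sides were partitioned with the same function and the same partition count, matching keys always sit in the same partition index, which is the property SAME partitioning relies on.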

But job-2 throws this warning: Operator of type "APT_TSortOperator": Will partition despite the preserve-partitioning flag on dataset on input port 0.

I used this method in version 7.x without this kind of warning, but in 8.x the same scenario throws the warning above. If I repartition on HASH in the second job, the warning does not appear.

My question: is there any change in dataset creation between 7.x and 8.x?

ray.wurlod · Post by **ray.wurlod** » Thu Dec 12, 2013 12:13 am

Not in Data Set creation.

But more alerts are generated now, for conditions that were basically ignored in version 7.1.

SwathiCh · Post by **SwathiCh** » Thu Dec 12, 2013 12:27 pm

Ray,

I checked the descriptor file. It shows:
---------------------------------------------------------
Preserve Partitioning: true
Partitioning Method: APT_HashPartitioner
----------------------------------------------------------

Any idea, then, why my dataset is not preserving the partitioning in the second job when I read it?

Do I need to repartition the data every time I read from a dataset in 8.x?

Mike · Post by **Mike** » Thu Dec 12, 2013 12:49 pm

It's the inserted sort operator that is informing you that it will repartition... nothing to do with your dataset.

Put in an explicit Sort stage set to use SAME partitioning.

Mike

SwathiCh · Post by **SwathiCh** » Thu Dec 12, 2013 1:16 pm

Mike,

My data is already sorted and partitioned in job-1, when the dataset is created.

In job-2, I don't want to repartition or re-sort the data. Ideally the dataset should keep both the sort order and the partitioning, so that job-2 won't insert any sort operator.

I also tried your suggestion, both ways: adding a Sort stage set to "don't sort, data already sorted" with SAME partitioning, and adding a Sort stage that actually sorts the data while keeping SAME partitioning. Either way it gives the same warning.

Unless I repartition the data, I get the same warning.
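
The expectation in this post is that every partition of the dataset is already in key order, so no further sort should be needed before the join. A minimal Python sketch of that per-partition check follows; the helper names and the sample rows are hypothetical and are not taken from the DataStage engine or from the actual dataset.

```python
def is_sorted_on_key(partition, key_field):
    """Return True if the rows of one partition are in non-decreasing key order."""
    keys = [row[key_field] for row in partition]
    return all(a <= b for a, b in zip(keys, keys[1:]))


def partitions_needing_sort(partitions, key_field):
    """Return the indexes of partitions whose rows are not in key order."""
    return [
        i for i, part in enumerate(partitions)
        if not is_sorted_on_key(part, key_field)
    ]


# Made-up data: partitions 0 and 1 are in key order, partition 2 is not.
partitions = [
    [{"key1": 1}, {"key1": 4}, {"key1": 9}],
    [{"key1": 2}, {"key1": 2}, {"key1": 7}],
    [{"key1": 5}, {"key1": 3}],
]

unsorted = partitions_needing_sort(partitions, "key1")
print(unsorted)
```

Sort order here is a property of each partition on its own, which is why it only survives into the next job if the partitioning is left untouched.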

asorrell · Post by **asorrell** » Thu Dec 12, 2013 1:24 pm

For some reason DataStage thinks it HAS to repartition to accomplish something in job-2. Are you running both jobs on the same APT config file? You might want to consider posting some of the relevant parts of the score from job-2 so we can see if we can spot why it thinks it needs to sort.

Also, do you have $APT_DISABLE_COMBINATION set to TRUE, to ensure you know which stage is actually inserting the sort? Sometimes, if operators are combined, it is something "downstream" that needs the tsort inserted.
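
One reason the configuration file question could matter: with hash partitioning, the partition a key lands in depends on the number of partitions, so data laid out for one node count does not line up with inputs partitioned for another. A minimal Python sketch, assuming CRC32 as a stand-in hash and hypothetical node counts of 2 and 4 (neither is taken from this thread):

```python
import zlib


def hash_partition(key, num_partitions):
    """Map a key to a partition index (CRC32 is a stand-in, not the engine's hash)."""
    return zlib.crc32(str(key).encode("utf-8")) % num_partitions


keys = list(range(20))

# Placement of the same keys under two hypothetical configurations.
placement_2 = {k: hash_partition(k, 2) for k in keys}
placement_4 = {k: hash_partition(k, 4) for k in keys}

# Keys whose partition index differs between the two layouts.
moved = [k for k in keys if placement_2[k] != placement_4[k]]
print(f"{len(moved)} of {len(keys)} keys land in a different partition")
```

If the two jobs did run with different node counts, a join that trusted the stored layout would look for some keys in the wrong partition, so checking that both jobs use the same configuration is a cheap first step.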

SwathiCh · Post by **SwathiCh** » Thu Dec 12, 2013 1:57 pm

Andy,

I checked the score. It is inserting a tsort operator on the same key that I already sorted and partitioned on.

One more interesting detail: I added another Sort stage in job-1 to sort and partition on key-1 before creating the dataset. I use that dataset in job-2 with SAME partitioning. The join in job-2 still adds a sort operator automatically and generates the same warning.

Can anyone working on 8.1 or later confirm that a dataset created in job-1 can be used in job-2 with a Join stage without repartitioning the data, when the key column is the same?

priyadarshikunal · Post by **priyadarshikunal** » Fri Dec 13, 2013 6:31 am

Did you use SAME partitioning?

SwathiCh · Post by **SwathiCh** » Fri Dec 13, 2013 1:58 pm

That is exactly the problem (keeping SAME partitioning on the dataset in job-2).

Can we use datasets created in job-1 in job-2 with SAME partitioning in DataStage 8.1 and later versions?

pavi · Post by **pavi** » Fri Dec 13, 2013 3:12 pm

I have mimicked your scenario, but I didn't get any warning. I am using V8.5.

Job1:

Row gen ----> Copy ----> Dataset (sort and hash partitioning on the key column applied at the Dataset stage)

Job2:

                           row gen
                              |
                        (sort on key)
                              |
                              V
Dataset-(same partition)--> Join ----> peek

No warning in either Job1 or Job2.

SwathiCh · Post by **SwathiCh** » Fri Dec 13, 2013 3:54 pm

Thank you Pavi. I appreciate your effort.

It might be a problem in my DS environment. For now I don't see any option other than repartitioning the data in job-2.

Thank you all for your response.

priyadarshikunal · Post by **priyadarshikunal** » Mon Dec 16, 2013 6:11 am

SwathiCh wrote: Can we use datasets created in job-1 in job-2 with SAME partitioning in DataStage 8.1 and later versions?

Give it a try.

ray.wurlod · Post by **ray.wurlod** » Mon Dec 16, 2013 1:48 pm

Try disabling tsort operator insertion, either using an explicit Sort stage (set to "Don't sort, already sorted") or the environment variable.