Partitioning Issue with Dataset

SwathiCh
Premium Member
Posts: 64
Joined: Mon Feb 08, 2010 7:17 pm

Post by SwathiCh »

Hi All,

I created a dataset, jb1010DS1.ds, in job-1 with HASH partitioning on key-1. In job-2 I read the same dataset (jb1010DS1.ds) and feed it to a Join stage with SAME partitioning.

Since the join in job-2 is on the same key, I don't want to repartition the dataset, so I am using SAME partitioning.

But job-2 throws this warning: Operator of type "APT_TSortOperator": Will partition the despite preserve-partitioning flag on dataset on input port 0.

I used this method in version 7.x without this kind of warning, but in 8.x the same scenario throws the warning above. If I repartition with HASH in the second job, the warning goes away.

My question: is there any change in how datasets are created between 7.x and 8.x?
--
Swathi Ch
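
The expectation in the question above (no repartitioning needed when both inputs are hash partitioned on the join key) can be illustrated with a minimal Python sketch. It is not DataStage code. The function names, the CRC32 hash, and the sample rows are all assumptions made for illustration, and the sketch assumes both inputs use the same partition count.

```python
from collections import defaultdict
from zlib import crc32


def hash_partition(rows, key, num_partitions):
    """Assign each row to a partition using a deterministic hash of its key."""
    partitions = [[] for _ in range(num_partitions)]
    for row in rows:
        index = crc32(str(row[key]).encode("utf-8")) % num_partitions
        partitions[index].append(row)
    return partitions


def join(left_rows, right_rows, key):
    """Inner join two lists of dict rows on `key`."""
    lookup = defaultdict(list)
    for right in right_rows:
        lookup[right[key]].append(right)
    return [
        {**left, **right}
        for left in left_rows
        for right in lookup.get(left[key], [])
    ]


def partitioned_join(left_rows, right_rows, key, num_partitions):
    """Join partition i of the left input only with partition i of the right."""
    left_parts = hash_partition(left_rows, key, num_partitions)
    right_parts = hash_partition(right_rows, key, num_partitions)
    result = []
    for left_part, right_part in zip(left_parts, right_parts):
        result.extend(join(left_part, right_part, key))
    return result


def canonical(rows):
    """Order-independent form of a result set, for comparison."""
    return sorted(sorted(row.items()) for row in rows)


# Hypothetical sample data: key 9 has no match, key 3 appears twice.
left = [{"key1": k, "a": k * 10} for k in range(8)]
right = [{"key1": k, "b": k * 100} for k in (1, 3, 3, 5, 9)]

global_result = join(left, right, "key1")
local_result = partitioned_join(left, right, "key1", 4)
print(canonical(global_result) == canonical(local_result))
```

Because equal keys always hash to the same partition number, each partition can be joined on its own. This only holds when both sides were partitioned with the same method, key, and partition count.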
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Not in Data Set creation.

But 8.x generates more alerts for conditions that were basically ignored in version 7.1.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
SwathiCh
Premium Member
Posts: 64
Joined: Mon Feb 08, 2010 7:17 pm

Post by SwathiCh »

Ray,

I checked the descriptor file; it shows:
---------------------------------------------------------
Preserve Partitioning: true
Partitioning Method: APT_HashPartitioner
----------------------------------------------------------

So any idea why my dataset is not preserving its partitioning when I read it in the second job?

Do I need to repartition the data every time I read from a dataset in 8.x?
--
Swathi Ch
Mike
Premium Member
Posts: 1021
Joined: Sun Mar 03, 2002 6:01 pm
Location: Tampa, FL

Post by Mike »

It's the inserted sort operator that is informing you that it will repartition... nothing to do with your dataset.

Put in an explicit Sort stage set to use SAME partitioning.

Mike
SwathiCh
Premium Member
Posts: 64
Joined: Mon Feb 08, 2010 7:17 pm

Post by SwathiCh »

Mike,

My data is already sorted and partitioned in job-1 when the dataset is created.

In job-2, I don't want to repartition or re-sort the data. Ideally the dataset should keep both its sort order and its partitioning, so that job-2 won't insert any sort operator.

I also tried your suggestion both ways: an additional Sort stage set to "Don't sort, already sorted" with SAME partitioning, and a Sort stage that actually sorts the data while keeping SAME partitioning. Either way it gives the same warning.

Unless I repartition the data, I keep getting the same warning.
--
Swathi Ch
asorrell
Posts: 1707
Joined: Fri Apr 04, 2003 2:00 pm
Location: Colleyville, Texas

Post by asorrell »

For some reason DataStage thinks it HAS to repartition to accomplish something in job-2. Are you running both jobs with the same APT configuration file? You might want to post the relevant parts of the job-2 score so we can try to spot why it thinks it needs to sort.

Also, do you have $APT_DISABLE_COMBINATION set to TRUE to ensure you know which stage is actually inserting the sort? Sometimes, if operators are combined, it is something "downstream" that needs the tsort inserted.
Andy Sorrell
Certified DataStage Consultant
IBM Analytics Champion 2009 - 2020
SwathiCh
Premium Member
Posts: 64
Joined: Mon Feb 08, 2010 7:17 pm

Post by SwathiCh »

Andy,

I checked the score. It is inserting a tsort operator on the same key that I already sorted and partitioned on.

One more interesting point: I added one more Sort stage in job-1 to sort and partition on key-1 before creating the dataset. I use that dataset in job-2 with SAME partitioning. The Join in job-2 still automatically adds a sort operator and generates the same warning.

Can anyone working on 8.1 or later confirm that a dataset created in job-1 can be used in job-2 with a Join stage without repartitioning the data when the key column is the same?
--
Swathi Ch
priyadarshikunal
Premium Member
Posts: 1735
Joined: Thu Mar 01, 2007 5:44 am
Location: Troy, MI

Post by priyadarshikunal »

Did you use same partitioning?
Priyadarshi Kunal

Genius may have its limitations, but stupidity is not thus handicapped. :wink:
SwathiCh
Premium Member
Posts: 64
Joined: Mon Feb 08, 2010 7:17 pm

Post by SwathiCh »

That is exactly the problem: keeping SAME partitioning on the dataset in job-2.

Can we use datasets created in job-1 in job-2 with SAME partitioning in DS 8.1 and later versions?
--
Swathi Ch
pavi
Premium Member
Posts: 34
Joined: Mon Jun 03, 2013 2:34 pm

Post by pavi »

I have mimicked your scenario but didn't get any warning. I am using V8.5.

Job1:

Row gen---->copy--->Dataset (sort and hash partitioning applied on the key column at the Dataset stage)

Job2:


row gen
|
(sort on key)
|
V
Dataset-(same partition)-->Join---->peek

No warning in either Job1 or Job2.
SwathiCh
Premium Member
Posts: 64
Joined: Mon Feb 08, 2010 7:17 pm

Post by SwathiCh »

Thank you Pavi. I appreciate your effort.

That might be a problem in my DS environment. For now I don't see any way other than repartitioning the data in job-2.

Thank you all for your responses.
--
Swathi Ch
priyadarshikunal
Premium Member
Posts: 1735
Joined: Thu Mar 01, 2007 5:44 am
Location: Troy, MI

Post by priyadarshikunal »

SwathiCh wrote:Can we use datasets created in job-1 in job-2 with SAME partition in DS8.1 later versions?
Give it a try.
Priyadarshi Kunal

Genius may have its limitations, but stupidity is not thus handicapped. :wink:
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Try disabling tsort operator insertion, either using an explicit Sort stage (set to "Don't sort, already sorted") or the environment variable.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
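
The "Don't sort, already sorted" idea can be sketched in plain Python: when both inputs of a partition are already ordered on the join key, a single merge pass produces the join and no extra sort is needed. This is an illustration only, not DataStage internals. The function names and sample rows are hypothetical. The thread does not name the environment variable; APT_NO_SORT_INSERTION is the one usually meant, but treat that as an assumption.

```python
def is_sorted(rows, key):
    """Return True if rows are in non-decreasing order of `key`."""
    return all(rows[i][key] <= rows[i + 1][key] for i in range(len(rows) - 1))


def merge_join(left_rows, right_rows, key):
    """Inner join two key-sorted lists in one pass, without sorting them."""
    if not (is_sorted(left_rows, key) and is_sorted(right_rows, key)):
        raise ValueError("inputs must already be sorted on the join key")
    result = []
    i = j = 0
    while i < len(left_rows) and j < len(right_rows):
        left_key = left_rows[i][key]
        right_key = right_rows[j][key]
        if left_key < right_key:
            i += 1
        elif left_key > right_key:
            j += 1
        else:
            # Find the run of right rows sharing this key.
            j_end = j
            while j_end < len(right_rows) and right_rows[j_end][key] == left_key:
                j_end += 1
            # Pair every left row in the matching run with that right run.
            while i < len(left_rows) and left_rows[i][key] == left_key:
                for right in right_rows[j:j_end]:
                    result.append({**left_rows[i], **right})
                i += 1
            j = j_end
    return result


# Hypothetical rows for one partition, already sorted on key1.
left = [
    {"key1": 1, "a": "x"},
    {"key1": 2, "a": "y"},
    {"key1": 2, "a": "z"},
    {"key1": 4, "a": "w"},
]
right = [
    {"key1": 2, "b": 1},
    {"key1": 2, "b": 2},
    {"key1": 3, "b": 3},
    {"key1": 4, "b": 4},
]

joined = merge_join(left, right, "key1")
print(len(joined))
```

The sketch refuses unsorted input rather than sorting it. Skipping the sort is only safe when the "already sorted" claim is actually true, so the same caution applies to disabling sort insertion in a real job.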