How do I fix performance bottleneck at Sort stage?

Abhi700
Participant
Posts: 20
Joined: Thu Nov 25, 2010 3:52 am
Location: Pune

How do I fix performance bottleneck at Sort stage?

Post by Abhi700 »

Hi,

One of my jobs is giving me serious bottleneck issues, and any help in resolving it would be highly appreciated.

I am using the following job design:

Dataset --> joined to a DB2 table using a Join stage --> Filter stage --> Sort stage --> Remove Duplicates stage.

From the Join stage to the Filter stage processing is fine; the job is able to process more than 100K rows per second. However, at the Sort and Remove Duplicates stages this number drops to 2,500 rows per second.

Please let me know if there is anything I can do to improve the processing speed of this job.

Thanks,
ABHILASH
asorrell
Posts: 1707
Joined: Fri Apr 04, 2003 2:00 pm
Location: Colleyville, Texas

Post by asorrell »

Can you post your APT configuration file here please? Just wondering what you have set up for scratch, etc.
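
For reference, a two-node APT configuration file looks roughly like the sketch below; the fastname and paths are purely illustrative. The scratchdisk entries are where the Sort stage spills once its in-memory buffer fills, so a slow or shared scratch filesystem will drag sort performance down regardless of the stage settings.

    {
        node "node1"
        {
            fastname "etlserver"
            pools ""
            resource disk "/ds/data/node1" {pools ""}
            resource scratchdisk "/ds/scratch/node1" {pools ""}
        }
        node "node2"
        {
            fastname "etlserver"
            pools ""
            resource disk "/ds/data/node2" {pools ""}
            resource scratchdisk "/ds/scratch/node2" {pools ""}
        }
    }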
Andy Sorrell
Certified DataStage Consultant
IBM Analytics Champion 2009 - 2020
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

The rows/sec figure is meaningless.

Think about the operation of a sort. It can't output any rows until all of its input rows have arrived. However, the clock starts running the moment the job starts. Therefore the actual rows/sec out of the Sort stage - indeed out of any blocking stage - will be substantially under-reported.
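
To illustrate with made-up numbers: if the Sort receives 1,000,000 rows during the first 100 seconds of the run and then emits them all in the final 10 seconds, the monitor divides 1,000,000 rows by the full 110 seconds and reports roughly 9,000 rows/sec on the output link, even though the stage was actually pushing out 100,000 rows/sec once it started producing output.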
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
mavrick21
Premium Member
Posts: 335
Joined: Sun Apr 23, 2006 11:25 pm

Post by mavrick21 »

Ray,

Is the rows/sec figure meaningless only for links out of a blocking stage, or for all links in a job?

Thanks
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Depends what you want to use it for. As a general rule I'd say it's meaningless everywhere (with only a very few exceptions), since the clock is always running, even during startup, while waiting for I/O to return, and so on. Also, row sizes vary, which is another factor militating against rows/sec being a particularly useful metric.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

If you've been here for any length of time, you'd know Ray considers rows/second to be a particularly useless metric. :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers
mavrick21
Premium Member
Posts: 335
Joined: Sun Apr 23, 2006 11:25 pm

Post by mavrick21 »

I know. He said the same thing when I attended his training a few years back ;)
jwiles
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm

Post by jwiles »

Rows/sec is essentially useless if it's treated as the instantaneous value out of that stage, which unfortunately many people do. About the only place it may approach being somewhat useful is on the final link in a stream's path, and even that's a stretch.

Does the sort use the same key column(s) as the join? If so, you may be able to take advantage of that (Don't Sort, Previously Sorted), depending on how the data is partitioned for the join.
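
For illustration, assuming the data already arrives sorted and partitioned on the join key, the Sort stage keys could be set along these lines (column names are made up):

    Sorting Keys
        Key = CUSTOMER_ID    Sort Key Mode = Don't Sort (Previously Sorted)
        Key = ORDER_DATE     Sort Key Mode = Sort

With the leading key flagged as previously sorted, the stage only needs to sort within each CUSTOMER_ID group instead of re-sorting the whole partition.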
- james wiles


All generalizations are false, including this one - Mark Twain.
suman27
Participant
Posts: 33
Joined: Wed Jul 15, 2009 6:52 am
Location: London

Post by suman27 »

Hi Abhilash,

You can remove the duplicates in the Sort stage itself if you are using the same key for the Remove Duplicates stage. You can also increase the sort memory size in the Sort stage and check whether it makes any difference in performance.
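
As a rough sketch, the relevant Sort stage options would look something like this (the key name and memory value are illustrative; the memory default is 20 MB):

    Sorting Keys
        Key = DEDUP_KEY    Sort Key Mode = Sort
    Options
        Allow Duplicates = False
        Restrict Memory Usage (MB) = 256

Note that Allow Duplicates = False only helps if the sort key and the de-duplication key are the same.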


Regards,
Suman.
ThilSe
Participant
Posts: 80
Joined: Thu Jun 09, 2005 7:45 am

Post by ThilSe »

What is your target?
Abhi700
Participant
Posts: 20
Joined: Thu Nov 25, 2010 3:52 am
Location: Pune

Post by Abhi700 »

suman27 wrote:
You can remove the duplicates in the Sort stage itself if you are using the same key for the Remove Duplicates stage. You can also increase the sort memory size in the Sort stage and check whether it makes any difference in performance.

We are performing the remove duplicates on different keys.
My target is a DB2 table.
The number of rows per sec is 100,000 rows/sec.
ABHILASH
ThilSe
Participant
Posts: 80
Joined: Thu Jun 09, 2005 7:45 am

Post by ThilSe »

What is the volume of records in the input dataset? Does the key used for partitioning distribute the records reasonably evenly (so it doesn't create a bottleneck) without making the flow sequential?
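
One quick way to check the balance is to enable $APT_RECORD_COUNTS (it writes per-partition record counts for each operator to the job log at the end of the run), or to watch the per-partition row counts in the Director job monitor:

    $APT_RECORD_COUNTS=True

If one partition carries most of the rows, the sort is effectively running on a single node.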

Regards
Senthil
jhmckeever
Premium Member
Posts: 301
Joined: Thu Jul 14, 2005 10:27 am
Location: Melbourne, Australia

Post by jhmckeever »

Some questions:
- Is the Sort stage running in parallel?
- Is the partitioning producing a relatively even balance of rows across partitions?
- You're not performing an in-line 'pre-sort' on the input link, are you? (I've seen this in many places.)
- Have you considered playing with memory usage ($APT_TSORT_STRESS_BLOCKSIZE)? See the sketch below the list.
- Is your sort utility set to "DataStage"?
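
If memory does turn out to be the constraint, a minimal sketch of that setting (the value is illustrative and, as far as I know, is interpreted as MB of sort memory per partition):

    $APT_TSORT_STRESS_BLOCKSIZE=256

It can be added to the job as an environment variable parameter, or set at the project level via the Administrator.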
John McKeever
Data Migrators
MettleCI (https://www.mettleci.com) - DevOps for DataStage