CoSort Versus PX

joesat
Participant
Posts: 93
Joined: Wed Jun 20, 2007 2:12 am

CoSort Versus PX

Post by joesat »

I have been asked to convert CoSort programs to PX as it was felt that PX sort is faster than CoSort.

But I would like to know if there are any benchmarks to prove that PX is indeed faster than CoSort. I do not have the facility to test huge chunks of data (like over 150 GB).

If anyone has solid statistics from tests performed by universities or the like, please let me know. Thanks.
Joel Satire
joesat
Participant
Posts: 93
Joined: Wed Jun 20, 2007 2:12 am

Post by joesat »

:)
Joel Satire
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

"it was felt"?!!

Shades of the old "statistics have shown"! By whom was it felt? Based upon what? Have you designed/conducted any tests of your own?

I have no information, but would be surprised if DataStage sort could beat CoSort. It's known that the DataStage tsort operator is faster than a UNIX sort, and it may be this that is misleading whoever it is who feels that way about CoSort. But if PX sort really were faster, how come CoSort remains in business?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
joesat
Participant
Posts: 93
Joined: Wed Jun 20, 2007 2:12 am

Post by joesat »

ray.wurlod wrote:"it was felt"?!!

Shades of the old "statistics have shown"! By whom was it felt? Based upon what? Have you designed/conducted any tests of your own?

I have no information, but would be surpri ...
I have already mentioned that I do not have the facility to conduct tests on large amounts of data. "It was felt" by my client and they do not really have the statistics to compare PX with CoSort.

But I need to go back to them with some comparative statistics for CoSort and PX, run on different systems.

I would only like to know if anyone has come across such a performance test. I know that CoSort regularly publishes some kind of benchmark reports but I am not able to find anything on Google regarding this.

Since many of the people in this forum have used CoSort on the Server edition of DS, it would be of immense help to me if such a performance report was available.

P.S.: I don't remember saying "statistics have shown".
Joel Satire
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

If your client supposes that PX sort is faster than CoSort, then using CoSort's published benchmark figures would most likely just add fuel to the fire.
I would try to compare the sort and run times for as much data as your current configuration can handle; that way you can choose the complexity and type of sorts to match the customer's expectations and you will also be running on your customer's machine - duplicating memory, CPU and disk performance.
Even though the runtimes might not scale linearly, you will have done as much as possible.
The nature of sorting algorithms is that benefits of superior algorithms and methodologies only show up with high volumes (in fact some low volume sorts with tools like CoSort will take longer than even a trivial bubble-sort algorithm). But if you can get runtimes of over 10 minutes or so you should have a fairly reliable benchmark.
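A minimal sketch of the kind of timing harness described above, assuming Python is available on the test machine. All names here (`make_rows`, `bubble_sort`, `time_sort`, `run_benchmark`) are hypothetical, and the two sort functions are only stand-ins: for a real comparison each would be replaced by a wrapper that launches the actual CoSort script or PX job on the customer's hardware. It times each sort at increasing volumes so you can see where the curves diverge.

```python
import random
import time


def make_rows(n_rows, seed=0):
    """Generate n_rows pseudo-random integer keys (stand-in for real records)."""
    rng = random.Random(seed)
    return [rng.randrange(10**9) for _ in range(n_rows)]


def bubble_sort(rows):
    """Deliberately naive O(n^2) sort, used only as a baseline."""
    rows = list(rows)
    for end in range(len(rows) - 1, 0, -1):
        swapped = False
        for i in range(end):
            if rows[i] > rows[i + 1]:
                rows[i], rows[i + 1] = rows[i + 1], rows[i]
                swapped = True
        if not swapped:
            break  # already sorted, stop early
    return rows


def time_sort(sort_fn, rows, repeats=3):
    """Return the best wall-clock time in seconds over several runs."""
    expected = sorted(rows)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        result = sort_fn(rows)
        elapsed = time.perf_counter() - start
        if result != expected:
            raise ValueError("sort function returned unsorted output")
        best = min(best, elapsed)
    return best


def run_benchmark(sort_fns, sizes):
    """Time every sort function at every data volume; returns {size: {name: seconds}}."""
    results = {}
    for n_rows in sizes:
        rows = make_rows(n_rows)
        results[n_rows] = {name: time_sort(fn, rows) for name, fn in sort_fns.items()}
    return results


if __name__ == "__main__":
    sort_fns = {"builtin": sorted, "bubble": bubble_sort}  # stand-ins only
    for n_rows, timings in run_benchmark(sort_fns, [10, 100, 1000, 3000]).items():
        line = ", ".join(f"{name}={secs:.6f}s" for name, secs in timings.items())
        print(f"{n_rows:>6} rows: {line}")
```

Taking the best of several repeats reduces noise from other activity on the machine. Timings at small volumes are dominated by fixed overhead, so they say little about how the tools compare at 150 GB.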
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Not you, Joe, but how many commentators/reporters use "statistics have shown" or "studies have shown" without backing it up? That's why standards are different for academic publishing. It really exasperates me (maybe because one of my majors is in the mathematical bases of statistical theory) when people use statistics sloppily or, worse, make unsubstantiated claims prefaced with "statistics show". Statistics can show anything you please. Read Darrell Huff's book How To Lie With Statistics to get excellent examples of what I'm complaining about.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
joesat
Participant
Posts: 93
Joined: Wed Jun 20, 2007 2:12 am

Post by joesat »

Thanks Ray,

Also, the benchmark reports are usually skewed. For instance, NSort once showed that they were faster than SyncSort on one system, while SyncSort showed that they were faster than NSort on another system!

Coming back to my query: does anyone have any published reports? Thanks!
Last edited by joesat on Thu Jan 10, 2008 5:39 am, edited 1 time in total.
Joel Satire
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne

Post by vmcburney »

Haven't done benchmarks, but I wouldn't be surprised if CoSort sorts of sequential data were faster than PX, simply because PX has to convert the data into its internal format and partition it before it can start sorting. But sort speed alone is not the entire ballgame. Sorting is usually a prelude to transforming and/or loading the data, and that's where PX thrives. PX alone may be faster than CoSort followed by PX. If you've got Server Edition, the CoSort plugin is a good option, as it can be hundreds of times faster than a Server job sort/filter/join.