Pre partitioned - sorted Dataset read Very slow

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
rohanf
Participant
Posts: 30
Joined: Fri Oct 07, 2005 5:39 am
Location: pune india

Pre partitioned - sorted Dataset read Very slow

Post by rohanf »

Hi,

I have Partitioned and sorted data in a dataset with approximately 26 Million records.
When using the same dataset as a source in another job and not changing the Partition or sort order, i.e. using same partinioning as below:-

Dataset(Set) ------>Join(Same Partitioning)

For a very long time the Number or records read from this source dataset is "0". I was expecting that using a pre-partitioned and sorted dataset as source will Increase performance. But there seem to be no difference as the dataset was read after a very very long time by the datastage.

Is there some setting required for faster reads.

Thanks in Advance
Rohan
miwinter
Participant
Posts: 396
Joined: Thu Jun 22, 2006 7:00 am
Location: England, UK

Post by miwinter »

What other stages exist after the join stage? I'm certain this isn't your whole job flow and the issue could likely lie elsewhere...

How long is a "very very long time" for it to read the rows into the join stage? Bear in mind it is loading these up into memory, so the issue could also be contention there or contention in the filesystem the dataset resides in.
Mark Winter
<i>Nothing appeases a troubled mind more than <b>good</b> music</i>
rohanf
Participant
Posts: 30
Joined: Fri Oct 07, 2005 5:39 am
Location: pune india

Post by rohanf »

The stages after join in order is as below:

1: Filter (For capturing error failures)
2: Lookup
3: Transfomer
4: Surrogate key stage
5: Target dataset


Tell me if more info is needed.

Thanks & Regards
Rohan[/img]
mahadev.v
Participant
Posts: 111
Joined: Tue May 06, 2008 5:29 am
Location: Bangalore

Post by mahadev.v »

DataStage puts in a Sort stage before the join. And probably the DataSet and Sort are combined onto one operator. Look at the score of the job to determine if this is the case. I suggest you put in a sort stage and add the property Sort Key Mode to "Don't sort Previously Sorted". This might help.
"given enough eyeballs, all bugs are shallow" - Eric S. Raymond
miwinter
Participant
Posts: 396
Joined: Thu Jun 22, 2006 7:00 am
Location: England, UK

Post by miwinter »

The same end can be achieved on the input tab of your join, where you partition same. Check to sort and check stable to ensure it doesn't re-sort your dataset and negate the need to adding a new stage.
Last edited by miwinter on Mon Jun 01, 2009 4:40 am, edited 1 time in total.
Mark Winter
<i>Nothing appeases a troubled mind more than <b>good</b> music</i>
rohanf
Participant
Posts: 30
Joined: Fri Oct 07, 2005 5:39 am
Location: pune india

Post by rohanf »

I assume the inline stable sort option still utilize the scratch disk space. I have restrictions on the scratch space size as well. This is the reason I had sorted at Oracle side in my extraction step to generate this Partitioned and sorted dataset.

However I assume the "Don't sort previously sorted" option does not use the Scratch disk space.

Please correct me If I am wrong.

Regards
Rohan
miwinter
Participant
Posts: 396
Joined: Thu Jun 22, 2006 7:00 am
Location: England, UK

Post by miwinter »

So having ordered by in Oracle in your previous job, you then partitioned inline on the join keys before the dataset write?
Mark Winter
<i>Nothing appeases a troubled mind more than <b>good</b> music</i>
rohanf
Participant
Posts: 30
Joined: Fri Oct 07, 2005 5:39 am
Location: pune india

Post by rohanf »

Correct!

In the previous extraction job below are the steps performed:-

1: Sorting at oracle side using order by( on say K1)
2: Partitioning on datastage side on the same key K1
3: Write to dataset.


Now in the next job my join key is "K1".

Regards
Rohan
miwinter
Participant
Posts: 396
Joined: Thu Jun 22, 2006 7:00 am
Location: England, UK

Post by miwinter »

OK, so you go with the sort stage using don't sort, previously sorted on your sort/join key.

You could also add APT_NO_SORT_INSERTION to your job, but this will disable at job-level, so you'd need to take care of any other sorting requirements manually for the other stages in your second job.

Let us know which way you go and what effect it produces.
Mark Winter
<i>Nothing appeases a troubled mind more than <b>good</b> music</i>
Post Reply