Join Stage Varying Results

nvalia
Premium Member
Posts: 180
Joined: Thu May 26, 2005 6:44 am

Post by nvalia »

Hi,

DS 8.7 on Windows

I am doing a left outer join (Join stage) on an integer key field. Both inputs are hash partitioned and sorted on the same key (using the Join stage's built-in sort), and there are no nulls in the key on either link.

But when the same job runs multiple times, I get a varying number of duplicate records from this join. (Duplicates are expected from this join.)

Any ideas/suggestions on how to solve this?

Thanks,
NV
BI-RMA
Premium Member
Posts: 463
Joined: Sun Nov 01, 2009 3:55 pm
Location: Hamburg

Post by BI-RMA »

When you run the same job with the same input data, with both input links to the join hash partitioned and sorted on identical columns, you can't get different results in subsequent runs.

So if you did not change the job design, it is very likely that your input data changed between the first run and a later run of the job.
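
To illustrate the point above, here is a minimal Python sketch (not DataStage code; the function names and sample rows are made up for illustration). It hash partitions both inputs on an integer key, sorts and left-outer-joins each partition, and checks that shuffling the arrival order of the input rows does not change the number of output rows.

```python
import random
from collections import defaultdict

def hash_partition(rows, key, n_parts):
    """Route each row to a partition chosen from its key value only."""
    parts = [[] for _ in range(n_parts)]
    for row in rows:
        parts[row[key] % n_parts].append(row)  # integer key: modulo as the hash
    return parts

def left_outer_join(left, right, key):
    """Sort both inputs on the key, then emit one row per matching pair."""
    lookup = defaultdict(list)
    for r in sorted(right, key=lambda r: r[key]):
        lookup[r[key]].append(r)
    out = []
    for l in sorted(left, key=lambda l: l[key]):
        matches = lookup.get(l[key]) or [None]  # keep unmatched left rows
        out.extend((l, m) for m in matches)
    return out

def partitioned_join(left, right, key, n_parts=4):
    """Join partition i of the left input with partition i of the right input."""
    result = []
    for lp, rp in zip(hash_partition(left, key, n_parts),
                      hash_partition(right, key, n_parts)):
        result.extend(left_outer_join(lp, rp, key))
    return result

# Hypothetical inputs with duplicate keys on both sides.
left = [{"id": i % 5, "l": i} for i in range(10)]
right = [{"id": i % 3, "r": i} for i in range(6)]

counts = set()
for seed in range(5):
    rng = random.Random(seed)
    l, r = left[:], right[:]
    rng.shuffle(l)  # change arrival order only, not content
    rng.shuffle(r)
    counts.add(len(partitioned_join(l, r, "id")))

print(counts)  # a single row count, regardless of arrival order
```

Because the partition is chosen from the key value alone, all rows with the same key meet in the same partition on every run, so only a change in the input rows themselves can change the count.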
nvalia
Premium Member
Posts: 180
Joined: Thu May 26, 2005 6:44 am

Post by nvalia »

The input data has definitely not changed, as I am the one controlling it. I can say this with certainty.

I also know I should not get different results, but I am, and hence I am checking whether there is anything else I can do in the design/flow.
prasannakumarkk
Participant
Posts: 117
Joined: Wed Feb 06, 2013 9:24 am
Location: Chennai,TN, India

Post by prasannakumarkk »

Did you see any warnings in the Director? Did you clear the propagate partition setting in the previous stages?
Thanks,
Prasanna
nvalia
Premium Member
Posts: 180
Joined: Thu May 26, 2005 6:44 am

Post by nvalia »

BI-RMA, you are correct.

The source data was varying because, for testing, a top n clause was used to restrict the data.

So DataStage does give predictable results when the data is partitioned and sorted correctly, as expected.
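
For anyone hitting the same symptom, here is a small Python sketch of why a top n restriction can change the join's row count. It assumes the top n clause had no deterministic ordering (the thread does not say so explicitly), so the database was free to return a different set of n rows on each run. The keys and helper names are hypothetical.

```python
from collections import Counter

def left_join_count(left_keys, right_keys):
    """Row count of a left outer join: each left row yields max(1, matches) rows."""
    right_counts = Counter(right_keys)
    return sum(max(1, right_counts[k]) for k in left_keys)

# Hypothetical data: key 1 has three matches on the right, key 2 has none,
# key 3 has one.
source_left = [1, 1, 1, 2, 2, 2, 3, 3]
right = [1, 1, 1, 3]

# Two sets of 4 rows that an unordered "top 4" could legitimately return.
run_a = source_left[:4]   # [1, 1, 1, 2]
run_b = source_left[-4:]  # [2, 2, 3, 3]

print(left_join_count(run_a, right))  # 10
print(left_join_count(run_b, right))  # 4

# With an explicit ordering, the same 4 rows come back every time.
ordered_top4 = sorted(source_left)[:4]
print(left_join_count(ordered_top4, right))  # 10
```

The join itself is deterministic in both cases. Only the set of rows feeding it differs, which is why the number of duplicates appeared to vary between runs.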