Diff results in each run


gagan8877
Premium Member
Posts: 77
Joined: Mon Jun 19, 2006 1:30 pm


Post by gagan8877 »

I wrote a job in which a dataset is read and matched against reference data coming from an Oracle table. The match occurs via 10 parallel lookups on 10 different columns. The reference table always produces exactly the same number of rows, and the dataset always supplies exactly the same number of input rows. But the Lookup stages produce a different number of rows on each job run. The parameter values are exactly the same in each run (no change in letter case for strings either). I changed all lookups to execute sequentially, but the problem remains. Any ideas?

More Info about the job:
https://picasaweb.google.com/wizlogic/S ... _DiffRslts

(moderator: removed impressively long but singularly unhelpful generated OSH)
Gary
"A journey of a thousand miles, begins with one step"
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

In spite of its impressive length, the generated OSH is singularly unhelpful in answering this question. More useful would be the score. In the meantime you could advise us how the data are partitioned, particularly on the inputs to the Lookup stage, and whether anything (such as the contents, which is not the same as the number of rows, of the Oracle table) changes between runs.
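As a rough illustration of the difference between row counts and contents, here is a minimal Python sketch (not DataStage code) that compares two extracts by an order-independent fingerprint. The sample rows and the `fingerprint` helper are hypothetical; in practice the rows would come from an export of the Oracle table or the dataset taken on each run.

```python
import hashlib

def fingerprint(rows):
    """Return an order-independent digest of a collection of rows."""
    # Hash each row, then sort the digests so row order does not matter.
    digests = sorted(
        hashlib.sha256(repr(row).encode("utf-8")).hexdigest() for row in rows
    )
    return hashlib.sha256("".join(digests).encode("utf-8")).hexdigest()

# Hypothetical extracts from three runs.
run_1 = [(1, "A"), (2, "C")]
run_2 = [(2, "C"), (1, "A")]  # same rows, different order
run_3 = [(1, "B"), (2, "C")]  # same row count, different contents

same_order_insensitive = fingerprint(run_1) == fingerprint(run_2)
same_contents = fingerprint(run_1) == fingerprint(run_3)
```

Here `run_1` and `run_3` have the same number of rows, yet their fingerprints differ, which is the kind of change a row count alone would hide.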
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
qt_ky
Premium Member
Posts: 2895
Joined: Wed Aug 03, 2011 6:16 am
Location: USA

Post by qt_ky »

It sounds like your partitioning may not be correct. Lookup reference links on SMP should be set to entire or auto. What do you have them set to?
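To show why the reference link partitioning matters, here is a minimal Python simulation (not DataStage code, and not a claim about this particular job). It assumes a 4-node layout, integer keys, and a stream whose arrival order changes between runs; the function names are hypothetical. Each partition can only match against the reference rows it holds locally.

```python
import random

NODES = 4  # assumed number of partitions

def round_robin(rows, nodes):
    """Deal rows out to partitions one at a time."""
    parts = [[] for _ in range(nodes)]
    for i, row in enumerate(rows):
        parts[i % nodes].append(row)
    return parts

def entire(rows, nodes):
    """Give every partition a full copy of the rows."""
    return [list(rows) for _ in range(nodes)]

def lookup(stream_parts, ref_parts):
    """Count stream keys found in the reference data of the same partition."""
    matched = 0
    for stream, ref in zip(stream_parts, ref_parts):
        ref_keys = set(ref)
        matched += sum(1 for key in stream if key in ref_keys)
    return matched

reference = list(range(100))
stream = list(range(100))

def run(ref_partitioner, seed):
    """Simulate one job run; the seed stands in for run-to-run arrival order."""
    arriving = stream[:]
    random.Random(seed).shuffle(arriving)
    return lookup(round_robin(arriving, NODES), ref_partitioner(reference, NODES))

entire_counts = [run(entire, seed) for seed in range(5)]
split_counts = [run(round_robin, seed) for seed in range(5)]
```

With the reference data copied to every partition (`entire`), every run matches all 100 keys. With the reference data split across partitions, each run matches only the keys that happen to land on the right node, so the counts in `split_counts` fall short and will usually differ from run to run.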

Why are your screen shots not showing any link markers (little icons) for partitioning or collecting?

The only thing I see in the osh is a lot of -pp flags. Are you clearing the "Preserve partitioning" setting a lot?

Try running the entire job on a single node more than once, rather than tinkering with a few stages. Tell us if you get the same behavior as before.
Choose a job you love, and you will never have to work a day in your life. - Confucius
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

qt_ky wrote: "It sounds like your partitioning may not be correct."
Agreed. Anytime I hear "different results" running the same data through multiple times I suspect faulty partitioning. And I too would be curious what happens when the job runs on a single node.
-craig

"You can never have too many knives" -- Logan Nine Fingers
josejohny
Premium Member
Posts: 10
Joined: Wed Nov 26, 2008 11:14 pm
Location: Bangalore

Post by josejohny »

In the screenshot, the RMVDUP_UP_HIER stage (a Remove Duplicates stage) has different output between runs, so check the partitioning on that Remove Duplicates stage.
Thanks & Regards
Jose Johny
Project Engineer
Wipro Technologies |Bangalore
"Life is a process of cultivating goodness & removing evilness"
gagan8877
Premium Member
Posts: 77
Joined: Mon Jun 19, 2006 1:30 pm

Found the culprit

Post by gagan8877 »

Hi All

Thanks for all your replies. I tried a single-node config file, but that didn't resolve the problem. I noticed that the problem only occurs when the job is run in the sequence; if I run it individually, it produces the same results each time. This led me to focus on the job running upstream, which had a logic flaw: a Remove Duplicates stage was producing a different dataset with each run. That dataset was the source for the next job. So marking it resolved.
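For anyone hitting something similar, here is a minimal Python sketch (not DataStage code) of one common way a remove-duplicates step can produce a different dataset on each run. The exact flaw in the upstream job is not described above, so this is only an assumption: duplicates are dropped on a key while the surviving row depends on arrival order. The column names `id` and `parent` are hypothetical.

```python
import random

def remove_duplicates(rows, key):
    """Keep the first row seen for each key value (a 'retain first' dedup)."""
    seen = {}
    for row in rows:
        seen.setdefault(row[key], row)
    return sorted(seen.values(), key=lambda r: r[key])

rows = [
    {"id": 1, "parent": "A"},
    {"id": 1, "parent": "B"},
    {"id": 2, "parent": "C"},
    {"id": 2, "parent": "D"},
]

def run(seed, presort):
    """Simulate one run; the seed stands in for run-to-run arrival order."""
    arriving = rows[:]
    random.Random(seed).shuffle(arriving)
    if presort:
        # Sort on the key plus a tie-breaker so the "first" row is well defined.
        arriving.sort(key=lambda r: (r["id"], r["parent"]))
    return remove_duplicates(arriving, "id")

unsorted_runs = [run(seed, presort=False) for seed in range(20)]
sorted_runs = [run(seed, presort=True) for seed in range(20)]
```

Every run keeps two rows either way, so the row count looks stable, but without the tie-breaking sort the surviving `parent` value depends on which duplicate arrived first. Sorting on the key plus the remaining columns before deduplicating makes the output repeatable.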

Gary
"A journey of a thousand miles, begins with one step"