Frustrated by inconsistent approach in parallel stages


djbarham
Participant
Posts: 34
Joined: Wed May 07, 2003 4:39 pm
Location: Brisbane, Australia

Post by djbarham

... or maybe just my lack of understanding.

My frustration centres around the warning
IIS-DSEE-TFIP-00063 "When checking operator: Dropping component "..." because of a prior component with same name."

I first ran into grief with this on a DIFF stage. The Diff stage requires the columns that are to be compared to have the same name on both inputs. However, if one dares to pass one of these columns to the output, you get a warning about a dropped component (yes, I understand the logic: both are passed to the output and one supersedes the other).

If I understand correctly, there is no way to pass one of the "diff"ed columns to the output without this warning, so you have one of the following choices:
* ignore the warning
* demote / suppress the warning with the message handler (assuming you know which value is being passed to the output).
* duplicate the input columns with different names and pass these to the output (this is the approach I ended up using, but feels like a fudge to me)
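
To make the "one supersedes the other" logic concrete, here is a toy model in Python. It is only a sketch of the behaviour as I understand it, not the Orchestrate engine itself: the function `select_output`, the link names `before` / `after`, and the column names are all made up for illustration.

```python
def select_output(links, requested):
    """Collect the requested columns from every input link; first one wins.

    links:     dict of link name -> list of column names (hypothetical)
    requested: set of column names to pass to the output
    Returns (kept, messages), where kept is a list of (link, column) pairs.
    """
    kept, messages, seen = [], [], set()
    for link, columns in links.items():
        for col in columns:
            if col not in requested:
                continue
            if col in seen:
                # A column of this name was already kept from an earlier link.
                messages.append(
                    f'Dropping component "{col}" because of a prior '
                    "component with same name."
                )
                continue
            seen.add(col)
            kept.append((link, col))
    return kept, messages


# Both inputs carry the compared column under the same name,
# so passing it to the output produces the warning.
links = {
    "before": ["cust_id", "balance"],
    "after": ["cust_id", "balance"],
}
kept, messages = select_output(links, {"balance"})

# Workaround: duplicate the column under a different name on one input
# and pass only the duplicate to the output.
links_renamed = {
    "before": ["cust_id", "balance", "balance_out"],
    "after": ["cust_id", "balance"],
}
kept_renamed, messages_renamed = select_output(links_renamed, {"balance_out"})
```

In this model the first call keeps `balance` from `before` and reports one dropped component, while the second call keeps only `balance_out` and reports nothing, which matches what I saw with the third option.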

Not sure why the separate input columns could not simply be listed (as with a Join) and have DataStage manage the resulting Orchestrate code.

Anyway, that is history, I dealt with it and moved on.

Next we hit the Lookup stage (normal). No big deal, I tell it which input column matches the reference key, names don't matter, the key columns can have the same or different name. Even with the same name on key columns, I can also have an output column with the same name and pass the key value to it.

Today, I needed to use a Sparse Lookup (for all the right reasons: small volume input, large volume reference), only to discover that as soon as you change to sparse, all the rules change.

I can no longer nominate the relationship between input column and reference key. It now requires the columns to have the same name. No problem. BUT, and here's the kicker, if I name the output column the same as the input / reference key columns I get the error as above:
IIS-DSEE-TFIP-00106 When checking operator: Dropping component "..." because of a prior component with same name.

(well, same text, different error number)

Having run into this with the DIFF, my first reaction was to duplicate the input field and pass that to the output. No, that does not solve it.

The final solution was to rename the OUTPUT column.

It does not mind me passing the input column with the same name as the reference key to the output, but it does mind if the output column has the same name as the reference key.

THAT is where the frustration comes in. Completely different behaviour between stages in relation to how columns are passed from input to output and the same error message meaning completely different things in different stages.

Is this just poor integration of the Orchestrate engine into the DataStage Designer GUI? Would this make more sense if I had been familiar with Orchestrate before it was integrated into DataStage?

Is it just me?

(Thanks for listening. My job works, without warnings and I have moved on. But, if someone can make sense of this for me, I'm all ears.)