Merge stage question

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
abc123
Premium Member
Premium Member
Posts: 605
Joined: Fri Aug 25, 2006 8:24 am

Merge stage question

Post by abc123 »

I have a job with 2 input files. One Master file and one update file. I want to use the Merge stage like an inner join. The reason I am not using a Join stage is because Join does not allow a reject link which I need.

My questions are:

1)It is mentioned in the Developer's guide, page 20-2, that choosing Auto Partitioning will ensure that partioning and sorting is done. Does that mean that key partitioning and sorting will be done automatically by Datastage, i.e., we don't have to choose a Partition Type on the input
links' Partitioning tab?

2)How do we capture the rejected records? Do we set the "Unmatched Masters Mode" property to Drop? Is there a way to capture reject records in the Join stage?
Havoc
Participant
Posts: 110
Joined: Fri Nov 24, 2006 8:26 am

Post by Havoc »

Merge Stage (in Drop Mode) = Inner Join

Yes, the merge stage has a reject link. The reject link carries rejected update link records.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

(Auto) will apply hash partitioning to the join key fields on both inputs and insert a tsort operator (and probably a parallel buffer operator) on both inputs. All this can be seen in the score.

You have an inner join with Drop, a left outer join with Continue.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
abc123
Premium Member
Premium Member
Posts: 605
Joined: Fri Aug 25, 2006 8:24 am

Post by abc123 »

Ray and Havoc, thank you for your responses. Ray, How do I capture the dropped rows? In the corresponding reject link? In my job, it looks like even when the rows are matching, it is populating the reject file only. The matched file remains empty. I'll post the score in a minute.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Try changing the unmatched masters mode to Keep.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
abc123
Premium Member
Premium Member
Posts: 605
Joined: Fri Aug 25, 2006 8:24 am

Post by abc123 »

I changed the "Unmatched Masters Mode" to Keep. However, this way the all rows go to the matched link and the reject link. I would think that allmost all rows should go to the matched link and only a few should go to the reject link.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Can you please describe all the properties in your Merge stage? What you are claiming is not consistent with the way the Merge stage is supposed to operate.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
abc123
Premium Member
Premium Member
Posts: 605
Joined: Fri Aug 25, 2006 8:24 am

Post by abc123 »

Here are the properties of my Merge state:

Merge Keys:
Key = One
Sort Order = Ascending

Options:
UnMatched Masters Mode = Keep
Warn On Reject Updates = False
Warn On UnMatched Masters = False

I have 2 links with 1 PK with the same name in both links. This column is a char column. Virtually all rows have matching values in these 2 input links except a few. However, all rows go to both links. That doesn't make sense to me. I would think that only unmatched rows should go to the reject link.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

One thing to keep in mind is that key values from the Update input are consumed - this is different behaviour than that of, say, the Lookup stage.

But if that were the "fault" I would have expected you to say something like "nearly all the rows", not an absolute statement such as "all the rows".

Unmatched masters mode should, correctly, direct all rows to the Master output link (port #0). Where there was a mismatch, columns from the Update input should be NULL. I asked you to change this specifically to test this. You can change unmatched masters mode back to Drop now.

You mention that the match keys are Char. Are they the same length, padded with the same characters? Is it worth changing them to VarChar and trimming any trailing pad characters?

I just created a job with a Merge stage, and cannot reproduce what you are seeing. :?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Post Reply