Detecting and rejecting duplicate records

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
hiltsmi
Participant
Posts: 20
Joined: Thu Aug 04, 2005 9:03 am

Detecting and rejecting duplicate records

Post by hiltsmi »

I have a file with account records. I want to detect duplicate account numbers and send the duplicate records to a reject file.

I sort the file by account number and then have an input link into a transformer.

I have 2 output links from the transformer. Each output link has a constraint defined on it.

On the first link the constraint is:
NOT(RowProcCompareWithPreviousValue(FromSrtByOldMcNumber.CCD_OLD_ACCT))


On the second link the constraint is:
RowProcCompareWithPreviousValue(FromSrtByOldMcNumber.CCD_OLD_ACCT)

I have 9,075 records going into the transformer and each output link has 9,075 records coming out. In other words I have all records coming out of each output port. But the constraints are identical except for the "NOT".

Can someone tell me what is happening?
Sainath.Srinivasan
Participant
Posts: 3337
Joined: Mon Jan 17, 2005 4:49 am
Location: United Kingdom

Post by Sainath.Srinivasan »

Use stage variables
kumar_s
Charter Member
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

Hi,
Try assigning to a stage variable, say "rowstatus".
and use that stage variable in the constraints as rowstatus and NOT(rowstatus).
Coz i come across in some part of documentation, that that function should be used only one part in a job.
Probably my perception may be right if it works.

regards
kumar
kumar_s
Charter Member
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

srinivasan
sorry again my pc is slow in refreshing.

regards
kumar
DaleK
Premium Member
Premium Member
Posts: 68
Joined: Fri Jun 27, 2003 8:33 am
Location: Orlando

Post by DaleK »

We do the same thing for one of our processes. We did just as kumar suggest.
We have a stage variable called svDupAcctNbr and its derivation is
RowProcCompareWithPreviousValue(sorted.Account_Number)
The first links constrain is Not(svDupAcctNbr)
and the second link doesn't have a constraint, but is marked as a reject link.

it works great.

Your code might work if you clear out the second constraint and just mark the link as a reject link, but I like using stage variables.
kumar_s
Charter Member
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

DaleK wrote:We do the same thing for one of our processes. We did just as kumar suggest.
We have a stage variable called svDupAcctNbr and its derivation is
RowProcCompareWithPreviousValue(sorted.Account_Number)
The first links constrain is Not(svDupAcctNbr)
and the second link doesn't have a constraint, but is marked as a reject link.

it works great.

Your code might work if you clear out the second constraint and just mark the link as a reject link, but I like using stage variables.
Hi dale,
If both the link needs to be used for future calculation or transformations, then processing in reject link wont be a good practice right.

regards
kumar
DaleK
Premium Member
Premium Member
Posts: 68
Joined: Fri Jun 27, 2003 8:33 am
Location: Orlando

Post by DaleK »

True and even more important if you evaluate the job status code after a job runs, since a reject row will throw a warning.

In our case we have accounts with more then 2 transactions, so I just run the jobs in a loop until the reject link has 0 rows sent down it.
Post Reply