Duplicate Records

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Check this post for a downloadable Tech Tip that will explain how to do this. It includes pictures of the job and stage variables design, which can really help bring the methodology across. Picture. 1000 words. You know. :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers
sb_akarmarkar
Participant
Posts: 232
Joined: Fri Sep 30, 2005 4:52 am
Contact:

Post by sb_akarmarkar »

Hi,
Use aggregator stage add one more column count (Duplicate records) , in transformer stage if count >1 then transfer record to error table as well as target table....


I think this will help you..


Thnaks,
Anupam
praburaj
Premium Member
Premium Member
Posts: 133
Joined: Thu Jun 30, 2005 1:26 am
Location: philippines

Post by praburaj »

Ur Code is working fine. it has come some data. But that is not duplicate

records. Once again I explain my condition once again.


Column1 ---------------------------- Column 2

Input Data: Dont considered -----.

Suresh ---- Intel

Raj ---- Oracle

Suresh ---- Intel

Kamal ---- Veltech

Raj ----- Oracle

Acct ----- 35

Moort ----- job

Acct ----- 35

Output:

Suresh ----- Intel

Raj ----- Oracle.

Acct ----- 35

That means duplicate records send only in to Error Table. Others

move to Good Records Table. Can u give me some other logic :? .
praburaj
Premium Member
Premium Member
Posts: 133
Joined: Thu Jun 30, 2005 1:26 am
Location: philippines

Post by praburaj »

Ur code is working fine. But it is did not take any duplicate Records. It

has come some rows. Once again i explain my condition.

Input Data : ( Don't Considered ------)

COLUMN1 ------------------- COLUMN 2

RAJ ------------------- INTEL

KAMAL ------------------- ORACLE

PRABU -------------------- VELTECH

RAJ -------------------- INTEL

KARTI -------------------- CTS

PRABU ------------------- VELTECH


OUTPUT DATA IN ERROR TABLE:

RAJ -------------- INTEL

PRABU -------------- VELTECH


This is the Condition. Duplicate records send in to Error Records Table

Only. Other will send to Good Records. Plz help me Some other Logic :?
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

You've just said "the code doesn't work" without telling anyone what happened. You also stated that you've put in a sort but the data you posted isn't sorted so none of the example or suggestions can work.
Try doing the sort first to a sequential file to ensure that this is working. Then use the transform stage variables. If it doesn't work, add the stage variables as extra columns to the output and make sure that their values are what you expect. You need to take a simple step-by-step approach to this problem.
praburaj
Premium Member
Premium Member
Posts: 133
Joined: Thu Jun 30, 2005 1:26 am
Location: philippines

Post by praburaj »

I sorted the data , using sort stage. Some data has come in to the Error

Target Stage. But it has not come all duplicate records. That data has

come like that.

ACCT CODE --------------- DEPT CODE

2345 ----------------- 450000

3456 ---------------- 450000

2345 --------------- 436789


I need output like that:

ACCT CODE ----------------- DEPT CODE

2345 -------------------- 450000

2345 ------------------- 450000

3456 ------------------- 450000

3567 ------------------- 566789

3567 ------------------ 566789.


That means If one record contain one or more duplicate records, that

all duplicate records send in to Error Target table. If one record contain

only one duplicate record, that also go to the Target table. I tried what

balajsir told. I got only a partial output. I need appropriate output.

Plz help me. How can i write condition. :?

Regards,

prabu
dsdesigner
Participant
Posts: 34
Joined: Thu Jul 29, 2004 1:03 pm

Post by dsdesigner »

praburaj
Premium Member
Premium Member
Posts: 133
Joined: Thu Jun 30, 2005 1:26 am
Location: philippines

Post by praburaj »

Your link did not show anything. It has show only usernamd and

password. can u give me the correct link.
Last edited by praburaj on Sat Jul 29, 2006 12:45 am, edited 1 time in total.
kumar_s
Charter Member
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

Perhaps dsdesigner need to work with his sid stuff :wink:
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

I was wrong about the 'sid' stuff. As Kim has noted several times, it's because of a lack of the 'www' in front of the domain name in the link. Copy the shortcut and paste it in your address bar, add the www and then go.

Or click here. It's just one of the plethora of posts on the subject of Duplicates that seem to have shown up lately. And I'm sure you've already searched the forum and seen all of them already... right?
-craig

"You can never have too many knives" -- Logan Nine Fingers
DSguru2B
Charter Member
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

To capture duplicates you need to work with stage variables and constraints. Sort the incoming data, detect the change via use of two stage variables and flag the record thats identical. Then in the constraint of the link where you plan to capture duplicates, use the flag value.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

And whaddaya know, a link to a Tech Tip at the top of the page that explains exactly how to do that! :shock:
-craig

"You can never have too many knives" -- Logan Nine Fingers
dsdesigner
Participant
Posts: 34
Joined: Thu Jul 29, 2004 1:03 pm

Post by dsdesigner »

For example you have

a,b
a,b
x,y
y,x
c,d
c,d

select col, col2, count(1)
from table
group by col1, col2
having count(1) > 1.
Load these into a hash file by specifying col1 and col2 as keys.


thsi will yield
a, b, 2
c, d, 2

This you will load in a hash file specifying col1, col2 as he keys

In step 2
you will read te entire source. Look up against the hash file. So when you get the first a,b it iwll find a match in the hash file. Load it to seq2. When the second a, b comes form the source, it will agian find a match in the hash file. It will also go to seq2. When x, y comes next it will not find a match in the hash file. Load it to seq1. Same as a,b for c,d rows.

Code: Select all

              
                      a, b, 2 
                      c, d, 2 
                          
a,b                      | 
a,b                      | 
x,y                      | 
y,x--------------- --- xfm ------------------seq1 
c,d                         \ 
c,d                           \ 
                                 \ 
                                seq2 

 
Thanks,
Shekar
dsdesigner
Participant
Posts: 34
Joined: Thu Jul 29, 2004 1:03 pm

Post by dsdesigner »

Alternatively you can do it in a single job using the sql below. SQL is written for oracle. Should also work for db2. Not sure of other databses.

Code: Select all

src-------xfm------seq1
              |
              |
             seq2
with duplicate_rows as
(select cola, colb, count(1) as cnt
from srcTable
group by cola, colb
having count(1)>1)
select src.cola, src.colb, coalesce(lkp.cnt,1) as num_rows
from srcTable src,
duplicate_rows lkp
where src.cola = lkp.cola(+)
and src.colb = lkp.colb(+)

For the data you have shown above this sql will give you the following o/p
colA, colB, num_rows
2345, 450000, 2
3456, 450000, 1
3567, 566789, 2

Pass this through a transformer, add a constriant to allow rows where num_rows=1. Put a reject link and capture all the other rows.

Thanks,
Shekar
DSguru2B
Charter Member
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

The OP should get back to us with more updates on where exactly he/she is stuck. He has more than few different ways of going about doing it.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
Post Reply