Removing Duplicate records from a flat file

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
sairamkrish
Participant
Posts: 100
Joined: Wed Feb 02, 2005 4:02 am

Removing Duplicate records from a flat file

Post by sairamkrish »

Hi,

I have a scenario of loading records from a Sequential file into DB2-UDB Stage in server edition of DS.

I would like to know removing duplicate records completely along with the original one from a flat file while loading the DB2 UDB Table without using any temporary staging tables.

Also I would like to insert only rows that are not present in the target table of DB2-UDB Stage.I could not find any update action like "Insert New rows only" in the DB2-UDB Stage.

Appreciate your help.

Thanks
Krish.
Sainath.Srinivasan
Participant
Posts: 3337
Joined: Mon Jan 17, 2005 4:49 am
Location: United Kingdom

Post by Sainath.Srinivasan »

Use 'insert new or update existing' and give the keys which you want to check.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

Sairamkrish,

you have two basic ways of doing this - either within the DataStage job or by using the DB2 database to filter out duplicates.

If you have a unique key and do an INSERT you will get warnings (which you can catch by using a reject link) when trying to update existing records, so this might work for you.

If you use DataStage you have several options; my favorite if the file is sorted (or you can do a sort) then a simple transform stage will let you remove duplicates.
andru
Participant
Posts: 21
Joined: Tue Mar 02, 2004 12:25 am
Location: Chennai

Post by andru »

Use "Update else insert new rows" option to insert new rows and update exiting rows.
If you want to delete the records which are already present in the table, then while updating mark a flag to those records. In the after job SQL you can delete those records with that flag.
sairamkrish
Participant
Posts: 100
Joined: Wed Feb 02, 2005 4:02 am

Post by sairamkrish »

Hi all,

I dont need to update existing rows in the target Db2 table,I need to check the particular key column in the target table against the column in the source data file.

If it exists then dont update anything in the target table.
If it does not exist then insert the row into the target table.

Hope i have made things clear.

thanks
Krish
Sainath.Srinivasan
Participant
Posts: 3337
Joined: Mon Jan 17, 2005 4:49 am
Location: United Kingdom

Post by Sainath.Srinivasan »

Simplest method, sort the input and use stage variable and constraint to allow only new records.

If you want previous load also, you may either use a lookup to confirm this.

Otherwise to have less design change, use a subselect to same table in the load SQL to restrict updates.
Post Reply