Track duplicate rows in the source data

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

The answers you get depend upon whether you are working in Server or in Parallel jobs. Even better answer will come when you explain what type of source data you have (sequential files don't have keys, and sometimes DB level filtering is more efficient than doing it in DataStage)
DSguru2B
Charter Member
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

If you have an oracle source or any DB table then you can filter within the select statement.

To find if there are any duplicate values in TABLE_B alone (if there is a
primary key then there will be no duplicates) type the following:

Code: Select all

SELECT COUNT(*), COLUMN_A, COLUMN_B
FROM TABLE_B
GROUP BY COLUMN_A, COLUMN_B
HAVING COUNT(*)>1;
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
datastage1
Participant
Posts: 5
Joined: Mon Aug 14, 2006 2:48 pm

Post by datastage1 »

DSguru2B wrote:If you have an oracle source or any DB table then you can filter within the select statement.

To find if there are any duplicate values in TABLE_B alone (if there is a
primary key then there will be no duplicates) type the following:

Code: Select all

SELECT COUNT(*), COLUMN_A, COLUMN_B
FROM TABLE_B
GROUP BY COLUMN_A, COLUMN_B
HAVING COUNT(*)>1;
Thanks for clearing my doubt .what if the case was sequential file. Is there any way to give points for the correct n helpful answers in these forums as i am new here.
DSguru2B
Charter Member
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

There are many ways done in Server to resolve the Duplicacy issue. If you give a search you should be able to find routines written by the posters. Click HERE

In parallel there is a remove duplicate stage.

Good luck
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

datastage1 wrote:Is there any way to give points for the correct n helpful answers in these forums as i am new here.
That's coming... currently being tested by a small group of people.
-craig

"You can never have too many knives" -- Logan Nine Fingers
Post Reply