Page 1 of 1

Posted: Fri Aug 25, 2006 6:37 am
by ArndW
The answers you get depend upon whether you are working in Server or in Parallel jobs. Even better answer will come when you explain what type of source data you have (sequential files don't have keys, and sometimes DB level filtering is more efficient than doing it in DataStage)

Posted: Fri Aug 25, 2006 7:28 am
by DSguru2B
If you have an oracle source or any DB table then you can filter within the select statement.

To find if there are any duplicate values in TABLE_B alone (if there is a
primary key then there will be no duplicates) type the following:

Code: Select all

SELECT COUNT(*), COLUMN_A, COLUMN_B
FROM TABLE_B
GROUP BY COLUMN_A, COLUMN_B
HAVING COUNT(*)>1;

Posted: Fri Aug 25, 2006 7:50 am
by datastage1
DSguru2B wrote:If you have an oracle source or any DB table then you can filter within the select statement.

To find if there are any duplicate values in TABLE_B alone (if there is a
primary key then there will be no duplicates) type the following:

Code: Select all

SELECT COUNT(*), COLUMN_A, COLUMN_B
FROM TABLE_B
GROUP BY COLUMN_A, COLUMN_B
HAVING COUNT(*)>1;
Thanks for clearing my doubt .what if the case was sequential file. Is there any way to give points for the correct n helpful answers in these forums as i am new here.

Posted: Fri Aug 25, 2006 7:54 am
by DSguru2B
There are many ways done in Server to resolve the Duplicacy issue. If you give a search you should be able to find routines written by the posters. Click HERE

In parallel there is a remove duplicate stage.

Good luck

Posted: Fri Aug 25, 2006 7:54 am
by chulett
datastage1 wrote:Is there any way to give points for the correct n helpful answers in these forums as i am new here.
That's coming... currently being tested by a small group of people.