Remove Duplicates

Posted: Thu Feb 05, 2009 2:47 am
by arshi
Hi,

I want to remove duplicate rows based on the DATE column.
The date format is DDMMYYYY.

I am using DataStage Server Edition (7.5); the source is a flat file.

Does anyone have a solution for this?




Regards,
Arshi

Posted: Thu Feb 05, 2009 4:14 am
by Sainath.Srinivasan
What have you tried?

Posted: Thu Feb 05, 2009 5:13 am
by arshi
Hi,

First I sort the data using the Sort stage, then use stage variables to remove the duplicates.

With the Sort stage this works fine for a single month's data, but not for the entire data set.

Does anyone have a solution for this?

Regards,
Arshi.

Posted: Thu Feb 05, 2009 5:28 am
by Sainath.Srinivasan
What about any other columns in the data?

How do you define a duplicate in your data?

Posted: Thu Feb 05, 2009 6:06 am
by dr.murthy
Hi,

Do one thing: in the Sort stage's Options tab, set Allow Duplicates to False (False is the setting that drops rows with duplicate keys; True keeps them). There is no need for an additional Remove Duplicates stage. Make sure the execution mode is set to Sequential. It should work fine.

Posted: Thu Feb 05, 2009 6:24 am
by arshi
Hi Murthy,

I did not find any Options tab in the Sort stage. I am using Server Edition (7.5). Can you explain exactly where it is?


Hi Sainath,

As per my requirement I have to sort on column1 and column2. Column2 holds the date data (DDMMYYYY).

If I use the Sort stage it does not give the correct result. I think you understand my requirement.

Regards,
Arshi

Posted: Thu Feb 05, 2009 6:28 am
by Sainath.Srinivasan
Why don't you write into a hashed file and read from it?
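
A hashed file is keyed, so writing a second row with the same key overwrites the first, and reading the file back returns one row per key. A rough Python sketch of that behaviour, using a dict in place of the hashed file (the column names and the "amount" field are made up for illustration, and which row survives depends on arrival order):

```python
def dedupe_via_keyed_store(rows, key_columns):
    """Keep one row per key, mimicking writes to a keyed (hashed) file."""
    store = {}
    for row in rows:
        key = tuple(row[c] for c in key_columns)
        store[key] = row  # a later row with the same key overwrites the earlier one
    return list(store.values())


# Hypothetical sample data: column2 holds the date as DDMMYYYY.
rows = [
    {"column1": "A", "column2": "05022009", "amount": 10},
    {"column1": "A", "column2": "05022009", "amount": 20},
    {"column1": "B", "column2": "05022009", "amount": 30},
]

unique_rows = dedupe_via_keyed_store(rows, ["column1", "column2"])
for row in unique_rows:
    print(row)
```

No sort is needed with this approach, but the last row written for each key is the one that is kept.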

Posted: Thu Feb 05, 2009 6:34 am
by dr.murthy
Which version of DS are you using? Is it PX or Server?

Posted: Thu Feb 05, 2009 6:58 am
by arshi
Murthy,

Using Server Edition (7.5.1)

Posted: Thu Feb 05, 2009 8:33 am
by chulett
There is no such option in the Server Sort stage; checking for and removing duplicates would need to happen downstream from the Sort.
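
The downstream check is usually a "compare with the previous row" test, which is what stage variables in a Transformer give you. A minimal Python sketch of that logic, assuming the rows arrive already sorted on the key columns (the function and column names are illustrative, not DataStage objects):

```python
def remove_adjacent_duplicates(sorted_rows, key_columns):
    """Yield a row only when its key differs from the previous row's key."""
    previous_key = None  # plays the role of the "previous value" stage variable
    for row in sorted_rows:
        current_key = tuple(row[c] for c in key_columns)
        if current_key != previous_key:
            yield row
        previous_key = current_key


# Hypothetical input, already sorted on column1 and column2.
sorted_rows = [
    {"column1": "A", "column2": "05022009"},
    {"column1": "A", "column2": "05022009"},
    {"column1": "A", "column2": "06022009"},
    {"column1": "B", "column2": "05022009"},
]

kept = list(remove_adjacent_duplicates(sorted_rows, ["column1", "column2"]))
print(kept)
```

This only works if identical keys are adjacent, which is why the sort has to come first.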

Posted: Thu Feb 05, 2009 9:35 am
by JRodriguez
Hi Arshi,

Before sorting the data, add an extra column holding the date formatted as YYYYMMDD and use it as your sort key; then use the stage variables to remove the duplicates.


It will work fine on the entire data set.
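
To illustrate why the extra column helps: sorted as plain text, DDMMYYYY values are ordered by day first, so the result is only chronological within a single month. A small Python sketch with made-up dates (the helper name to_sort_key is hypothetical):

```python
def to_sort_key(ddmmyyyy):
    """Rearrange a DDMMYYYY string into YYYYMMDD so text order is date order."""
    return ddmmyyyy[4:8] + ddmmyyyy[2:4] + ddmmyyyy[0:2]


dates = ["01032009", "15012009", "02022009", "15012009"]

# Plain text sort: ordered by day, then month, then year.
text_order = sorted(dates)

# Sort on the derived YYYYMMDD key: true chronological order.
chronological = sorted(dates, key=to_sort_key)

# With the data in order, drop each value that equals the one before it.
unique_dates = []
for value in chronological:
    if not unique_dates or unique_dates[-1] != value:
        unique_dates.append(value)

print(text_order)
print(chronological)
print(unique_dates)
```

In a Server job the same rearrangement would be done with substring expressions in a Transformer derivation ahead of the Sort stage.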

Posted: Thu Feb 05, 2009 12:14 pm
by ray.wurlod
If you're sorting using the sort command as a filter in the Sequential File stage, why not just add the "unique" (-u) option to that command?
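
For anyone unsure what -u does: the sort command then outputs only one of each set of lines that compare equal (whole lines, unless sort keys are given with -k). A rough Python equivalent for whole-line comparison, offered only as an illustration (actual sort ordering also depends on the locale, which this sketch ignores):

```python
def sort_unique(lines):
    """Return the distinct lines in sorted order, roughly like `sort -u`."""
    return sorted(set(lines))


# Hypothetical file content: one "column1,column2" record per line.
lines = [
    "B,05022009",
    "A,05022009",
    "B,05022009",
]

print(sort_unique(lines))
```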