Remove Duplicates

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

arshi
Participant
Posts: 50
Joined: Wed Apr 18, 2007 5:12 am

Post by arshi »

Hi,

I want to remove duplicate rows based on the DATE column.
The date format is DDMMYYYY.

I am using DataStage Server Edition (7.5), and the source is a flat file.

Does anyone have a solution for this?

Regards,
Arshi
Sainath.Srinivasan
Participant
Posts: 3337
Joined: Mon Jan 17, 2005 4:49 am
Location: United Kingdom

Post by Sainath.Srinivasan »

What have you tried?
arshi
Participant
Posts: 50
Joined: Wed Apr 18, 2007 5:12 am

Post by arshi »

Hi,

First I sort the data using the Sort stage, then I use stage variables to remove the duplicates.

With the Sort stage this works fine for a single month's data, but not for the entire data set.

Does anyone have a solution for this?

Regards,
Arshi.
Sainath.Srinivasan
Participant
Posts: 3337
Joined: Mon Jan 17, 2005 4:49 am
Location: United Kingdom

Post by Sainath.Srinivasan »

What about any other columns in the data?

How do you define a duplicate in your data?
dr.murthy
Participant
Posts: 224
Joined: Sun Dec 07, 2008 8:47 am
Location: delhi

Post by dr.murthy »

Hi,

In the Sort stage's Options tab, set Allow Duplicates to False; there is no need for an additional Remove Duplicates stage. Make sure the execution mode is Sequential. It should work fine.
D.N .MURTHY
arshi
Participant
Posts: 50
Joined: Wed Apr 18, 2007 5:12 am

Post by arshi »

Hi Murthy,

I did not find any Options tab in the Sort stage. I am using Server Edition (7.5). Can you explain exactly where it is?

Hi Sainath,

As per my requirement, I have to sort on column1 and column2, where column2 holds the date data (DDMMYYYY).

If I use the Sort stage, it does not give the correct result. I think you understand my requirement.

Regards,
Arshi
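
A likely reason for the result described above, assuming the Sort stage compares the DDMMYYYY column as plain text: the comparison starts with the day digits, so rows come out ordered by day of month rather than by date. The Python sketch below uses made-up sample dates and is only an illustration, not DataStage code:

```python
# Hypothetical sample dates in DDMMYYYY format.
dates = ["15022009", "01032009", "20012009"]

# Plain text sort: compares the day digits first.
text_order = sorted(dates)

# Chronological sort: compare year, then month, then day.
date_order = sorted(dates, key=lambda d: (d[4:8], d[2:4], d[0:2]))

print(text_order)  # ['01032009', '15022009', '20012009']
print(date_order)  # ['20012009', '15022009', '01032009']
```

Within a single month the two orders agree, which would match the observation that the job works for one month's data but not for the whole file.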
Sainath.Srinivasan
Participant
Posts: 3337
Joined: Mon Jan 17, 2005 4:49 am
Location: United Kingdom

Post by Sainath.Srinivasan »

Why don't you write into a hashed file and read from it?
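
A hashed file keeps one record per key, so writing every row with the duplicate-defining columns as the key leaves a single row per key when the file is read back. A rough Python analogy follows, with a dict standing in for the hashed file; the column layout and sample rows are assumptions made for illustration:

```python
# Hypothetical rows: (column1, date as DDMMYYYY, other data).
rows = [
    ("A", "15022009", "first"),
    ("A", "15022009", "second"),
    ("B", "01032009", "only"),
]

# The dict stands in for a hashed file keyed on (column1, column2).
hashed_file = {}
for col1, col2, payload in rows:
    # A later write with the same key overwrites the earlier record.
    hashed_file[(col1, col2)] = (col1, col2, payload)

# Reading the "file" back yields one row per key.
unique_rows = list(hashed_file.values())
print(unique_rows)  # [('A', '15022009', 'second'), ('B', '01032009', 'only')]
```

Note that the last row written for a key is the one that survives, and that a real hashed file does not promise any particular read order.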
dr.murthy
Participant
Posts: 224
Joined: Sun Dec 07, 2008 8:47 am
Location: delhi

Post by dr.murthy »

Which version of DS are you using? Is it PX or Server?
D.N .MURTHY
arshi
Participant
Posts: 50
Joined: Wed Apr 18, 2007 5:12 am

Post by arshi »

Murthy,

Using Server Edition (7.5.1)
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

There is no such option in the Server Sort stage; checking for and removing duplicates would need to happen downstream from the Sort.
-craig

"You can never have too many knives" -- Logan Nine Fingers
JRodriguez
Premium Member
Posts: 425
Joined: Sat Nov 19, 2005 9:26 am
Location: New York City

Post by JRodriguez »

Hi Arshi,

Before sorting the data, add an extra column that holds the date formatted as YYYYMMDD and use it as your sort key, then use the stage variables to remove the duplicates.

That will work for the entire data set.
Julio Rodriguez
ETL Developer by choice

"Sure we have lots of reasons for being rude - But no excuses
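
The suggestion above can be sketched in Python as follows. This is not code from the thread: the sample rows, the function names and the two-column layout are assumptions, and the `previous` variable merely plays the role a stage variable would play in a Transformer.

```python
# Hypothetical sample rows: (column1, date as DDMMYYYY).
rows = [
    ("A", "15022009"),
    ("A", "01032009"),
    ("A", "15022009"),
    ("B", "01032009"),
    ("A", "01032009"),
]

def to_sort_key(ddmmyyyy):
    """Rearrange DDMMYYYY into YYYYMMDD so text order matches date order."""
    return ddmmyyyy[4:8] + ddmmyyyy[2:4] + ddmmyyyy[0:2]

def remove_duplicates(rows):
    # Sort on column1, then on the derived YYYYMMDD key.
    ordered = sorted(rows, key=lambda r: (r[0], to_sort_key(r[1])))
    result = []
    previous = None  # remembers the row seen just before the current one
    for row in ordered:
        if row != previous:  # keep a row only when it differs from the previous row
            result.append(row)
        previous = row
    return result

deduped = remove_duplicates(rows)
print(deduped)  # [('A', '15022009'), ('A', '01032009'), ('B', '01032009')]
```

In a Server job the same key could probably be derived with BASIC substring and concatenation operators, for example `InLink.DateCol[5,4] : InLink.DateCol[3,2] : InLink.DateCol[1,2]`, where the link and column names are hypothetical.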
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

If you're sorting using the sort command as a filter in the Sequential File stage, why not just add the "unique" (-u) option to that command?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
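
For reference, `sort -u` sorts its input and outputs only the first of each run of lines that compare equal. With no key options it compares whole lines, which the Python sketch below imitates. The sample lines are made up, and locale-specific collation is ignored:

```python
# Hypothetical flat-file lines: column1,date(DDMMYYYY).
lines = [
    "A,15022009",
    "B,01032009",
    "A,15022009",
    "A,01032009",
]

# set() drops exact duplicate lines; sorted() orders what remains.
unique_sorted = sorted(set(lines))
print(unique_sorted)  # ['A,01032009', 'A,15022009', 'B,01032009']
```

The duplicates are removed, but the dates are still compared as text, so the output is not in chronological order unless the date is first rearranged to YYYYMMDD.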