Eliminate duplicates from file and capture in flatfile
Can anyone assist me in capturing all the duplicate records coming from a CSV file into a separate file, without using the Aggregator stage?
Re: Eliminate duplicates from file and capture in flatfile
You should be able to achieve this using stage variables in the Transformer.
DD
Success is right around the corner
Stage variables will work but you would also need to sort the data.
Use sort stage and filter
I think the best way to capture duplicates is by using "Sort Stage".
Use the columns on which you want to detect duplicates as your sort keys and enable "Create Key Change Column". This adds an extra column, "keyChange"; every row with a value of '0' (zero) in "keyChange" is a duplicate of the preceding row.
P.S.: Make sure you hash partition on key columns.
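The key-change idea above can be sketched in plain Python (this is an illustration of the technique, not DataStage code; the function and column names are made up for the example): on key-sorted input, the first row of each key group gets keyChange = 1 and every following row of the same key gets 0, so the rows flagged 0 are the duplicates.

```python
# Illustration (hypothetical helper, not DataStage code) of the Sort stage's
# "Create Key Change Column" behaviour: on key-sorted input, keyChange is 1
# for the first row of each key group and 0 for every subsequent duplicate.

def add_key_change(rows, key_cols):
    """Append a keyChange flag: 1 = first occurrence of the key, 0 = duplicate."""
    out = []
    prev_key = None
    for row in rows:
        key = tuple(row[c] for c in key_cols)
        out.append({**row, "keyChange": 1 if key != prev_key else 0})
        prev_key = key
    return out

# The input must already be sorted on the key columns.
rows = sorted(
    [{"id": 1, "name": "a"}, {"id": 1, "name": "b"}, {"id": 2, "name": "c"}],
    key=lambda r: r["id"],
)
flagged = add_key_change(rows, ["id"])
duplicates = [r for r in flagged if r["keyChange"] == 0]
```

A downstream filter (or Transformer constraint) on keyChange = 0 then sends exactly the duplicate rows to the separate flat file.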
Re: Use sort stage and filter
DataStage Server Edition is being used :D
ssbhas wrote: P.S.: Make sure you hash partition on key columns.
DD
Success is right around the corner
Eliminate duplicates from file and capture in flatfile
Duplicates based on which specific columns?
Eliminate duplicates from file and capture in flatfile
ArndW wrote: Stage variables will work but you would also need to sort the data. ...
Can you please explain the approach in the Transformer using stage variables once the data has been sorted?
You would need to answer Srini's question in order to get a good answer. Basically, stage variables are used to store values from the previous row and compare them to the current row. You would compare those columns you wish to detect duplicates on and, using constraints, skip rows with duplicates. Again, you would need to sort the data so that duplicates can be detected.
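As a rough analogue of that stage-variable technique (sketched in Python for clarity; the function name and single-column key are assumptions for the example, not DataStage syntax): a variable holds the previous row's key, each incoming row is compared against it, and constraints route the row to either the "first occurrence" link or the "duplicates" link before the variable is updated.

```python
# Rough Python analogue (not DataStage code) of the Transformer
# stage-variable approach: remember the previous row's key, compare it
# with the current row's key, and route duplicates to a separate output.

def split_duplicates(sorted_rows, key):
    uniques, dups = [], []
    prev = None                  # plays the role of the stage variable
    for row in sorted_rows:
        if row[key] == prev:
            dups.append(row)     # constraint: route to the duplicates link
        else:
            uniques.append(row)  # constraint: first occurrence of this key
        prev = row[key]          # update the "stage variable" for the next row
    return uniques, dups

# As in the thread, this only works if the data is sorted on the key first.
data = sorted([{"k": "x"}, {"k": "y"}, {"k": "x"}], key=lambda r: r["k"])
uniques, dups = split_duplicates(data, "k")
```

The essential point the posts above make carries over: the comparison only detects all duplicates if identical keys are adjacent, which is why the sort comes first.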