Remove duplicate in Flat file

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

babbu9
Premium Member
Posts: 75
Joined: Tue Jun 01, 2004 9:44 am

Remove duplicate in Flat file

Post by babbu9 »

Hi

I was wondering whether there is a way to remove duplicate records from a flat file. The logic should be to reject all the duplicate records based on a key and pass only the unique records to Oracle.

Logic

Flat file ------> identify duplicates ------> unique records to Oracle
                         |
                         |
                         V
          hash file with duplicate records


Please advise.

Thanks
ketfos
Participant
Posts: 562
Joined: Mon May 03, 2004 8:58 pm
Location: san francisco
Contact:

Post by ketfos »

Hi,

1. You can load the records from the flat file into a hash file.
In the hash file, define the key columns.
When the data is loaded, any row with a duplicate key overwrites the row already there, so only one row per key survives (the last occurrence, not the first).
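For example (illustrative data only), with the first column as the hash file key:

    input rows : (1, A) (2, B) (1, C)
    hash file  : (1, C) (2, B)

Row (1, C) silently replaces (1, A), so the duplicates are absorbed rather than captured.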

2. Or you may load from the flat file directly into an Oracle table in which the key columns have been defined.
The duplicate records will not be loaded, due to the integrity constraint on the target Oracle table.
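For example (table and column names are illustrative), a primary key on the target makes Oracle reject any repeated key with an ORA-00001 unique constraint violation:

    CREATE TABLE target_table (
        key_col   VARCHAR2(20) PRIMARY KEY,
        other_col VARCHAR2(50)
    );

Depending on how you load, it may be possible to route the rejected rows down a reject link instead of failing the job.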


Ketfos
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

You could pass the data through a QualityStage UNDUP operation.

A hashed file is a good way to remove duplicates based on key value, but is not appropriate otherwise.

If you are contemplating the need to remove duplicate rows, sort the input (use sort as a filter in the Sequential File stage) and use stage variables in a Transformer stage to detect whether the current row's key matches the previous row's.
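For example (link, column and variable names here are illustrative), with the input sorted on KeyCol, define two stage variables in this order, and initialize svPrevKey to a value that can never occur as a real key:

    svIsDup   : If InLink.KeyCol = svPrevKey Then @TRUE Else @FALSE
    svPrevKey : InLink.KeyCol

Stage variables are evaluated top to bottom, so svIsDup compares against the previous row's key before svPrevKey is updated to the current one. Put Not(svIsDup) as the constraint on the output link to Oracle.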
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Or use 'sort' as a filter and leverage the '-u' option to sort out only unique records based on your sort keys. :wink:
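For example (assuming a comma-delimited file keyed on the first column), the filter command in the Sequential File stage could be:

    sort -t ',' -k 1,1 -u

Bear in mind that -u silently drops the duplicates, so on its own it won't capture them to an error file.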
-craig

"You can never have too many knives" -- Logan Nine Fingers
babbu9
Premium Member
Posts: 75
Joined: Tue Jun 01, 2004 9:44 am

Thank you; how to capture duplicate records?

Post by babbu9 »

Ray
Thank you for the reply. I was wondering if there is a way to capture the duplicate records when both the source and the target are flat files.

Thanks
Bob
ray.wurlod wrote: You could pass the data through a QualityStage UNDUP operation.

A hashed file is a good way to remove duplicates based on key value, but is not appropriate otherwise.

If you are contemplating the need to remove duplicate rows, sort the input (use sort as a filter in the Sequential File stage) and use stage variables in a Transformer stage to detect whether the current row's key matches the previous row's.
PilotBaha
Premium Member
Posts: 202
Joined: Mon Jan 12, 2004 8:05 pm

Post by PilotBaha »

Babbu,
QualityStage will work well for your requirement, as Ray mentioned. You could even make this a QualityStage-only job, unless you need to stream the data.
babbu9
Premium Member
Posts: 75
Joined: Tue Jun 01, 2004 9:44 am

no QS

Post by babbu9 »

Hi
Thanks for the reply. The problem is we do not have QualityStage; I have to do this with just DataStage 6. Any suggestions in this regard are welcome.

Babu

PilotBaha wrote: Babbu,
QualityStage will work well for your requirement, as Ray mentioned. You could even make this a QualityStage-only job, unless you need to stream the data.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

The Transformer stage with stage variables approach will work irrespective of source and target. The only requirement is that the source is sorted.
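To capture the duplicates when both source and target are flat files (continuing the illustrative names from my earlier sketch), give the Transformer two output links, each writing to a Sequential File stage, with these constraints:

    unique link     : Not(svIsDup)
    duplicates link : svIsDup

Rows whose key matches the previous row's land in the duplicates file; everything else goes to the main output file.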
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
babbu9
Premium Member
Posts: 75
Joined: Tue Jun 01, 2004 9:44 am

thanks

Post by babbu9 »

Gentlemen,
Thank you for the advice. I appreciate it.

Babu