
Remove duplicate in Flat file

Posted: Mon Jan 31, 2005 2:59 pm
by babbu9
Hi

I was wondering whether there is a way to take out duplicate records in a flat file. The logic should basically reject all duplicate records based on a key and pass only the unique records to Oracle.

Logic

Flat file ------> identify duplicates -----> unique records to Oracle
                         |
                         |
                         V
          hash file with duplicate records


Please inform.

Thanks

Posted: Mon Jan 31, 2005 3:20 pm
by ketfos
Hi,

1. You can load the records from the flat file into a hash file, defining the key columns in the hash file. When the data is loaded, a row with a duplicate key overwrites the earlier row, so only one row per key survives.

2. Or you can load from the flat file directly into an Oracle table in which the key columns have been defined. The duplicate records will not be loaded, due to the integrity constraint on the target Oracle table.


Ketfos
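Outside DataStage, approach 1 can be illustrated with a short Python sketch in which a dict plays the role of the hashed file (the column layout and sample values are illustrative assumptions, not anything from the thread):

```python
# Sketch of approach 1: a dict stands in for the hashed file.
# Writing a row whose key already exists overwrites the earlier row,
# which mirrors how loading into a keyed hashed file de-duplicates
# (the last row for each key wins).

def dedupe_like_hash_file(rows, key_index=0):
    """Keep one row per key value; the last occurrence wins."""
    hashed = {}
    for row in rows:
        hashed[row[key_index]] = row  # duplicate key -> overwrite
    return list(hashed.values())

rows = [
    ("100", "Smith"),
    ("101", "Jones"),
    ("100", "Smyth"),  # duplicate key "100"
]
print(dedupe_like_hash_file(rows))
```

Note that, as with a hashed file, this silently discards the earlier duplicates rather than capturing them, so it answers the "unique records to Oracle" half of the requirement only.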

Posted: Mon Jan 31, 2005 4:14 pm
by ray.wurlod
You could pass the data through a QualityStage UNDUP operation.

A hashed file is a good way to remove duplicates based on key value, but is not appropriate otherwise.

If you are contemplating the need to remove duplicate rows, sort the input (use sort as a filter in the Sequential File stage) and use stage variables in a Transformer stage to detect whether the current row's key matches the previous row's.
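The sorted-input, previous-key technique can be sketched in Python as follows (the single-column key and tuple layout are assumptions for illustration): one "previous key" variable, like a stage variable, is enough to split a sorted stream into unique rows and duplicates.

```python
# Sketch of the sorted-input / stage-variable technique: with the rows
# sorted on the key, comparing each row's key to the previous row's key
# routes first occurrences one way and repeats the other way.

def split_sorted(rows, key_index=0):
    unique, duplicates = [], []
    prev_key = object()  # sentinel that matches no real key
    for row in sorted(rows, key=lambda r: r[key_index]):
        if row[key_index] == prev_key:
            duplicates.append(row)  # same key as previous row: reject
        else:
            unique.append(row)      # first row for this key: pass on
        prev_key = row[key_index]
    return unique, duplicates
```

In DataStage terms, `unique` would feed the output link to Oracle and `duplicates` the reject link.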

Posted: Mon Jan 31, 2005 5:29 pm
by chulett
Or use 'sort' as a filter and leverage the '-u' option to sort out only unique records based on your sort keys. :wink:

thank you, how to capture duplicate records

Posted: Mon Jan 31, 2005 6:46 pm
by babbu9
Ray
Thank you for the reply. I was wondering if there is a way to capture the duplicate records when both the source and the target are flat files.

Thanks
Bob
ray.wurlod wrote:You could pass the data through a QualityStage UNDUP operation. [...]

Posted: Mon Jan 31, 2005 6:49 pm
by PilotBaha
Babbu,
QualityStage will work pretty well for your requirement, as Ray mentioned. You could even make this a QS-only job unless you want to stream the data.

no QS

Posted: Mon Jan 31, 2005 7:29 pm
by babbu9
Hi
Thanks for the reply. The problem is we do not have QualityStage; I have to do this with just DataStage 6. Any suggestions in this regard are welcome.

Babu

PilotBaha wrote:the QS will work pretty good for your requirement as Ray mentioned. [...]

Posted: Mon Jan 31, 2005 7:47 pm
by ray.wurlod
The Transformer stage with stage variables approach will work irrespective of source and target. The only requirement is that the source is sorted.
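Expressed outside DataStage, a flat-file-to-flat-file version of that idea might look like the Python sketch below (the file paths, comma delimiter, and key column are assumptions for illustration, not anything from the thread): sort on the key, write the first row per key to one file, and capture every repeat in a duplicates file.

```python
# Flat file in, two flat files out: unique rows and captured duplicates.
# Assumes a delimited text file with the key in one column.

def split_file(src, unique_path, dup_path, key_col=0, delim=","):
    with open(src) as f:
        rows = [line.rstrip("\n").split(delim) for line in f if line.strip()]
    rows.sort(key=lambda r: r[key_col])  # sorted input, as Ray requires
    prev = object()  # sentinel that matches no real key
    with open(unique_path, "w") as uniq, open(dup_path, "w") as dups:
        for row in rows:
            # Same key as the previous row -> duplicate; else first occurrence.
            target = dups if row[key_col] == prev else uniq
            target.write(delim.join(row) + "\n")
            prev = row[key_col]
```

For example, `split_file("input.txt", "unique.txt", "dups.txt")` would leave the first record for each key in `unique.txt` and every repeated-key record in `dups.txt`.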

thanks

Posted: Mon Jan 31, 2005 8:54 pm
by babbu9
Gentlemen
Thank you for the advice. Appreciate it.

Babu