Hi
I was wondering whether there is a way to remove duplicate records from a flat file. The logic should be to reject (error out) all duplicate records, based on a key, and pass only the unique records to Oracle.

Logic:

Flat file ---> identify duplicates ---> unique records to Oracle
                      |
                      V
       hash file with duplicate records

Please advise.
Thanks
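Outside DataStage, the split described above can be sketched in plain Python. This is only an illustration, not the job design itself; the file paths, the comma delimiter, and the key in the first column are assumptions:

```python
import csv

def split_duplicates(in_path, unique_path, dup_path, key_cols=(0,)):
    """Route the first row seen for each key to unique_path
    and every repeat of that key to dup_path."""
    seen = set()
    with open(in_path, newline="") as src, \
         open(unique_path, "w", newline="") as uniq, \
         open(dup_path, "w", newline="") as dup:
        uniq_w, dup_w = csv.writer(uniq), csv.writer(dup)
        for row in csv.reader(src):
            key = tuple(row[i] for i in key_cols)
            if key in seen:
                dup_w.writerow(row)   # duplicate: goes to the reject file
            else:
                seen.add(key)
                uniq_w.writerow(row)  # unique: would be loaded to Oracle
```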
Remove duplicates in a flat file
Hi,
1. You can load the records from the flat file into a hash file, defining the key columns in the hash file. When the data is loaded into the hash file, duplicate rows are avoided.
2. Alternatively, you can load from the flat file directly into an Oracle table whose key columns have been defined. The duplicate records will not be loaded, due to the integrity constraint on the target Oracle table.
Ketfos
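For illustration only, option 1 can be mimicked with an ordinary dictionary: as with a write to a hashed file, a later row with the same key overwrites the earlier one, so exactly one row per key survives. A rough Python sketch, with the key column position assumed:

```python
def dedupe_like_hashed_file(rows, key_cols=(0,)):
    """Keep one row per key, with later rows overwriting earlier ones,
    mimicking destructive-overwrite writes keyed on key_cols."""
    table = {}
    for row in rows:
        # same key -> the previous entry is replaced
        table[tuple(row[i] for i in key_cols)] = row
    return list(table.values())
```

Note that this keeps the *last* row per key; the Oracle-constraint route in option 2 would instead reject the later rows and keep the first.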
You could pass the data through a QualityStage UNDUP operation.
A hashed file is a good way to remove duplicates based on key value, but is not appropriate otherwise.
If you are contemplating the need to remove duplicate rows, sort the input (use sort as a filter in the Sequential File stage) and use stage variables in a Transformer stage to detect whether the current row has the same key as the previous row.
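As a rough illustration of that sort-then-compare pattern, here is a plain-Python equivalent; prev_key plays the role of the stage variable, and the key column position is an assumption:

```python
def split_sorted(rows, key_cols=(0,)):
    """rows must already be sorted by key (the sort filter in the
    Sequential File stage would do this). The first row of each key
    group is unique; later rows with the same key are duplicates."""
    uniques, duplicates = [], []
    prev_key = None                # plays the role of the stage variable
    for row in rows:
        key = tuple(row[i] for i in key_cols)
        if key == prev_key:
            duplicates.append(row)  # same key as previous row: reject
        else:
            uniques.append(row)     # first time this key is seen
        prev_key = key
    return uniques, duplicates
```

Because the duplicates come out on their own link, this same pattern also lets you land them in a separate reject file rather than just discarding them.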
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Thank you; how to capture duplicate records?
Ray
Thank you for the reply. Is there a way to capture the duplicate records when both the source and the target are flat files?
Thanks
Bob
Hi

ray.wurlod wrote: You could pass the data through a QualityStage UNDUP operation.
A hashed file is a good way to remove duplicates based on key value, but is not appropriate otherwise.
If you are contemplating the need to remove duplicate rows, sort the input (use sort as a filter in the Sequential File stage) and use stage variables in a Transformer stage to detect whether the current row has the same key as the previous row.
no QS
Hi
Thanks for the reply. The problem is we do not have QualityStage; I have to do this with just DataStage 6. Any suggestions in this regard are welcome.
Babu
PilotBaha wrote: Babu,
QualityStage will work pretty well for your requirement, as Ray mentioned. You could even make this a QS-only job, unless you want to stream the data.