Compare Duplicate Records in a File

sharmaisha0902
Participant
Posts: 4
Joined: Fri Sep 02, 2011 9:56 am

Post by sharmaisha0902 »

I want to compare only the duplicate records in a file.

For example:

Key  Start Date  End Date
1    1/1/2010    1/1/2012
1    1/1/2011    1/1/2012

The data is sorted on Key and Start Date in ascending order.

If the End Date of the first record is greater than the Start Date of the second record,
then reject both records and capture them.

Please tell me how to achieve this.
Thanks,
Isha Sharma
ShaneMuir
Premium Member
Posts: 508
Joined: Tue Jun 15, 2004 5:00 am
Location: London

Post by ShaneMuir »

What have you tried so far?

HINT: Set stage variables and compare
sharmaisha0902
Participant
Posts: 4
Joined: Fri Sep 02, 2011 9:56 am

Post by sharmaisha0902 »

Hi,


I have tried this:

Sort the data on Key and Start Date on the input to the Transformer, then derive these stage variables:

Derivation | Stage variable
if StgVar2=1 and in.StartDate<StgVar3 then FlgReject else FlgValid | StgVar4
in.EndDate | StgVar3
if in.Key=StgVar1 then 1 else 0 | StgVar2
in.Key | StgVar1
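
Read as a previous-row comparison, the derivations above can be sketched in Python as follows. This is only an illustration, not DataStage code: `flag_rows` and `parse` are made-up names, dates are assumed to be in M/D/YYYY form, and the sketch assumes the key check is evaluated before the flag that depends on it.

```python
from datetime import datetime

def parse(d):
    # Assumption: dates arrive as M/D/YYYY strings, as in the sample data.
    return datetime.strptime(d, "%m/%d/%Y")

def flag_rows(rows):
    """rows: (key, start, end) tuples already sorted on key, start.
    Returns one flag per row, mimicking the stage variable logic."""
    prev_key = None   # plays the role of StgVar1
    prev_end = None   # plays the role of StgVar3
    flags = []
    for key, start, end in rows:
        same_key = key == prev_key                      # StgVar2
        overlap = same_key and parse(start) < prev_end  # StgVar4
        flags.append("reject" if overlap else "valid")
        # Remember the current row for the next iteration.
        prev_end = parse(end)
        prev_key = key
    return flags

rows = [(1, "1/1/2010", "1/1/2012"), (1, "1/1/2011", "1/1/2012")]
print(flag_rows(rows))  # ['valid', 'reject']
```

With the sample data only the second record is flagged; the first has already been passed as valid by the time the overlap is seen.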

Is this approach correct?
Thanks,
Isha Sharma
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Seems to me the first question back to you is - did it work? I'm guessing the answer is no, especially with this little wrinkle in your requirements:

If EndDate of the First Record is greater than the StartDate of Second Record, then reject both the records and capture it.

The rub here is the need to reject the first record only after it has been processed and you are looking at the second record. So it seems to me you can either learn about the Save/Fetch Input Records functions (if they are in your version and we are only talking about pairs of records) or go with the traditional fork-join design. That is, one branch does the whole stage variable compare and outputs the key plus a reject indicator based on what it finds with regard to the date ranges. Then the main input streams through, joins to that by key and decides what to do based on said reject indicator.
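
To make the fork-join idea concrete, here is a rough Python sketch rather than an actual job design. One pass stands in for the compare branch and collects the keys that show an overlap; a second pass stands in for the join and routes every record by that indicator. The names `overlapping_keys` and `split_by_reject_flag` are invented for the illustration, the key 2 rows are made-up test data, and dates are assumed to be M/D/YYYY.

```python
from datetime import datetime

def parse(d):
    # Assumption: dates arrive as M/D/YYYY strings.
    return datetime.strptime(d, "%m/%d/%Y")

def overlapping_keys(rows):
    """Compare branch: return the set of keys where a record starts
    before the previous record for the same key has ended."""
    bad = set()
    prev_key = prev_end = None
    for key, start, end in rows:
        if key == prev_key and parse(start) < prev_end:
            bad.add(key)
        prev_key, prev_end = key, parse(end)
    return bad

def split_by_reject_flag(rows):
    """Join branch: route every record by whether its key was flagged."""
    bad = overlapping_keys(rows)
    valid = [r for r in rows if r[0] not in bad]
    rejected = [r for r in rows if r[0] in bad]
    return valid, rejected

rows = [
    (1, "1/1/2010", "1/1/2012"),
    (1, "1/1/2011", "1/1/2012"),
    (2, "1/1/2010", "1/1/2011"),  # hypothetical non-overlapping pair
    (2, "1/1/2011", "1/1/2012"),
]
valid, rejected = split_by_reject_flag(rows)
print(rejected)  # both key 1 records
```

Note that joining on key alone rejects every record for a flagged key, which matches the requirement only when duplicates come in pairs.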
-craig

"You can never have too many knives" -- Logan Nine Fingers
sharmaisha0902
Participant
Posts: 4
Joined: Fri Sep 02, 2011 9:56 am

Post by sharmaisha0902 »

Hi,

I used four stage variables: KeyCheck, PrevKey, DtCheck and Enddt.
PrevKey=KeyCheck
in.Key=PrevKey
StgEndDt=DtCheck
in.EndDt=StgEndDt

By applying a constraint in the Transformer, I got the current record, which is rejected. I then left-joined all the valid records with this rejected record and got both records in the reject file.
Thanks,
Isha Sharma
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

So... this question was posted again on another forum with a slight tweak to the requirements:

If EndDate of the First Record is greater than the StartDate of Second Record, then reject the first record and capture it.

I don't have any time left this morning but putting it out there in case someone wants to take a stab at it.
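
For anyone picking this up, the tweaked rule amounts to a look-ahead: a record is rejected when the next record for the same key starts before it ends. The Python sketch below only illustrates that logic and says nothing about how best to build it in a job; `reject_first` and `parse` are invented names and dates are assumed to be M/D/YYYY.

```python
from datetime import datetime

def parse(d):
    # Assumption: dates arrive as M/D/YYYY strings.
    return datetime.strptime(d, "%m/%d/%Y")

def reject_first(rows):
    """rows: (key, start, end) tuples sorted on key, start.
    Reject a record when the next record with the same key
    starts before this record's end date."""
    valid, rejected = [], []
    for i, row in enumerate(rows):
        key, _start, end = row
        nxt = rows[i + 1] if i + 1 < len(rows) else None
        overlaps_next = (
            nxt is not None
            and nxt[0] == key
            and parse(end) > parse(nxt[1])
        )
        (rejected if overlaps_next else valid).append(row)
    return valid, rejected

rows = [(1, "1/1/2010", "1/1/2012"), (1, "1/1/2011", "1/1/2012")]
valid, rejected = reject_first(rows)
print(rejected)  # [(1, '1/1/2010', '1/1/2012')]
```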
-craig

"You can never have too many knives" -- Logan Nine Fingers