
Compare Duplicate Records in a File

Posted: Fri Apr 08, 2016 2:36 am
by sharmaisha0902
I want to compare only the duplicate records in a file:

For example:

Key  Start Date  End Date
1    1/1/2010    1/1/2012
1    1/1/2011    1/1/2012

The data is sorted on Key and Start Date in ascending order.

If the End Date of the first record is greater than the Start Date of the second record,
then reject both records and capture them.

Please tell me how to achieve this.

Posted: Fri Apr 08, 2016 7:28 am
by ShaneMuir
What have you tried so far?

HINT: Set stage variables and compare

Posted: Sat Apr 09, 2016 2:07 am
by sharmaisha0902
Hi,

I have tried this:

Sort the data on Key and Start Date on the input to the Transformer, then define these stage variables (derivation | stage variable):

if StgVar2 = 1 and in.StartDate < StgVar3 then FlgReject else FlgValid | StgVar4
in.EndDate | StgVar3
if in.Key = StgVar1 then 1 else 0 | StgVar2
in.Key | StgVar1

Is this approach correct?
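
One rough way to check this kind of logic is to simulate it outside DataStage. The Python sketch below is only an illustration, not Transformer code: it assumes the stage variables keep their values from row to row, that the key comparison is evaluated before the flag that depends on it, and that dates are in m/d/yyyy format. The names `flag_rows`, `prev_key` and `prev_end` are hypothetical.

```python
from datetime import datetime


def parse(d):
    # Assumes m/d/yyyy dates, as in the sample data.
    return datetime.strptime(d, "%m/%d/%Y").date()


def flag_rows(rows):
    """rows: (key, start, end) tuples sorted by key, then start date."""
    prev_key = None   # plays the role of the "previous key" stage variable
    prev_end = None   # plays the role of the "previous end date" stage variable
    flags = []
    for key, start, end in rows:
        # Compare against the values saved from the previous row.
        same_key = prev_key is not None and key == prev_key
        overlap = same_key and prev_end > parse(start)
        flags.append("FlgReject" if overlap else "FlgValid")
        # Only now save the current row's values for the next row.
        prev_key = key
        prev_end = parse(end)
    return flags


rows = [("1", "1/1/2010", "1/1/2012"), ("1", "1/1/2011", "1/1/2012")]
print(flag_rows(rows))  # ['FlgValid', 'FlgReject']
```

In this simulation only the second record of the pair is flagged, because the first record has already been passed through by the time the overlap is detected.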

Posted: Sat Apr 09, 2016 7:02 am
by chulett
Seems to me the first question back to you is - did it work? I'm guessing the answer is no, especially with this little wrinkle in your requirements:

If EndDate of the First Record is greater than the StartDate of Second Record, then reject both the records and capture it.

The rub here is the need to reject the first record only after it has been processed and you are looking at the second record. So it seems to me you can either learn about the Save/Fetch Input Records functions (if they are in your version and we are only talking about pairs of records) or go with the traditional fork-join design. Meaning, one branch does the whole stage variable compare thing and outputs the key and something in the way of a reject indicator based on what it finds with regard to the date ranges. Then the main input streams through, joins to that by 'key' and decides what to do based on said reject indicator.
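
The fork-join idea can be sketched in Python to show the data flow. This is an illustration under assumptions, not a DataStage job: the function names `overlapping_keys` and `fork_join` are hypothetical, dates are assumed to be m/d/yyyy, and because the join is by key alone, every record for a flagged key is rejected, which matters if a key can have more than two records.

```python
from datetime import datetime


def parse(d):
    # Assumes m/d/yyyy dates, as in the sample data.
    return datetime.strptime(d, "%m/%d/%Y").date()


def overlapping_keys(rows):
    """Branch 1: compare each row with the previous one, emit flagged keys."""
    bad = set()
    prev_key, prev_end = None, None
    for key, start, end in rows:
        if key == prev_key and prev_end > parse(start):
            bad.add(key)
        prev_key, prev_end = key, parse(end)
    return bad


def fork_join(rows):
    """Main stream: join back to the flagged keys and split the output."""
    bad = overlapping_keys(rows)
    valid = [r for r in rows if r[0] not in bad]
    rejected = [r for r in rows if r[0] in bad]
    return valid, rejected


rows = [
    ("1", "1/1/2010", "1/1/2012"),
    ("1", "1/1/2011", "1/1/2012"),
    ("2", "1/1/2010", "1/1/2011"),
]
valid, rejected = fork_join(rows)
print(rejected)  # both key-1 records
print(valid)     # the single key-2 record
```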

Posted: Mon Apr 11, 2016 7:17 am
by sharmaisha0902
Hi,

I used four stage variables: KeyCheck, PrevKey, DtCheck and Enddt.
PrevKey=KeyCheck
in.Key=PrevKey
StgEndDt=DtCheck
in.EndDt=StgEndDt

By applying a constraint in the Transformer, I got the current record, which is rejected. I then left joined all the valid records with this reject record and got both records in the reject file.

Posted: Mon Apr 11, 2016 7:20 am
by chulett
So... this question was posted again on another forum with a slight tweak to the requirements:

If EndDate of the First Record is greater than the StartDate of Second Record, then reject the first record and capture it.

I don't have any time left this morning but putting it out there in case someone wants to take a stab at it.
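
For the tweaked requirement, the logic amounts to looking one row ahead instead of one row back. The Python sketch below only illustrates that rule and is not a DataStage design: the function name `split_first_of_overlap` is hypothetical, dates are assumed to be m/d/yyyy, and the input is assumed to be sorted by key and start date.

```python
from datetime import datetime


def parse(d):
    # Assumes m/d/yyyy dates, as in the sample data.
    return datetime.strptime(d, "%m/%d/%Y").date()


def split_first_of_overlap(rows):
    """Reject a record when its end date is later than the start date
    of the next record with the same key; keep everything else."""
    valid, rejected = [], []
    for i, (key, start, end) in enumerate(rows):
        nxt = rows[i + 1] if i + 1 < len(rows) else None
        overlaps_next = (
            nxt is not None
            and nxt[0] == key
            and parse(end) > parse(nxt[1])
        )
        (rejected if overlaps_next else valid).append((key, start, end))
    return valid, rejected


rows = [("1", "1/1/2010", "1/1/2012"), ("1", "1/1/2011", "1/1/2012")]
valid, rejected = split_first_of_overlap(rows)
print(rejected)  # [('1', '1/1/2010', '1/1/2012')]
print(valid)     # [('1', '1/1/2011', '1/1/2012')]
```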