I have a question regarding the use of sort functionality, whether in a Sort stage or on the input link of another stage, and the resulting error:
User inserted sort "rmv_duplicate_data.lnk_rmv_duplicate_data_Sort" does not fulfill the sort requirements of the downstream operator "rmv_duplicate_data"
To me this requirement seems flawed. I should be able to sort my data on the key fields AND on any other fields, so that when I retain the first or last record I get the correct one in cases where a secondary key determines which record should be kept.
For example, if my key were an account number followed by a sequence number, and I wished to keep only the record with the highest sequence number, then I would have to sort by both account number and sequence but deduplicate only on the account number.
Input:
Account Seq
000001 01
000002 01
000003 01
000001 02
000002 02
Output:
000001 02
000002 02
000003 01
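To make the intent concrete, here is a minimal sketch of the logic I am after, written in Python rather than DataStage. It sorts on both account and sequence, then deduplicates on account only, keeping the last record of each group. The function name `keep_last_per_account` and the tuple layout are just assumptions for illustration; this is not how the Remove Duplicates stage is implemented.

```python
from itertools import groupby
from operator import itemgetter


def keep_last_per_account(rows):
    """Sort on (account, seq), then keep the last row of each account group.

    Hypothetical helper for illustration only. Values are compared as
    strings, which works here because they are zero-padded to a fixed width.
    """
    # Sort on the dedup key (account) plus the secondary key (seq).
    ordered = sorted(rows, key=itemgetter(0, 1))
    # Group on the account only and retain the last record in each group.
    return [list(group)[-1] for _, group in groupby(ordered, key=itemgetter(0))]


rows = [
    ("000001", "01"),
    ("000002", "01"),
    ("000003", "01"),
    ("000001", "02"),
    ("000002", "02"),
]

result = keep_last_per_account(rows)
for account, seq in result:
    print(account, seq)
```

The point is that the sort key (account, seq) is a superset of the dedup key (account), which is exactly the combination I expected the stage to accept.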
Any suggestions?