I have an input dataset that consists of order history keyed on CustID and Order Date. Using Remove Duplicates, we want to select the most recent order for each customer.
We did the following in the aggregator stage but are getting all input records as output. No duplicate removal has been performed.
Partition on CustID to group related records
Sort on OrderDate in Descending order
Remove Duplicates on CustID, with Duplicate To Retain=First
Please help.
Thanks
Abhik
Aggregator stage problem
Using a Sort stage and a Transformer, you can do it as below:
1) CustID: sort and partition [hash]
2) Order Date: sort only, descending
In the Sort stage, use CustID as the key column and set keychangecolumn=True; in the Transformer, use the constraint keychangecolumn=1.
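To make the logic of this approach concrete, here is a minimal Python sketch of what the sort plus key-change filter does. It is not DataStage code; the function name, the sample rows, and the in-memory list are assumptions made only for illustration.

```python
from operator import itemgetter

def latest_by_key_change(rows):
    """Keep the first row of each CustID group after sorting (hypothetical helper)."""
    # Sort by OrderDate descending first, then by CustID; Python's sort is
    # stable, so rows within one CustID stay in descending date order.
    ordered = sorted(rows, key=itemgetter("OrderDate"), reverse=True)
    ordered.sort(key=itemgetter("CustID"))

    result = []
    previous = object()  # sentinel that never equals a real CustID
    for row in ordered:
        # Mimics the key change flag: 1 on the first row of a group, else 0.
        key_change = 1 if row["CustID"] != previous else 0
        if key_change == 1:  # same role as the Transformer constraint
            result.append(row)
        previous = row["CustID"]
    return result

# Made-up sample data; ISO date strings compare correctly as text.
orders = [
    {"CustID": 1, "OrderDate": "2010-01-05"},
    {"CustID": 2, "OrderDate": "2010-02-11"},
    {"CustID": 1, "OrderDate": "2010-03-01"},
    {"CustID": 2, "OrderDate": "2009-12-30"},
]
latest = latest_by_key_change(orders)
```

Because the rows are already ordered newest-first within each customer, the row flagged as the start of a group is that customer's most recent order.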
Or, using the Aggregator stage alone:
1) CustID: sort and partition [hash]
2) Grouping key: CustID
Aggregation type: calculation
Column for calculation: Order Date
Maximum value output column: result
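The aggregator settings above amount to a group-by with a maximum. A rough Python sketch of that calculation follows; again this is only an illustration, and the function name and sample rows are hypothetical.

```python
def latest_order_date(rows):
    """Return the maximum OrderDate per CustID (hypothetical helper)."""
    result = {}
    for row in rows:
        cust = row["CustID"]
        # Keep the larger date seen so far for this grouping key.
        if cust not in result or row["OrderDate"] > result[cust]:
            result[cust] = row["OrderDate"]
    return result

# Made-up sample data; ISO date strings compare correctly as text.
sample = [
    {"CustID": 1, "OrderDate": "2010-01-05"},
    {"CustID": 2, "OrderDate": "2010-02-11"},
    {"CustID": 1, "OrderDate": "2010-03-01"},
]
max_dates = latest_order_date(sample)
```

Note that a group-by maximum of this kind yields only the grouping key and the computed date, not the other columns of the order record.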
May I know which of these two methods you tried?
Method 1:
Using a Sort stage and a Transformer:
1) CustID: sort and partition [hash]
2) Order Date: sort only, descending
In the Sort stage, use CustID as the key column and set keychangecolumn=True; in the Transformer, use the constraint keychangecolumn=1.
Method 2:
Using the Aggregator stage alone:
1) CustID: sort and partition [hash]
2) Grouping key: CustID
Aggregation type: calculation
Column for calculation: Order Date
Maximum value output column: result