I am preparing a job for a new requirement in our project.
I have to find out the Top 20 Customers for each Branch.
My source is a sequential file (~150 million rows) that contains 4 Amount columns, Customer No and Branch No.
I have to add these Amount columns together and then find the customers with the highest Total Amount.
So I have to use two Transformers. :(
In the 2nd Transformer, I hash partition on Branch No and then sort on Branch No ascending, Total Amount descending.
I then use a stage variable to generate a counter for each Branch No, and only rows where the counter is <= 20 go to the output.
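To make the logic concrete, here is a rough Python sketch of what the job does. It is not DataStage code, and it runs in a single process, so the hash partitioning is not modelled. The function name, the tuple layout and the `top_n` parameter are placeholders I made up for illustration, and it assumes the file has one row per customer per branch.

```python
def top_customers_per_branch(rows, top_n=20):
    """rows: iterable of (branch_no, customer_no, amt1, amt2, amt3, amt4).

    Hypothetical layout; the real file's column order may differ.
    """
    # Transformer 1: add the four Amount columns into one Total Amount.
    totals = [
        (branch, cust, a1 + a2 + a3 + a4)
        for branch, cust, a1, a2, a3, a4 in rows
    ]

    # Sort on Branch No ascending, Total Amount descending.
    totals.sort(key=lambda r: (r[0], -r[2]))

    # Transformer 2: the stage-variable counter restarts at 1 whenever
    # Branch No changes; only rows with counter <= top_n reach the output.
    result = []
    prev_branch = object()  # sentinel that never equals a real Branch No
    counter = 0
    for branch, cust, total in totals:
        counter = counter + 1 if branch == prev_branch else 1
        prev_branch = branch
        if counter <= top_n:
            result.append((branch, cust, total))
    return result
```

For example, calling it with `top_n=2` on three customers in one branch keeps only the two with the largest totals, in descending order.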
Any alternative logic would be appreciated.