Logic to get the ID with last date

taral · Post by **taral** » Wed Jun 23, 2010 12:54 am

I have a sequential file with the following columns:
EMP_ID EMP_SAL EMP_DATE
1 100 01/01/2009
4 200 01/01/2005
2 300 01/01/2011
2 400 01/01/2009
1 100 01/01/2010
3 200 01/01/2009
4 100 01/01/2007
3 300 01/01/2006
.
.
.
The output should contain the EMP_ID that has the latest date.

ray.wurlod · Post by **ray.wurlod** » Wed Jun 23, 2010 1:24 am

Either sort by date in ascending order and capture the last row (sort and tail commands),
or sort by date in descending order and capture the first row (sort and head commands).
Because this is a parallel job, you do have MKS Toolkit installed and can therefore use UNIX commands such as sort, tail and head.
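
As a rough sketch of the two options above, assuming the rows are space-delimited, the header line has been removed, and the dates are always MM/DD/YYYY (the file name emp.txt is made up for illustration). Because the year is at the end of the date, the sort keys pick out year, month and day by character position inside the third field:

```shell
# Hypothetical sample file holding the rows from the question (no header line).
cat > emp.txt <<'EOF'
1 100 01/01/2009
4 200 01/01/2005
2 300 01/01/2011
2 400 01/01/2009
1 100 01/01/2010
3 200 01/01/2009
4 100 01/01/2007
3 300 01/01/2006
EOF

# Field 3 is MM/DD/YYYY: characters 7-10 are the year, 1-2 the month, 4-5 the day.
# -t ' ' makes a single space the field separator, so positions count from the
# first character of the date itself.

# Option 1: sort ascending by date, keep the last row.
latest_tail=$(sort -t ' ' -k3.7,3.10n -k3.1,3.2n -k3.4,3.5n emp.txt | tail -n 1)
echo "$latest_tail"

# Option 2: sort descending by date, keep the first row.
latest_head=$(sort -t ' ' -k3.7,3.10nr -k3.1,3.2nr -k3.4,3.5nr emp.txt | head -n 1)
echo "$latest_head"
```

Both commands return one row for the whole file, the one with the most recent date. If two rows share that latest date, which of them is returned depends on the sort's tie-breaking.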

taral · Post by **taral** » Wed Jun 23, 2010 2:27 am

The output should contain the Emp_id that has the latest date.
And what are the head/tail commands?

devesh_ssingh · Post by **devesh_ssingh** » Wed Jun 23, 2010 2:38 am

head and tail are UNIX commands.

head -n 1 filename : displays the first line of the file
tail -n 1 filename : displays the last line of the file

taral · Post by **taral** » Wed Jun 23, 2010 2:50 am

The input sequential file can contain the same emp_id more than once, each time with a different date. The output should have a single row per emp_id containing the emp_id, sal and date of the latest record.
For the example above, the output dataset should be:
EMP_ID EMP_SAL EMP_DATE
1 100 01/01/2010
2 300 01/01/2011
3 200 01/01/2009
4 100 01/01/2007

ray.wurlod · Post by **ray.wurlod** » Wed Jun 23, 2010 4:54 am

Please re-read my earlier post. It contains two solutions.

taral · Post by **taral** » Thu Jul 22, 2010 2:45 am

I sorted on both fields, emp_id and emp_date (as the secondary sort key),
and then used a Remove Duplicates stage (with emp_id as the key column) to remove the duplicates.
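
For comparison, the same sort-then-deduplicate idea can be sketched on the command line. This is only an approximation of the job design, not the DataStage stages themselves. It assumes space-delimited rows without a header, MM/DD/YYYY dates, a made-up file name emp.txt, and that the date is sorted in descending order so the first row kept for each emp_id is the latest one (sorting ascending and retaining the last duplicate would be equivalent):

```shell
# Hypothetical sample file holding the rows from the question (no header line).
cat > emp.txt <<'EOF'
1 100 01/01/2009
4 200 01/01/2005
2 300 01/01/2011
2 400 01/01/2009
1 100 01/01/2010
3 200 01/01/2009
4 100 01/01/2007
3 300 01/01/2006
EOF

# Primary key: emp_id (field 1, numeric).
# Secondary key: date in field 3, newest first (year, then month, then day).
# awk then keeps only the first row seen for each emp_id, which plays the
# role of the duplicate-removal step keyed on emp_id.
result=$(sort -t ' ' -k1,1n -k3.7,3.10nr -k3.1,3.2nr -k3.4,3.5nr emp.txt \
  | awk '!seen[$1]++')
echo "$result"
```

On the sample rows this yields one row per emp_id, matching the expected output posted earlier in the thread.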