Finding most recent effectitive salary date

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
srini.dw
Premium Member
Premium Member
Posts: 186
Joined: Fri Aug 18, 2006 1:59 am
Location: Chennai

Finding most recent effectitive salary date

Post by srini.dw »

Hi,

I need to findout most recent effective date.
Emp No Date Salary
100007 ^1999-11-01^ 000000000002152.57
100007 ^2000-11-01^ 000000000002152.57
100007 ^2001-01-01^ 000000000002152.57
100007 ^2001-11-01^ 000000000002152.57
100007 ^2002-11-01^ 000000000002152.57
100007 ^2003-04-01^ 000000000002152.57
100007 ^2003-09-01^ 000000000002152.57
100007 ^2003-11-01^ 000000000002152.57




10004 ^1998-04-01^ 000000000012500.00
10004 ^1999-04-01^ 000000000012500.00
10004 ^2000-04-01^ 000000000012500.00
10004 ^2000-07-01^ 000000000013350.00
10004 ^2000-09-01^ 000000000013350.00
10004 ^2000-09-01^ 000000000013350.00
10004 ^2001-02-06^ 000000000013350.00
10004 ^2001-02-28^ 000000000013350.00
10004 ^2001-04-01^ 000000000013350.00
10004 ^2001-08-31^ 000000000013350.00
10004 ^2002-01-01^ 000000000013350.00

Thanks
panchusrao2656
Charter Member
Charter Member
Posts: 64
Joined: Sat Sep 17, 2005 10:42 am

Post by panchusrao2656 »

In Sort Stage,hash partition the data based on EmployeeID and sort the data by Date descending, please generate the key change indicator which can be used in the transformer/filter to pick the right salary and date. The other option is to extract the recent by writing the proper query.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

:? And? What exactly have you tried? What are the bold values supposed to represent? They don't seem to be the 'most recent' effective dates in the bunches, hence the question...
-craig

"You can never have too many knives" -- Logan Nine Fingers
srini.dw
Premium Member
Premium Member
Posts: 186
Joined: Fri Aug 18, 2006 1:59 am
Location: Chennai

Post by srini.dw »

I need to findout most recent salary effective date.

For based on Emp no,Effective Date,Salary.

Emp no Eff Date Salary
100007 ^ 1999-11-01 ^ 000000000002152.57
100007 ^2000-11-01^ 000000000002152.57
100007 ^2001-01-01^ 000000000002152.57
100007 ^2001-11-01^ 000000000002152.57
100007 ^2002-11-01^ 000000000002152.57
100007 ^2003-04-01^ 000000000002152.57
100007 ^2003-09-01^ 000000000002152.57
100007 ^2003-11-01^ 000000000002152.57

result:1999-11-01

10004 ^1998-04-01^ 000000000012500.00
10004 ^1999-04-01^ 000000000012500.00
10004 ^2000-04-01^ 000000000012500.00
10004 ^ 2000-07-01 ^ 000000000013350.00
10004 ^2000-09-01^ 000000000013350.00
10004 ^2000-09-01^ 000000000013350.00
10004 ^2001-02-06^ 000000000013350.00
10004 ^2001-02-28^ 000000000013350.00
10004 ^2001-04-01^ 000000000013350.00
10004 ^2001-08-31^ 000000000013350.00
10004 ^2002-01-01^ 000000000013350.00

result :2000-07-01
DSguru2B
Charter Member
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

Do a group by on Id and Salary and get the MIN() of date.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Huh... I guess our definitions of the phrase 'most recent' differ. :?
-craig

"You can never have too many knives" -- Logan Nine Fingers
DSguru2B
Charter Member
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

chulett wrote:Huh... I guess our definitions of the phrase 'most recent' differ. :?
:lol:
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

The example showed least recent salary date. Glad I'm not covered but THAT payroll system! For most recent, of course, use MAX() rather than MIN().
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne
Contact:

Post by vmcburney »

Looks like it's the earliest effective date of the most recent salary. This makes it tricky as a persons salary may have gone up and down. Eg.

Code: Select all

10001  1999-01-01  000210000
10001  2001-01-01  000240000
10001  2003-01-01  000210000
In this case no combination of aggregation, sort and grouping will give you the right answer. You will probably get the right salary (21000) but the wrong date (1999-01-01).

I think you should sort on EMP and DATE and send it to a transformer. Use Stage Variables to compare current row to previous row. Output the first row, read the second row, if EMP ID and SALARY is the same ignore the row and move to the next row, otherwise output the row. This will give you a unique start date and salary for every employee.

To find the most recent salaries do a remove duplicates on EMP taking the last record (most recent salary).
DSguru2B
Charter Member
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

In other words, sort by Id and Date, pass it via aggregator, group by Id and Salary, retaining MIN() date, pass it via remove duplicate stage and retain last record (most recent salary).
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
Post Reply