Calculating Days

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
ghostraider
Premium Member
Premium Member
Posts: 49
Joined: Wed Dec 13, 2006 6:31 pm

Calculating Days

Post by ghostraider »

Hi all, my data is in the following format.


Id Status Date
1 28 2007-09-24
1 5 2007-08-20
1 11 2007-09-15
1 27 2007-08-22
2 27 2007-09-22
2 5 2007-08-19
2 4 2007-08-20
2 1 2007-08-25



I need to calculate the number of days between the status changes. For example if status of Id 1 changes from 5 to 27, 27 to 28, 5 to 11 or any change from one status to another i have to calculate the number of days. The problem is the status can change in any order, so the ascending or descending order will not work. Could someone please give me an idea as to how i can work on this. I am not looking for the iconv or oconv stuff but how i can arrange the Status to calculate the days.

Thank You All
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Pre-sort by Id then by Status. That having been done, you can use stage variables in a Transformer stage to detect change, and Iconv() of the date(s) - which you can "remember" also using stage variables - to calculate the days difference.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Sort by Id and then Date, status can come along for the ride it seems. Then the "iconv stuff" for the actual date difference in stage variables.
-craig

"You can never have too many knives" -- Logan Nine Fingers
ghostraider
Premium Member
Premium Member
Posts: 49
Joined: Wed Dec 13, 2006 6:31 pm

Calculating Days

Post by ghostraider »

Is it possible after sorting if my data is in the following format.

ID Status Date
1 28 2007-08-20
1 27 2007-08-10
1 26 2007-07-10
1 25 2007-08-20
1 11 2007-07-15
1 10 2007-07-10
1 5 2007-07-05
1 4 2007-07-17
1 2 2007-07-12
1 1 2007-07-13

Now this is after i have sorted the ID in ascending and Status in descending orders. I need to calculate days for status change from 5 to 28, 5 to 27, 5 to 26, 5 to 25, 10 to 11, 4 to 5 (this pattern is different from the previous ones), 2 to 5, or 1 to 5. If the number of days is negative i should ignore. My question is even after sorting will the calculations be possible for all the status changes using stage variables. Please let me know.

Thank you!!
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Yes.

Though I'd be curious about what a negative interval actually means, in a business sense.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

:? So... not strictly chronological as I assumed where all you need is the number of days between any two status changes. Rather, for each Id you need a series of day counts between specific status changes? And when you said:

"I need to calculate days for status change from 5 to 28, 5 to 27, 5 to 26, 5 to 25, 10 to 11, 4 to 5 (this pattern is different from the previous ones), 2 to 5, or 1 to 5."

Did you mean "and 1 to 5", meaning as many of these as exist for a given Id? If so, with a finite number of possible status codes you care about per id (10 it seems) to store, I'd use a hashed file keyed by Id with a data field for each status holding a date. Then Read/Write to that same hashed file for each record, inserting if it doesn't exist yet for that Id and updating each time it does. In essence, a pivot.

When you are done, source from the hashed file as it will have all possible dates per Id at that point. Then calculate the date ranges for all valid status changes with dates.
-craig

"You can never have too many knives" -- Logan Nine Fingers
ghostraider
Premium Member
Premium Member
Posts: 49
Joined: Wed Dec 13, 2006 6:31 pm

Post by ghostraider »

ray.wurlod wrote:Yes.

Though I'd be curious about what a negative interval actually means, in a business sense. ...

Ray, the negative interval has to be ignored. The dates entered by the users were not right and that is why the negative number of days.
ghostraider
Premium Member
Premium Member
Posts: 49
Joined: Wed Dec 13, 2006 6:31 pm

Post by ghostraider »

chulett wrote::? So... not strictly chronological as I assumed where all you need is the number of days between any two status changes. Rather, for each Id you need a series of day counts between specific status changes? And when you said:

"I need to calculate days for status change from 5 to 28, 5 to 27, 5 to 26, 5 to 25, 10 to 11, 4 to 5 (this pattern is different from the previous ones), 2 to 5, or 1 to 5."

Did you mean "and 1 to 5", meaning as many of these as exist for a given Id? If so, with a finite number of possible status codes you care about per id (10 it seems) to store, I'd use a hashed file keyed by Id with a data field for each status holding a date. Then Read/Write to that same hashed file for each record, inserting if it doesn't exist yet for that Id and updating each time it does. In essence, a pivot.

When you are done, source from the hashed file as it will have all possible dates per Id at that point. Then calculate the date ranges for all valid status changes with dates.
Chulett, sorry about that. It is "and 1 to 5". In fact i have to calculate days for 20 status changes and i have mentioned 10 only. Excuse me here, but could you please explain this "Then Read/Write to that same hashed file for each record, inserting if it doesn't exist yet for that Id and updating each time it does. In essence, a pivot" :?

Thanks a lot.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Meaning, the same hashed file as both a lookup and your target.

Code: Select all

        Hashed
           |
Source -> Trn -> Hashed -> [calc days]
Don't cache the lookup so the writes are immediately available. Every input record gets written to the hashed file. On a 'miss' you assemble a new record with all except one data field empty, on a 'hit' you write back all of the current values plus the new one in the proper slot.
-craig

"You can never have too many knives" -- Logan Nine Fingers
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Should also note that with this technique, sorting isn't necessary as input order doesn't matter.
-craig

"You can never have too many knives" -- Logan Nine Fingers
Post Reply