Rows to column split in Datastage.

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
sam334
Premium Member
Premium Member
Posts: 145
Joined: Mon Aug 26, 2013 7:42 pm

Rows to column split in Datastage.

Post by sam334 »

All,
Need one help on rows to column spilling in datastage. It is a parallel environment where source is db2 database. Basically we have four columns,

userid,begiinigdatettime,status.endingdatetime,duration.
Sam,2014-04-12 7.00am, on call,2014-04-12,7.30am, 1800
Sam,2014-04-12 7.30 am,busy,2014-04-12,8.00am,1800
Sam,2014-04-12 8.00 am, free to talk,2014-04-12,8.15am,900

- it will be going on until the user logged out.

We basically want the data looks like,

userid,beginningdatetime,endingdatettime,on call,busy,free to talk.
Sam, 2014-04-12 7.00am,2014-04-12,8.15am,00.30.00,00.30.00,00.15.00

Any help what kind of stage can we use to achieve it. I used pivot enterprise, but its getting more critical on it.

Thanks.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Pivot Enterprise would seem to me to be the correct stage to use. What's your actual problem?

Without creating a custom (Build) stage, you might be able to leverage the looping capability in the Transformer stage.

In all of the above, I assume you are processing after the event. What you seem to be trying to do would be very difficult to accomplish in real time.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
sam334
Premium Member
Premium Member
Posts: 145
Joined: Mon Aug 26, 2013 7:42 pm

Post by sam334 »

Thanks Ray. It is quite tough for sure.

In Pivot stage, if use array of 3 for on call,busy,free to talk. its repeating the same for next other rows which does not make sense. )pivoting on status.

Its coming like,
userid,beginningdatetime,endingdatettime,status1,status2,status3
sam,2014-04-12 7.00am,2014-04-12,7.30am,on call,busy,free to talk
sam,2014-04-12 8.30 am,2014-04-9.00,on break,available,on call
sam,2014-04-12 9.00am, 2014-04-12 10.00am, busy,on call, free to talk

if I pivot on beginningdatettime and enddatetime as well, then will get one extra column like,

userid,beginningdatetime1,status1,enddatetime1,beginningdatetime2,status2,enddatetime2,beginningdatetime3,status3,enddatetime3

Probably we need to group by the status which then calculating time would be easier.
ssnegi
Participant
Posts: 138
Joined: Thu Nov 15, 2007 4:17 am
Location: Sydney, Australia

Post by ssnegi »

In filter stage split in status. So on call, busy, free to talk will go on three links.
In another filter link use condition (on call or busy or free to talk) and send userid down that link. Three rows will go down this link. Then in remove duplicate stage dedup it to only one row Sam.
Then join this link with the earlier links for on call, busy and free to talk base on userid. Then the data would be in one row.
You can then do the calculation for endingdatetime_busy - endingdatetime_oncall, endingdatetime_freetotalk - endingdatetime_busy.
sam334
Premium Member
Premium Member
Posts: 145
Joined: Mon Aug 26, 2013 7:42 pm

Post by sam334 »

Thanks Guys. So far I divided the column in SQL query in source data grouped by status and userid.
So, it looks like, userid,hour,minute,seconds,status and date. Now i can try pivoting the data.

Will also try your suggestions for sure.

Thanks,
sam334
Premium Member
Premium Member
Posts: 145
Joined: Mon Aug 26, 2013 7:42 pm

Post by sam334 »

Okay,I tried this way. So, I changed the sql query and it is pulling data like,
userid,time_spent,status,date
sam,0:15:12,Free to talk,2013-04-20
sam,1:15:22,Busy,2013-04-20
sam,00:47:15,available,2013-04-20
sam,04:15:11,out of desk,2013-04-20
jim,02:10:15,busy.2013-04-20
jim,00:15:10,free to talk,2013-04-20

Now, in final out it should look like,

userid,free to talk,busy,available,outofdesk

sam,00:15:12,01:15:22,,00:47:15,00:00:00
jim,00:15:10,02:10:15,00:00:00,00:00:00

My only doubt is , can we achieve this with PIVOT enterprise. Kind of confused....

Thanks again.


ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

You would need a Transformer to calculate the intervals, then you could use a Pivot stage.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Post Reply