Combine Dates

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
c62dr08
Premium Member
Premium Member
Posts: 9
Joined: Mon Mar 05, 2007 1:47 pm

Combine Dates

Post by c62dr08 »

I have an input file: Key, EffDt, ExpDt
The data looks like
1111 2002-01-01 2002-07-01
1111 2002-07-01 2003-01-01
1111 2003-01-01 2004-01-01
2222 2002-01-01 2003-01-01
2222 2003-01-01 2004-01-01
2222 2006-01-01 2007-01-01

I need the final result looks like
1111 2002-01-01 2004-01-01
2222 2002-01-01 2004-01-01
2222 2006-01-01 2007-01-01

I hashed input file by Key and sorted it by Key and EffDt. After that I used transform. The output looks good, but last records were not written (or there were written with the Key = " ".
Is it any other technique appropriate for this process? Please help!!!
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

Sort your input by all three columns in the order they appear. Create a transform stage which stores the lastKey, MinimumEffDt and MaximumExpDt as stage variables. Put a constraint to output a row only when the Key changes and output the stored values.
N.B. You will need to add a dummy row to the input stream in order to ensure that the last value is output as well.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

You only need to partition on Key. Including EffDt in the partitioning may lead to inconsistent results. Though you still may want to sort on EffDt.

Could you, instead, use an Aggregator stage, grouping on Key and generating Min(EffDt) and Max(ExpDt) ?

The same partitioning considerations as above apply.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
sachin1
Participant
Posts: 325
Joined: Wed May 30, 2007 7:42 am
Location: india

Re: Combine Dates

Post by sachin1 »

hello dear, i am little doubtful about your output data, if you could be little precise on output data, we can find some proper solution.
JoshGeorge
Participant
Posts: 612
Joined: Thu May 03, 2007 4:59 am
Location: Melbourne

Re: Combine Dates

Post by JoshGeorge »

You can get desired out put if you use a combination of Sort stage, Transformer stage and Remove Duplicate stage. In sort stage hash partition and sort the incoming records by Key. In sort properties, specify 'key' as previously sorted and specify EffDt and ExpDt to be sorted in asc order. Also set cluster key change column as true and map all these columns to the output. In the transformer assign stage variables:

1) SvEffDt --> if In.clusterKeyChange = 1 then In.EffDt Else SvEffDt
2) SvPrevSvEffDt --> SvCurrEffDt
3) SvCurrEffDt --> In.EffDt
4) SvPrevSvExpDt --> SvCurrExpDt
5) SvCurrExpDt --> In.ExpDt
6) SvNewRowInd --> if In.clusterKeyChange = 1 then SvNewRowInd+1 Else if CurrStDt <> PrevEndDt Then SvNewRowInd+1 Else SvNewRowInd

In Transformer main derivation part:
In.Key --> Out.Key
SvEffDt --> Out.EffDt
In.ExpDt --> Out.ExpDt
RowNum --> SvNewRowInd

Now in remove duplicate stage specify key as RowNum and retain last.
Joshy George
<a href="http://www.linkedin.com/in/joshygeorge1" ><img src="http://www.linkedin.com/img/webpromo/bt ... _80x15.gif" width="80" height="15" border="0"></a>
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

:? Did I miss something with my suggestion to use an Aggregator stage? It looks very much like SELECT Key, Min(EffDt), Max(EndDt) FROM input GROUP BY Key to me.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
JoshGeorge
Participant
Posts: 612
Joined: Thu May 03, 2007 4:59 am
Location: Melbourne

Post by JoshGeorge »

That would be :

1111 2002-01-01 2004-01-01
2222 2002-01-01 2007-01-01

What OP posted in requirement is :

1111 2002-01-01 2004-01-01
2222 2002-01-01 2004-01-01
2222 2006-01-01 2007-01-01
Joshy George
<a href="http://www.linkedin.com/in/joshygeorge1" ><img src="http://www.linkedin.com/img/webpromo/bt ... _80x15.gif" width="80" height="15" border="0"></a>
keshav0307
Premium Member
Premium Member
Posts: 783
Joined: Mon Jan 16, 2006 10:17 pm
Location: Sydney, Australia

Post by keshav0307 »

a simple solution as suggested by ArndW
Sort your input by all three columns in the order they appear
and then have this stage variables:

1) svNewEffDt---> If PrevExpDt<=CurrExpDt Then In.EffDt Else svNewEffDt
2) PrevExpDt=CurrExpDt
3) CurrEffDt = In.EffDt
4) CurrExpDt =In.ExpDt

5) Prevkey=CurrKey
6) CurrKey = In.key

in contraint, condition is
(PrevExpDt<=CurrExpDt) or ( Prevkey=CurrKey )
c62dr08
Premium Member
Premium Member
Posts: 9
Joined: Mon Mar 05, 2007 1:47 pm

Re: Combine Dates

Post by c62dr08 »

JoshGeorge wrote:You can get desired out put if you use a combination of Sort stage, Transformer stage and Remove Duplicate stage. In sort stage hash partition and sort the incoming records by Key. In sort properties, specify 'key' as previously sorted and specify EffDt and ExpDt to be sorted in asc order. Also set cluster key change column as true and map all these columns to the output. In the transformer assign stage variables:

1) SvEffDt --> if In.clusterKeyChange = 1 then In.EffDt Else SvEffDt
2) SvPrevSvEffDt --> SvCurrEffDt
3) SvCurrEffDt --> In.EffDt
4) SvPrevSvExpDt --> SvCurrExpDt
5) SvCurrExpDt --> In.ExpDt
6) SvNewRowInd --> if In.clusterKeyChange = 1 then SvNewRowInd+1 Else if CurrStDt <> PrevEndDt Then SvNewRowInd+1 Else SvNewRowInd

In Transformer main derivation part:
In.Key --> Out.Key
SvEffDt --> Out.EffDt
In.ExpDt --> Out.ExpDt
RowNum --> SvNewRowInd

Now in remove duplicate stage specify key as RowNum and retain last.
Thank you Josh. It works.
the only thing i changed was:
If SvCurrEffDt <> SvPrevSvExpDt Then SvCurrEffDt Else SvEffDt --> Out.EffDt :lol:
JoshGeorge
Participant
Posts: 612
Joined: Thu May 03, 2007 4:59 am
Location: Melbourne

Post by JoshGeorge »

keshav0307's solution won't work because

(PrevExpDt<=CurrExpDt) or ( Prevkey=CurrKey ) won't restrict records to pass the way you want. You will get all the first 5 records.
keshav0307 wrote:a simple solution as suggested by ArndW
Sort your input by all three columns in the order they appear
and then have this stage variables:

1) svNewEffDt---> If PrevExpDt<=CurrExpDt Then In.EffDt Else svNewEffDt
2) PrevExpDt=CurrExpDt
3) CurrEffDt = In.EffDt
4) CurrExpDt =In.ExpDt

5) Prevkey=CurrKey
6) CurrKey = In.key

in contraint, condition is
(PrevExpDt<=CurrExpDt) or ( Prevkey=CurrKey )
Last edited by JoshGeorge on Fri Jun 15, 2007 8:32 pm, edited 1 time in total.
Joshy George
<a href="http://www.linkedin.com/in/joshygeorge1" ><img src="http://www.linkedin.com/img/webpromo/bt ... _80x15.gif" width="80" height="15" border="0"></a>
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

If you are going to bother to come here and post something like that, at least explain why you believe that to be true. Convict, don't just indict.
-craig

"You can never have too many knives" -- Logan Nine Fingers
JoshGeorge
Participant
Posts: 612
Joined: Thu May 03, 2007 4:59 am
Location: Melbourne

Post by JoshGeorge »

Replied / Edited inline :lol: Pointing out the obvious from the code is not indicting. :roll:
Joshy George
<a href="http://www.linkedin.com/in/joshygeorge1" ><img src="http://www.linkedin.com/img/webpromo/bt ... _80x15.gif" width="80" height="15" border="0"></a>
Post Reply