Page 1 of 1

Combine Dates

Posted: Wed Jun 13, 2007 10:13 am
by c62dr08
I have an input file: Key, EffDt, ExpDt
The data looks like
1111 2002-01-01 2002-07-01
1111 2002-07-01 2003-01-01
1111 2003-01-01 2004-01-01
2222 2002-01-01 2003-01-01
2222 2003-01-01 2004-01-01
2222 2006-01-01 2007-01-01

I need the final result looks like
1111 2002-01-01 2004-01-01
2222 2002-01-01 2004-01-01
2222 2006-01-01 2007-01-01

I hashed input file by Key and sorted it by Key and EffDt. After that I used transform. The output looks good, but last records were not written (or there were written with the Key = " ".
Is it any other technique appropriate for this process? Please help!!!

Posted: Wed Jun 13, 2007 3:28 pm
by ArndW
Sort your input by all three columns in the order they appear. Create a transform stage which stores the lastKey, MinimumEffDt and MaximumExpDt as stage variables. Put a constraint to output a row only when the Key changes and output the stored values.
N.B. You will need to add a dummy row to the input stream in order to ensure that the last value is output as well.

Posted: Wed Jun 13, 2007 4:32 pm
by ray.wurlod
You only need to partition on Key. Including EffDt in the partitioning may lead to inconsistent results. Though you still may want to sort on EffDt.

Could you, instead, use an Aggregator stage, grouping on Key and generating Min(EffDt) and Max(ExpDt) ?

The same partitioning considerations as above apply.

Re: Combine Dates

Posted: Thu Jun 14, 2007 12:19 am
by sachin1
hello dear, i am little doubtful about your output data, if you could be little precise on output data, we can find some proper solution.

Re: Combine Dates

Posted: Thu Jun 14, 2007 3:37 am
by JoshGeorge
You can get desired out put if you use a combination of Sort stage, Transformer stage and Remove Duplicate stage. In sort stage hash partition and sort the incoming records by Key. In sort properties, specify 'key' as previously sorted and specify EffDt and ExpDt to be sorted in asc order. Also set cluster key change column as true and map all these columns to the output. In the transformer assign stage variables:

1) SvEffDt --> if In.clusterKeyChange = 1 then In.EffDt Else SvEffDt
2) SvPrevSvEffDt --> SvCurrEffDt
3) SvCurrEffDt --> In.EffDt
4) SvPrevSvExpDt --> SvCurrExpDt
5) SvCurrExpDt --> In.ExpDt
6) SvNewRowInd --> if In.clusterKeyChange = 1 then SvNewRowInd+1 Else if CurrStDt <> PrevEndDt Then SvNewRowInd+1 Else SvNewRowInd

In Transformer main derivation part:
In.Key --> Out.Key
SvEffDt --> Out.EffDt
In.ExpDt --> Out.ExpDt
RowNum --> SvNewRowInd

Now in remove duplicate stage specify key as RowNum and retain last.

Posted: Thu Jun 14, 2007 5:53 am
by ray.wurlod
:? Did I miss something with my suggestion to use an Aggregator stage? It looks very much like SELECT Key, Min(EffDt), Max(EndDt) FROM input GROUP BY Key to me.

Posted: Thu Jun 14, 2007 6:13 am
by JoshGeorge
That would be :

1111 2002-01-01 2004-01-01
2222 2002-01-01 2007-01-01

What OP posted in requirement is :

1111 2002-01-01 2004-01-01
2222 2002-01-01 2004-01-01
2222 2006-01-01 2007-01-01

Posted: Thu Jun 14, 2007 7:42 am
by keshav0307
a simple solution as suggested by ArndW
Sort your input by all three columns in the order they appear
and then have this stage variables:

1) svNewEffDt---> If PrevExpDt<=CurrExpDt Then In.EffDt Else svNewEffDt
2) PrevExpDt=CurrExpDt
3) CurrEffDt = In.EffDt
4) CurrExpDt =In.ExpDt

5) Prevkey=CurrKey
6) CurrKey = In.key

in contraint, condition is
(PrevExpDt<=CurrExpDt) or ( Prevkey=CurrKey )

Re: Combine Dates

Posted: Thu Jun 14, 2007 10:38 am
by c62dr08
JoshGeorge wrote:You can get desired out put if you use a combination of Sort stage, Transformer stage and Remove Duplicate stage. In sort stage hash partition and sort the incoming records by Key. In sort properties, specify 'key' as previously sorted and specify EffDt and ExpDt to be sorted in asc order. Also set cluster key change column as true and map all these columns to the output. In the transformer assign stage variables:

1) SvEffDt --> if In.clusterKeyChange = 1 then In.EffDt Else SvEffDt
2) SvPrevSvEffDt --> SvCurrEffDt
3) SvCurrEffDt --> In.EffDt
4) SvPrevSvExpDt --> SvCurrExpDt
5) SvCurrExpDt --> In.ExpDt
6) SvNewRowInd --> if In.clusterKeyChange = 1 then SvNewRowInd+1 Else if CurrStDt <> PrevEndDt Then SvNewRowInd+1 Else SvNewRowInd

In Transformer main derivation part:
In.Key --> Out.Key
SvEffDt --> Out.EffDt
In.ExpDt --> Out.ExpDt
RowNum --> SvNewRowInd

Now in remove duplicate stage specify key as RowNum and retain last.
Thank you Josh. It works.
the only thing i changed was:
If SvCurrEffDt <> SvPrevSvExpDt Then SvCurrEffDt Else SvEffDt --> Out.EffDt :lol:

Posted: Fri Jun 15, 2007 7:30 pm
by JoshGeorge
keshav0307's solution won't work because

(PrevExpDt<=CurrExpDt) or ( Prevkey=CurrKey ) won't restrict records to pass the way you want. You will get all the first 5 records.
keshav0307 wrote:a simple solution as suggested by ArndW
Sort your input by all three columns in the order they appear
and then have this stage variables:

1) svNewEffDt---> If PrevExpDt<=CurrExpDt Then In.EffDt Else svNewEffDt
2) PrevExpDt=CurrExpDt
3) CurrEffDt = In.EffDt
4) CurrExpDt =In.ExpDt

5) Prevkey=CurrKey
6) CurrKey = In.key

in contraint, condition is
(PrevExpDt<=CurrExpDt) or ( Prevkey=CurrKey )

Posted: Fri Jun 15, 2007 7:50 pm
by chulett
If you are going to bother to come here and post something like that, at least explain why you believe that to be true. Convict, don't just indict.

Posted: Fri Jun 15, 2007 8:35 pm
by JoshGeorge
Replied / Edited inline :lol: Pointing out the obvious from the code is not indicting. :roll: