Page 1 of 2

Required Output

Posted: Wed Jul 22, 2009 7:43 am
by balu536
Hi all,
I have a requirement which is as defined below.

i/p:

Col1 Col2 DATE Col4
100 01 2009 33
100 01 2008 24
100 01 2004 30
100 01 2002 15
200 01 2010 50
200 01 2008 21
200 01 2005 45
200 01 2003 20
200 01 2001 12
200 01 1999 10

For the above input i need to add two extra columns in such a way that the values are the next consecutive values under DATE column in source for a particular group(based on Col1).For the last records in that particular group,NULLS can be passed to the new added columns.

required o/p resembles as shown below:

Col1 Col2 DATE Col4 Col5 Col6
100 01 2009 2008 2004 33
100 01 2008 2004 2002 24
100 01 2004 2002 NULL 30
100 01 2002 NULL NULL 15
200 01 2010 2008 2005 50
200 01 2008 2005 2003 21
200 01 2005 2003 2001 45
200 01 2003 2001 1999 20
200 01 2001 1999 NULL 12
200 01 1999 NULL NULL 10


Please clarify whether this logic can be achieved with the stage variables in Transformer.If so guide me for the same.

Regards,
Balakrishna

Posted: Wed Jul 22, 2009 7:44 am
by balu536
Forgot to mention.Col4 and Col5 are the newly added column.

Posted: Wed Jul 22, 2009 7:48 am
by Sainath.Srinivasan
Sort it by key descending and use stage variables.

Posted: Wed Jul 22, 2009 7:50 am
by balu536
Sort operation is not required as the data in the input file is going to be in sorted order. Will you please explain the way how the stage variables can be used to achieve the logic.

Posted: Wed Jul 22, 2009 7:55 am
by priyadarshikunal
Sort in reverse order then use 3 stage variables out of which two should be to hold previous two values with a constraint to flush the values in those stage variables if the key value changes .

Posted: Wed Jul 22, 2009 7:57 am
by Sainath.Srinivasan
Sort IS required as we need the data by descending on your key(s).

Hint: You will need 4 stage variables.

Think about it and write a pseudocode. I will leave it to your thought process.

Posted: Wed Jul 22, 2009 7:59 am
by ArndW
This is a bit trickier to implement, since DataStage does not do "read-ahead" of rows, i.e. a given row cannot know what the next one is. Fortunately with stage variables you can store values from previous rows. So in this case you would collect year numbers and values in stage variables until the group change in column 1 occurs, at which time (again, using stage variables) you would build an output string of the format and contents you want. This would include your own column separators and line terminators and you would output the whole thing as one column - going to a sequential file or named pipe with no formatting. This can then be read in with the correct column definitions. The procedure isn't as complicated as the description makes it sound and it has also been explained several times in DSXChange, although mainly in the Server job section.

Posted: Wed Jul 22, 2009 8:03 am
by balu536
I doubt whether the stage variables work or not.Here we need to get the value under the DATE column for the next 2 consecutive records to pass to the two new columns.Any other method is highly appreciable.

Re: Required Output

Posted: Wed Jul 22, 2009 8:07 am
by Sainath.Srinivasan
balu536 wrote: Please clarify whether this logic can be achieved with the stage variables in Transformer.
I thought that is what you wanted.

Anyhow, did you give it a try? Do you have the pseudocode of what you want ?

Posted: Wed Jul 22, 2009 8:08 am
by ArndW
I think a sort wouldn't work here, because it would change the order of the YEAR columns. A pure DataStage solution with stage variables as mentioned above is possible but complex. Another possibility is to a add a line number column and then sort by descending line number, the you could store "previous" values of years in stage variables.

Posted: Wed Jul 22, 2009 8:20 am
by balu536
Sainath,
I have a solution with stage variables when sorted in ascending order

i/p:
Col1 Col2 DATE Col4
100 01 2002 15
100 01 2004 30
100 01 2008 24
100 01 2009 33
200 01 1999 10
200 01 2001 12
200 01 2003 20
200 01 2005 45
200 01 2008 21
200 01 2010 50

Stage Variable Utilisation:
-------------------------------
Expression Stage Variable
S2 S3
S1 S2
DATE S1

Column Derivation:
-------------------------------
Value Column Name
S1 Col3
S2 Col4
S3 Col5

This logic works only when the data is in Ascending order.

Can any one explain me for the descending order case?

Posted: Wed Jul 22, 2009 8:25 am
by Sainath.Srinivasan
That was what we were mentioning.

From me - descending => reverse order. Not descending by date.

Hope this clarifies.

Posted: Wed Jul 22, 2009 8:27 am
by balu536
Hi Srinivas,
As you are premium poster,i'm unable to view your entire solution posted


Regards,
Balakrishna

Posted: Wed Jul 22, 2009 8:30 am
by Sainath.Srinivasan
All I was trying to convey is your solution is what I had in my mind. You may have to add 1 more stage variable.

Posted: Wed Jul 22, 2009 8:33 am
by balu536
yeah one more stage variable is required inorder to perform this operation for each group.

But sainath is there any way to achieve this logic when the data is in descending order of DATE field(i.e the i/p data as i posted in my initial post).