
Need help with pivoting columns

Posted: Mon Apr 06, 2009 4:49 am
by sagar deshmukh
Hi, I have a requirement where, from my input, I am getting two columns as below:

ID NAME
1 XX
1 YY
1 XY
2 SS
2 DD
2 PP

I want the output as:

ID NAME
1 XX#YY#XY
2 SS#DD#PP


How can I achieve this?

I am using DataStage Server Edition.

*Note: Title edited to be more descriptive - Content Editor*

Posted: Mon Apr 06, 2009 4:53 am
by Sainath.Srinivasan
Did you try searching?

There are loads of similar (or identical) posts.

General Answer: Use stage variables

Posted: Mon Apr 06, 2009 5:35 am
by hamzaqk
Search for "horizontal pivoting".

Re: Need help with pivoting columns

Posted: Mon Apr 06, 2009 6:10 am
by kailas
Use the following Unix script. This script will satisfy ur requirement. Make the output of this script (outputFile.txt) the input to the DataStage job.
Here InputFile.txt will be:
ID NAME
1 XX
1 YY
1 XY
2 SS
2 DD
2 PP



awk 'NR == 1 { print; next }            # pass the header line through
{
  # append this row''s name to the running value for its ID
  row[$1] = ($1 in row) ? row[$1] "#" $2 : $2
  # remember each ID the first time it is seen, to preserve input order
  if (!seen[$1]++) order[++c] = $1
}
END {
  for (i = 1; i <= c; i++) print order[i], row[order[i]]
}' InputFile.txt > outputFile.txt

Posted: Mon Apr 06, 2009 7:22 am
by ray.wurlod
Ur was a city in ancient Babylon.

The second person possessive pronoun in English is spelled "your", not "ur". Please strive for a professional standard of written English on DSXchange, because even good English is difficult enough for those whose first language is not English. Further, DSXchange is not an SMS-enabled device.

Posted: Mon Apr 06, 2009 1:54 pm
by DaleK
Since we have a limited number of programmers with Unix scripting knowledge, we use a hashed file as the solution to this type of problem.

The flat file is read in and looks up against a hashed file that has column 1 as the key. If the key is found on the hashed file, we write back to the same hashed file, with the second column derived as <HashFile1.Column2> : '#' : InputFile.Column2.

Note that the lookup hashed file is the same hashed file as the output in this job. We use an empty file in the job to create the empty lookup hashed file.

This has worked fine for us the couple of times we have needed it, and the whole process is contained in the job, so the on-call support person can see what is going on; running Unix scripts from DataStage is somewhat hidden to the bleary-eyed programmer at 3:00 AM.

Warning: this will be slower than running a Unix script.
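DaleK's self-referencing hashed-file pattern can be sketched outside DataStage as a dictionary that serves as both the lookup source and the write target. This is only a minimal simulation; `pivot_via_hash` and the row tuples are illustrative names, not DataStage constructs.

```python
def pivot_via_hash(rows, delim="#"):
    """Simulate a hashed file that is both the lookup and the output.

    Each incoming row looks up its key; on a hit, the new value is
    appended to the stored value, then the row is written back.
    """
    hash_file = {}  # stands in for the server hashed file
    for key, value in rows:
        if key in hash_file:                          # key found on lookup
            hash_file[key] = hash_file[key] + delim + value
        else:                                         # first row for this key
            hash_file[key] = value
    return hash_file

rows = [("1", "XX"), ("1", "YY"), ("1", "XY"),
        ("2", "SS"), ("2", "DD"), ("2", "PP")]
print(pivot_via_hash(rows))  # {'1': 'XX#YY#XY', '2': 'SS#DD#PP'}
```

Because every row overwrites the previous value for its key, the hashed file ends up holding exactly one fully concatenated row per key, with no need to detect key changes.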

Posted: Tue Apr 07, 2009 6:12 am
by Sainath.Srinivasan
Or simply sort the incoming data and append to the existing value using a stage variable.

Something like

sort input -> tx -> hashed file

and in the tx do
yourStgField = If (prevKey = Key)
Then yourStgField : delim : inField
Else inField
where yourStgField is a stage variable holding the running concatenation, and prevKey is a stage variable set to the key of the previous row.

Posted: Tue Apr 07, 2009 6:34 am
by sagar deshmukh
Hi Sainath,

How will it work when the key changes?

Posted: Tue Apr 07, 2009 10:07 am
by Sainath.Srinivasan

Code:

yourStgField = If (prevKey = Key)
Then yourStgField : delim : inField
Else inField
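A minimal simulation of this sorted-input, stage-variable approach, run outside DataStage (`pivot_sorted` and its row tuples are illustrative, and the input is assumed to arrive already sorted on the key): every row is written to the hashed file, so when the key changes, the last write for the previous key already holds the full concatenation.

```python
def pivot_sorted(rows, delim="#"):
    """Simulate sorted input -> transformer -> hashed file.

    stg plays the stage variable; every row is written to the hashed
    file, so the last (fully concatenated) write per key wins.
    """
    hashed = {}
    prev_key, stg = None, ""
    for key, value in rows:
        # same key as previous row: append; new key: reset to this value
        stg = stg + delim + value if key == prev_key else value
        prev_key = key
        hashed[key] = stg          # destructive overwrite on each row
    return hashed

rows = [("1", "XX"), ("1", "YY"), ("1", "XY"),   # already sorted on the key
        ("2", "SS"), ("2", "DD"), ("2", "PP")]
print(pivot_sorted(rows))  # {'1': 'XX#YY#XY', '2': 'SS#DD#PP'}
```

This answers the key-change question: nothing special happens at the boundary, because the partial writes for a key are simply overwritten until its last row arrives.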

Posted: Tue Apr 07, 2009 12:00 pm
by girija
Taking the same input and desired output as in the original post:
=================================

transformer:

1. Define three stage variables:

===> stgCount, number, initial value 0
===> stgApndVal, varchar(...), initial value '~~~~~~~~~'
===> stgKey

2. Derivations (in this order):

===> stgCount: If link.ID <> stgKey Then 1 Else stgCount + 1
===> stgApndVal: If link.ID <> stgKey Then link.NAME Else stgApndVal : '#' : link.NAME
===> stgKey: link.ID

3. Add a new output field APND_NAME and assign stgApndVal to it.
4. Add a new output field SRL_NO and assign stgCount to it.

Code:


    transformer -----> remove duplicate ---> target

In the remove-duplicates stage, keep the same partitioning (you have already partitioned on ID at the input of the transformer), but sort on ID and SRL_NO, with ID as the key, and take the last row per key.
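Girija's counter-plus-take-last design can be checked with a short simulation outside DataStage (`pivot_take_last` and the tuple layout are illustrative; input is assumed grouped on the key):

```python
def pivot_take_last(rows, delim="#"):
    """Simulate the transformer (per-key counter and running
    concatenation) followed by a remove-duplicates step that keeps
    the last row per key, i.e. the one with the highest SRL_NO."""
    out_rows = []
    stg_key, stg_count, stg_apnd = None, 0, ""
    for key, name in rows:
        stg_count = 1 if key != stg_key else stg_count + 1
        stg_apnd = name if key != stg_key else stg_apnd + delim + name
        stg_key = key
        out_rows.append((key, stg_count, stg_apnd))   # (ID, SRL_NO, APND_NAME)
    last = {}
    for key, srl_no, apnd in out_rows:
        last[key] = apnd   # later rows overwrite: the last row per key wins
    return last

rows = [("1", "XX"), ("1", "YY"), ("1", "XY"),
        ("2", "SS"), ("2", "DD"), ("2", "PP")]
print(pivot_take_last(rows))  # {'1': 'XX#YY#XY', '2': 'SS#DD#PP'}
```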

Posted: Tue Apr 07, 2009 12:40 pm
by girija
Sorry, I didn't see that this is a Server job.

Posted: Tue Apr 07, 2009 1:17 pm
by ray.wurlod
That's because you hijacked someone else's thread rather than starting a new topic for your new question.

Posted: Wed Apr 08, 2009 4:12 am
by sbass1
Sainath.Srinivasan wrote:or simply sort the incoming data and append to the existing value using a stage variable.

Something like

sort input -> tx -> hashed file

and in the tx do
yourStgField = If (prevKey = Key)
Then yourStgField : delim : inField
Else inField
where yourStgField is a stage variable.
Sainath's reply gave me a few ideas...

My first attempt was to try doing the above without an intermediate hashed file. Instead, I used the RowProcGetPreviousValue and RowProcCompareWithPreviousValue transforms to detect when the key had changed.

The problems were:

1) When the key changed, I was always "one row ahead": the change is detected on the first row of the next group, whereas I want to output on the last row of the current group.

2) This code didn't work for the last record, since end-of-file isn't known until after the fact. See viewtopic.php?t=126410&highlight=

So I can see why Sainath uses a hashed file for output: unconditionally write every row to the hashed file, overwriting the previous value with the concatenated stage variable as it builds up. When the key changes, voilà, your last.key row with its pivoted data has already been saved to the hashed file.

However, I also decided to come at this from a different tack...

I'm very familiar with SAS programming, which has the concept of first.key and last.key automatic variables. Since DS doesn't have these variables, I decided to construct them.

SAS also has the concept of "interleaving tables", which is a merge by row position rather than by keys. With this in mind, and by saving the Row Number to permanent storage, I was able to create "Look Ahead" and "Look Behind" variables (LookBehind is a bit redundant given RowProcGetPreviousValue). With these I can then derive first.key and last.key boolean flags. I can do this for sub-groups as well.
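The look-ahead/look-behind idea can be sketched outside DataStage as follows (`flag_first_last` is an illustrative name; it assumes rows are already sorted on the key):

```python
def flag_first_last(rows):
    """Derive SAS-style first.key / last.key flags for key-sorted
    rows using a one-row look-behind and look-ahead."""
    flagged = []
    for i, (key, value) in enumerate(rows):
        first_key = i == 0 or rows[i - 1][0] != key              # look behind
        last_key = i == len(rows) - 1 or rows[i + 1][0] != key   # look ahead
        flagged.append((key, value, first_key, last_key))
    return flagged

rows = [("1", "XX"), ("1", "YY"), ("1", "XY"),
        ("2", "SS"), ("2", "DD"), ("2", "PP")]
for row in flag_first_last(rows):
    print(row)
```

With last.key available, the pivot becomes easy: accumulate per row and output only where last.key is true, which is exactly the condition the "one row ahead" problem above was missing.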

I've saved my proof of concept job (DSX file you can import) here:
http://docs.google.com/Doc?docid=dcdxxj ... v8dm&hl=en

I've saved the test data I used in this job here:
http://docs.google.com/Doc?docid=dcdxxj ... k3g5&hl=en. The test data is purposely stored unsorted.

This was really just a proof of concept, an academic exercise as much as anything. It might perform like a dog with large data volumes - I haven't tested it. But, it does solve the original poster's problem, and perhaps is a generic approach to first.key and last.key processing.

Or I may have just reinvented a perfectly round wheel that I'd have found had I searched the forums a bit harder :wink:

HTH...

Posted: Wed Apr 08, 2009 4:59 am
by Sainath.Srinivasan
I have not been through your links yet, but I very much appreciate the effort.

Alternatively, you can append a dummy line to the incoming file; detecting that line in your input stream tells you end-of-file has been reached.
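The dummy-trailer idea can be sketched outside DataStage like this (the sentinel key is an arbitrary value assumed not to clash with any real ID):

```python
SENTINEL = ("~EOF~", "")  # dummy trailer row; key must not clash with real IDs

def pivot_with_sentinel(rows, delim="#"):
    """Emit a group whenever the key changes; the appended dummy row
    forces the final group out without special end-of-file logic."""
    out = []
    prev_key, acc = None, ""
    for key, value in rows + [SENTINEL]:
        if key != prev_key and prev_key is not None:
            out.append((prev_key, acc))   # key changed: flush previous group
        acc = acc + delim + value if key == prev_key else value
        prev_key = key
    return out

rows = [("1", "XX"), ("1", "YY"), ("1", "XY"),
        ("2", "SS"), ("2", "DD"), ("2", "PP")]
print(pivot_with_sentinel(rows))  # [('1', 'XX#YY#XY'), ('2', 'SS#DD#PP')]
```

The sentinel guarantees one final key change, so the last real group is flushed like any other, sidestepping the "don't know end-of-file until after the fact" problem described earlier in the thread.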