converting rows to columns

Post questions here related to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

bobby
Participant
Posts: 86
Joined: Mon Jul 19, 2004 8:31 pm

converting rows to columns

Post by bobby »

Hi,
I have this input:
1 n0
1 n1
1 n2
2 b1
2 b2
2 b3


and want this output:

1 n0,n1,n2
2 b1,b2,b3

How can I achieve that?
Thanks,
Bobby
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL

Post by kcbland »

An Aggregator is easiest; stage variables can get lengthy, and you have that pesky problem of what to do with the last row. Set up a transformer with an output link to an aggregator. The output link has four columns: the key plus one column per value. For each row, map the source value to the appropriate target column; if the row is not for that column, put in a zero. A sketch follows below.

In the aggregator, group by the key and do a Max derivation for each of the three value columns.
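
A rough sketch of the transformer derivations, assuming hypothetical link and column names (in.KEY, in.VALUE, VAL1-VAL3) and input sorted by key; a counter stage variable decides which target column each row feeds:

Code: Select all

* Stage variables (hypothetical; initialise svCount to 0 and svLastKey to '')
svCount   <- If in.KEY = svLastKey Then svCount + 1 Else 1
svLastKey <- in.KEY

* Output link derivations
KEY  <- in.KEY
VAL1 <- If svCount = 1 Then in.VALUE Else 0
VAL2 <- If svCount = 2 Then in.VALUE Else 0
VAL3 <- If svCount = 3 Then in.VALUE Else 0

In the aggregator, group on KEY and take Max of VAL1, VAL2 and VAL3.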
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
Sainath.Srinivasan
Participant
Posts: 3337
Joined: Mon Jan 17, 2005 4:49 am
Location: United Kingdom

Post by Sainath.Srinivasan »

You can use a multi-valued hash file with the first column as its key.

Alternatively, you can store each record in a temp table with commit set to every record, then look that table up for the next record and concatenate to the previous value.

I think the Pivot stage only goes from columns to rows.
talk2shaanc
Charter Member
Posts: 199
Joined: Tue Jan 18, 2005 2:50 am
Location: India

Post by talk2shaanc »

Using the aggregator is the best and simplest way, as kcbland has suggested. A hash file or a database adds extra overhead and is not advisable; it is not a good design.
If your source file is huge, an intermediate staging area (hash file or database) followed by a join will eat up a lot of space on your server or database. In your example you have only 1+3 columns; imagine if you had 40-50 columns. You would need 40-50 hash files or database tables and that many joins. Your server will shout at you :evil: and go mad :x .
Sainath.Srinivasan
Participant
Posts: 3337
Joined: Mon Jan 17, 2005 4:49 am
Location: United Kingdom

Post by Sainath.Srinivasan »

By the way, the hash-file or database-table method needs only ONE hash file or table to handle any number of columns - even a million or more.

All you do is read and update the same object, building the string up record by record.

The advantage of this method is that you need not worry about whether there can be more than three entries for one key at any point in time.

The only constraint is how much your database or hash file can hold.

If you need a detailed explanation of the process, do let me know.
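
A minimal DS BASIC sketch of the read-and-update idea (the file name CONCAT.HASH and the variables key and desc are hypothetical; the real logic would live in a routine called once per row):

Code: Select all

* Accumulate the description list per key in a single hash file.
open 'CONCAT.HASH' to hf else stop 'Cannot open hash file'

* For each incoming row (key, desc):
read oldval from hf, key then
   newval = oldval : ',' : desc
end else
   newval = desc
end
write newval on hf, key

close hf

At end of job you read the hash file back to get one concatenated string per key.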
Sainath.Srinivasan
Participant
Posts: 3337
Joined: Mon Jan 17, 2005 4:49 am
Location: United Kingdom

Post by Sainath.Srinivasan »

If you are going down the aggregator route, here is an extra tip: use stage variables for the description and the key, as follows.

Code: Select all

stgDesc default '' (blank)
stgKey  default @NULL (NULL)

stgDesc <- If linkKey = stgKey Then stgDesc : ',' : linkDesc Else If IsNull(stgKey) Then linkDesc Else linkDesc
stgKey  <- linkKey

The IsNULL is more for belt-and-braces.

You can pass ALL incoming rows to the aggregator. In the aggregator stage, group by the key and use the Last function for the description.

This way, you get the key plus the full concatenated list of descriptions.
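
With the sample input, the stage variables evolve like this (the aggregator's Last function then picks up the final value for each key):

Code: Select all

linkKey  linkDesc  stgDesc after the row
1        n0        n0
1        n1        n0,n1
1        n2        n0,n1,n2
2        b1        b1
2        b2        b1,b2
2        b3        b1,b2,b3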
battaliou
Participant
Posts: 155
Joined: Mon Feb 24, 2003 7:28 am
Location: London

Post by battaliou »

You could use BASIC:

Code: Select all

* Read the input three rows at a time and write one concatenated row.
* Assumes exactly three rows per key; adjust the loop limit to suit.
openseq 'in.txt' to f else return
openseq 'out.txt' to t else return
loop
   out = ''
   for i = 1 to 3
      readseq rec from f else
         * End of file: close both files and stop
         closeseq f
         closeseq t
         return
      end
      * Keep the whole first row (key + value), then append just the value field
      if out = '' then out = rec else out := ',' : field(rec, ' ', 2)
   next i
   writeseq out to t else return
repeat
