Concatenation of Rows

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

sengs
Participant
Posts: 55
Joined: Thu Nov 24, 2005 12:51 am

Concatenation of Rows

Post by sengs »

Hi,
I have a requirement to concatenate values from different rows of a single column. How can this be done in DataStage?
For Example

NO ........... TYPE
1 ....................A
1.................... B
1.................... C
1.................... D

In the end, my required output is:
1.........................A,B,C,D

Please advise.
Thanks,
Sengs
djm
Participant
Posts: 68
Joined: Wed Mar 02, 2005 3:42 am
Location: N.Z.

Post by djm »

Hi sengs,

This is often referred to on these forums as a "vertical pivot". Search the forum and see what you can find. Another search option to try is "man paste" (Craig - don't go there!).

David.
jzparad
Charter Member
Charter Member
Posts: 151
Joined: Thu Apr 01, 2004 9:37 pm

Post by jzparad »

Requirement:
Vertical pivot based on the value of a pivot field (in this case NUM)
Data file has two columns - NUM, TYPE (although the solution can be tailored for more)


Assumptions:
Number of records with the same value in pivot field is unknown
Pivot field cannot be an empty string


Solution:

Code: Select all

      L1             L2
SEQ---------->TFM---------->HSH



Define sequential file with two columns
NUM varchar 10
TYPE varchar 10


Define Transform as follows

Stage Variables
currentKey
   Initial value = ""
   Derivation = L1.NUM
newRecord
   Initial value = ""
   Derivation = If currentKey <> lastKey Then L1.NUM : "," : L1.TYPE Else newRecord : "," : L1.TYPE
lastKey
   Initial value = ""
   Derivation = currentKey

L2 Derivations
L2.key = L1.NUM
L2.line = newRecord


Define hash file with two columns
key varchar 10 (marked as the key)
line varchar 200

Description
As each record is read, the value of the pivot field is compared to the previous value.
If the values differ, the newRecord variable is initialised with the NUM and TYPE fields in CSV format.
If the values are the same, the TYPE field is appended to newRecord in CSV format.
Each record is written to the hash file. Because the pivot field is the key, every record with the same pivot value overwrites the previous one, so only the last, fully concatenated record for each key remains.

For a file with the following values:

Code: Select all

1,A
1,B
1,C
1,D
2,A
The result is a hash file that looks like this:

Code: Select all

1   1,A,B,C,D
2   2,A
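
The same accumulate-and-overwrite logic can also be sketched as a standalone DataStage BASIC routine (paths and names below are hypothetical, the hashed file is assumed to already exist, and rows for the same NUM are assumed to arrive consecutively):

Code: Select all

* Sketch only: read a comma-delimited file whose rows for the same NUM
* arrive consecutively and build one concatenated record per NUM in a
* hashed file.
      OpenSeq "/tmp/input.txt" To InFile Else Stop "Cannot open input file"
      OpenPath "/tmp/PivotHash" To OutFile Else Stop "Cannot open hashed file"

      LastKey   = ""
      NewRecord = ""

      Loop
         ReadSeq Line From InFile Else Exit           ;* leave the loop at end of file
         CurrentKey = Field(Line, ",", 1)
         TypeValue  = Field(Line, ",", 2)

         If CurrentKey <> LastKey Then
            NewRecord = CurrentKey : "," : TypeValue   ;* start a new group
         End Else
            NewRecord = NewRecord : "," : TypeValue    ;* append to the current group
         End

         Write NewRecord On OutFile, CurrentKey        ;* same key overwrites the previous write
         LastKey = CurrentKey
      Repeat

      CloseSeq InFile
      Close OutFile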
Jim Paradies
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

D wrote: (Craig - don't go there!)
Doh! Cut off at the knees. Ok, not going there. :lol:
-craig

"You can never have too many knives" -- Logan Nine Fingers
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne
Contact:

Post by vmcburney »

Jim, you should post that to the FAQ forum as the solution for a vertical pivot requirement.
jzparad
Charter Member
Charter Member
Posts: 151
Joined: Thu Apr 01, 2004 9:37 pm

Post by jzparad »

Vince,

I'm afraid I'm not privileged enough to do that.

Code: Select all

Sorry, but only users granted special access can post topics in this forum.
Maybe you could do it for me!
Jim Paradies
jzparad
Charter Member
Charter Member
Posts: 151
Joined: Thu Apr 01, 2004 9:37 pm

Post by jzparad »

By the way, I forgot to list one important assumption.

The data has to be sorted by the pivot field.
Jim Paradies
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne
Contact:

Post by vmcburney »

Roy! Could you wave your magic wand at Jim and make him a FAQ Genie? His vertical pivot answer beats the FAQ out of the previous responses for completeness.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

If you use field mark (@FM) rather than comma, your data will magically be in multiple columns in the hashed file even though you only load the key and field #1. You can then read from the hashed file with a full set of column definitions.
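
A minimal sketch of how the newRecord derivation in the job above might change under this approach (an untested reading of the suggestion, using the same stage-variable names as before):

Code: Select all

newRecord
   Initial value = ""
   Derivation = If currentKey <> lastKey Then L1.NUM : @FM : L1.TYPE Else newRecord : @FM : L1.TYPE

Because @FM is the dynamic-array field delimiter, each TYPE value then lands in its own field of the stored record, which is what lets the Hashed File stage expose them as separate columns.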
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
jzparad
Charter Member
Charter Member
Posts: 151
Joined: Thu Apr 01, 2004 9:37 pm

Post by jzparad »

Ray,
If you use field mark (@FM) rather than comma, your data will magically be in multiple columns in the hashed file even though you only load the key and field #1.
I assume that to do this, you would have to construct a hash file with the correct number of columns first. I tried simply changing the commas to @FM and got the following result for the same input:

Code: Select all

1,1
2,2
Jim Paradies
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Aha, that's what you got with View Data.

Add more column definitions - even to the metadata in the Columns grid, and look again!

That is, the Hashed File stage has the ability to populate undefined columns, in exactly the same way that the Repository database tables contain undefined columns that are, nonetheless, accessible. But not by name - columns in hashed files are accessed by ordinal storage position in the record.
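
A minimal DataStage BASIC sketch of that positional access, assuming the record was written with @FM delimiters as described above (path, record key and variable names are hypothetical):

Code: Select all

* Sketch only: open the hashed file by path and pull fields out of one
* record by ordinal position in the stored dynamic array.
      OpenPath "/tmp/PivotHash" To HashFile Else Stop "Cannot open hashed file"

      Read Rec From HashFile, "1" Then     ;* "1" is the record key
         FirstField  = Rec<1>              ;* field #1: "1" (the NUM repeated in the data)
         SecondField = Rec<2>              ;* field #2: "A"
         ThirdField  = Rec<3>              ;* field #3: "B"
      End Else
         Print "Record not found"
      End

      Close HashFile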
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
jzparad
Charter Member
Charter Member
Posts: 151
Joined: Thu Apr 01, 2004 9:37 pm

Post by jzparad »

Absolutely right!

I added new columns to the hash file stage and without re-running the job, I could view the data.
Jim Paradies
sheshadri
Participant
Posts: 11
Joined: Mon Jul 24, 2006 4:17 am
Location: any ware

Re: Concatenation of Rows

Post by sheshadri »

sengs wrote:Hi,
Hi friends, thanks for sending this very helpful information.
I got the answer to my problem.

NO ........... TYPE
1 ....................A
1.................... B
1.................... C
1.................... D

At the end my requirement is,
1.........................A,B,C,D

Please advice.
Thanks,
Sengs
SHESHU
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Is this a question, reply, some kind of graffiti or... oh, I see - a spirit guide for the OP, returned to let us know all is well almost 11 months later. :wink:

Someone needs to learn how to quote more better. That or just use the Reply to Topic button. 8)
-craig

"You can never have too many knives" -- Logan Nine Fingers
nvuradi
Participant
Posts: 54
Joined: Tue Dec 12, 2006 11:03 pm

Post by nvuradi »

jzparad wrote: [vertical pivot solution quoted in full - see the post above]
Will this work in PX jobs as well? Instead of writing to a hashed file, could we write the data to a Dataset and remove duplicates so that only the last record is retained?
Post Reply