
Concatenation of Rows

Posted: Tue Jan 10, 2006 4:13 am
by sengs
Hi,
I have a requirement to concatenate values from different rows of a single column. How can this be done in DataStage?
For Example

NO ........... TYPE
1 ....................A
1.................... B
1.................... C
1.................... D

At the end my requirement is,
1.........................A,B,C,D

Please advise.
Thanks,
Sengs

Posted: Tue Jan 10, 2006 4:36 am
by djm
Hi sengs,

This is often referred to in these forums as a "vertical pivot". Search the forum and see what you can find. Another search option to try is "man paste" (Craig - don't go there!).

David.

Posted: Tue Jan 10, 2006 6:12 am
by jzparad
Requirement:
Vertical pivot based on the value of a pivot field (in this case NUM)
Data file has two columns - NUM, TYPE (although the solution can be tailored for more)


Assumptions:
Number of records with the same value in pivot field is unknown
Pivot field cannot be an empty string


Solution:

Code:

      L1             L2
SEQ---------->TFM---------->HSH



Define sequential file with two columns
NUM varchar 10
TYPE varchar 10


Define Transform as follows

Stage Variables
currentKey
   Initial value = ""
   Derivation = L1.NUM
newRecord
   Initial value = ""
   Derivation = If currentKey <> lastKey Then L1.NUM : "," : L1.TYPE Else newRecord : "," : L1.TYPE
lastKey
   Initial value = ""
   Derivation = currentKey

L2 Derivations
L2.key = L1.NUM
L2.line = newRecord


Define hash file with two columns
key varchar 10 (marked as the key)
line varchar 200

Description
As each record is read, the value of the pivot field is compared to the previous value.
If the values differ, the newRecord variable is initialised with the NUM and TYPE fields in CSV format.
If the values are the same, the TYPE field is appended to the newRecord variable in CSV format.
Each record is written to the hash file, and because the pivot field is used as the key, every record with the same value in the pivot field overwrites the previous record.

For a file with the following values:

Code:

1,A
1,B
1,C
1,D
2,A
The result is a hash file that looks like this:

Code:

1   1,A,B,C,D
2   2,A
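For anyone wanting to trace the stage-variable logic outside DataStage, here is a hypothetical Python sketch of the same accumulate-and-overwrite approach (the dict stands in for the hashed file, so the last write per key survives, just as described above):

```python
# Vertical pivot sketch: input sorted by the pivot field (NUM).
# Mirrors the stage variables: lastKey / newRecord, with the hashed
# file's overwrite-on-same-key behaviour modelled by a dict.
rows = [("1", "A"), ("1", "B"), ("1", "C"), ("1", "D"), ("2", "A")]

hashed = {}        # stands in for the hashed file: key -> line
last_key = None
record = ""
for num, typ in rows:
    if num != last_key:            # new pivot value: start a fresh record
        record = f"{num},{typ}"
    else:                          # same pivot value: append in CSV format
        record = f"{record},{typ}"
    hashed[num] = record           # overwrite; last record per key wins
    last_key = num

print(hashed)  # {'1': '1,A,B,C,D', '2': '2,A'}
```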

Posted: Tue Jan 10, 2006 7:40 am
by chulett
djm wrote:(Craig - don't go there!)
Doh! Cut off at the knees. Ok, not going there. :lol:

Posted: Tue Jan 10, 2006 5:45 pm
by vmcburney
Jim, you should post that to the FAQ forum as the solution for a vertical pivot requirement.

Posted: Tue Jan 10, 2006 7:06 pm
by jzparad
Vince,

I'm afraid I'm not privileged enough to do that.

Code:

Sorry, but only users granted special access can post topics in this forum.
Maybe you could do it for me!

Posted: Tue Jan 10, 2006 7:09 pm
by jzparad
By the way, I forgot to list one important assumption.

The data has to be sorted by the pivot field.
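That assumption matters: the derivation only compares each key to the immediately preceding one, so a key that reappears later restarts its record and the earlier values are silently lost. A hypothetical Python sketch (same logic as the Transformer) shows the failure mode on unsorted input:

```python
# Same adjacent-key pivot logic, wrapped in a function, to show why
# the input must be sorted by the pivot field.
def pivot(rows):
    out, last_key, record = {}, None, ""
    for num, typ in rows:
        record = f"{num},{typ}" if num != last_key else f"{record},{typ}"
        out[num] = record          # last write per key wins
        last_key = num
    return out

# Unsorted: key "1" reappears after "2", so its record is restarted.
print(pivot([("1", "A"), ("2", "A"), ("1", "B")]))
# {'1': '1,B', '2': '2,A'}  -- "A" for key 1 was overwritten, not appended
```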

Posted: Tue Jan 10, 2006 7:36 pm
by vmcburney
Roy! Could you wave your magic wand at Jim and make him a FAQ Genie? His vertical pivot answer beats the FAQ out of the previous responses for completeness.

Posted: Tue Jan 10, 2006 7:46 pm
by ray.wurlod
If you use field mark (@FM) rather than comma, your data will magically be in multiple columns in the hashed file even though you only load the key and field #1. You can then read from the hashed file with a full set of column definitions.

Posted: Tue Jan 10, 2006 11:12 pm
by jzparad
Ray,
If you use field mark (@FM) rather than comma, your data will magically be in multiple columns in the hashed file even though you only load the key and field #1.
I assume that to do this, you would have to construct a hash file with the correct number of columns first. I tried simply changing the commas to @FM and got the following result for the same input:

Code:

1,1
2,2

Posted: Tue Jan 10, 2006 11:49 pm
by ray.wurlod
Aha, that's what you got with View Data.

Add more column definitions - even to the metadata in the Columns grid, and look again!

That is, the Hashed File stage has the ability to populate undefined columns, in exactly the same way that the Repository database tables contain undefined columns that are, nonetheless, accessible. But not by name - columns in hashed files are accessed by ordinal storage position in the record.
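As a hypothetical sketch of Ray's point outside DataStage: a hashed-file record is a dynamic array whose fields are delimited by the field mark, CHAR(254). Building the line with @FM instead of commas means each TYPE value lands in its own field, addressable by ordinal position once column definitions exist:

```python
# A dynamic-array record modelled as a string with CHAR(254) separators.
FM = chr(254)  # @FM, the field mark used in hashed-file records

record = FM.join(["A", "B", "C", "D"])   # what is stored under key "1"
columns = record.split(FM)               # how fields come back out, by position

print(columns[0], columns[2])            # fields 1 and 3 by ordinal position
```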

Posted: Wed Jan 11, 2006 3:15 am
by jzparad
Absolutely right!

I added new columns to the hash file stage and without re-running the job, I could view the data.

Re: Concatenation of Rows

Posted: Sun Nov 26, 2006 1:20 am
by sheshadri
sengs wrote:Hi,
Hi friends, thanks for the very nice information. I got the answer to my problem.

Posted: Sun Nov 26, 2006 8:15 am
by chulett
Is this a question, reply, some kind of graffiti or... oh, I see - a spirit guide for the OP, returned to let us know all is well almost 11 months later. :wink:

Someone needs to learn how to quote more better. That or just use the Reply to Topic button. 8)

Posted: Tue Nov 06, 2007 5:51 am
by nvuradi
jzparad wrote:[vertical pivot solution - quoted in full above, snipped]
Will this work in PX jobs as well? Instead of writing to a hashed file, could we write the data to a Dataset and use a Remove Duplicates stage to retain the last record?