multiple rows to single row

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

pandeesh
Premium Member
Posts: 1399
Joined: Sun Oct 24, 2010 5:15 am
Location: CHENNAI, TAMIL NADU

multiple rows to single row

Post by pandeesh »

Hi,

This topic has already been discussed many times, but the scenario I explain is a little bit different.

Basically I want to achieve the functionality of Oracle's WM_CONCAT() function in DataStage.

My input file contains 2 columns as given below:

id|value
--------
1|10
1|20
1|30
2|10
2|20


The output I am expecting is:

id|values
---------
1|10,20,30
2|10,20


I am not able to find any user-defined column function in the Aggregator stage.
All the functions there are built in, including sum and standard deviation.

Please help me with this.
pandeeswaran
pandeesh
Premium Member
Posts: 1399
Joined: Sun Oct 24, 2010 5:15 am
Location: CHENNAI, TAMIL NADU

Post by pandeesh »

I have found some posts on exactly this topic.
But I am not able to find the Vertical Pivot stage in my DS 7.5 Designer.
Is there any other way to achieve that?

Thanks
pandeeswaran
samyamkrishna
Premium Member
Posts: 258
Joined: Tue Jul 04, 2006 10:35 pm
Location: Toronto

Post by samyamkrishna »

Hi,

This can be achieved in a Transformer stage using stage variables.

This has been discussed a lot of times:

viewtopic.php?t=140653&highlight=
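For reference, here is a rough sketch of that approach, assuming the input is sorted by id (the in.* link name and the sv* stage variable names are illustrative, not from the posts above):

Code: Select all

* Stage variables, evaluated top to bottom for every row.
* svPrevId should default to "" so the first row starts a group.
svIsNewId = (in.id <> svPrevId)
svValues  = If svIsNewId Then in.value Else svValues : "," : in.value
svPrevId  = in.id

The output columns would then be id = in.id and values = svValues, written to a hashed file keyed on id so that the last (complete) row for each id overwrites the earlier partial ones.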

Hope this helps
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Re: multiple rows to single row

Post by chulett »

pandeesh wrote:This topic has already been discussed many times, but the scenario I explain is a little bit different.
Sorry, don't see anything different about this, just a plain old ordinary vertical pivot. The fact that you want everything in one column instead of multiple columns is just a variation on that theme, one that also has been discussed before.

Would you entertain using a Server job? That would make an interesting exercise for you and be... trivial to roll up. :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers
pandeesh
Premium Member
Posts: 1399
Joined: Sun Oct 24, 2010 5:15 am
Location: CHENNAI, TAMIL NADU

Re: multiple rows to single row

Post by pandeesh »

chulett wrote:Would you entertain using a Server job? That would make an interesting exercise for you and be... trivial to roll up. :wink:
Yes.

I remember we need to create a hashed file from the source file and use that hashed file for the lookup.

We will use id as the key field, but the lookup will find many matches for that key. How do we need to handle this to get the required output?

Thanks
pandeeswaran
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

On a lookup "miss" you put the value in the second field. For any subsequent lookup "hit" you append a comma and the new value to the existing value.
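In derivation form that logic would look something like this sketch (in for the input link and ref for the reference link are illustrative names):

Code: Select all

* Derivation for the values column written back to the hashed file:
If ref.NOTFOUND Then in.value Else ref.values : "," : in.value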
-craig

"You can never have too many knives" -- Logan Nine Fingers
pandeesh
Premium Member
Posts: 1399
Joined: Sun Oct 24, 2010 5:15 am
Location: CHENNAI, TAMIL NADU

Post by pandeesh »

chulett wrote:On a lookup "miss" you put the value in the second field. For any subsequent lookup "hit" you append a comma and the new value to the existing value.
Craig,

I am struggling to create the hashed file.

In the Sequential File stage I defined id and value as two integer columns.

My data is as given below:

1,10
1,20
1,30

In the Hashed File stage I set id as the key.
So it doesn't allow duplicates, and only 1,30 gets loaded into the hashed file.

We can't use a Hashed File stage without a key.

How do I make this work?

Thanks
pandeeswaran
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

As noted, you (in essence) insert the first record and update all subsequent records. And you do the 'update' by looking up the current value and appending/concatenating any new values to it before doing the destructive overwrite back to the hashed file.
-craig

"You can never have too many knives" -- Logan Nine Fingers
pandeesh
Premium Member
Posts: 1399
Joined: Sun Oct 24, 2010 5:15 am
Location: CHENNAI, TAMIL NADU

Post by pandeesh »

chulett wrote:As noted, you (in essence) insert the first record and update all subsequent records. And you do the 'update' by looking up the current value and appending/concatenating any new values to it before doing the destructive overwrite back to the hashed file.
Sorry Craig!! I am not able to understand. How do I insert the first record and update the remaining ones in the hashed file?

Are you suggesting something like the below?

Code: Select all

seq.file ---------> Transformer ---------> seq.file
(id,value)               |
                         |
                         |
seq.file --------> hash file (id as key)
(id,value)

pandeeswaran
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Moved this over to the Server forum.

No, do not 'preload' the hashed file. You should have one input from the source and you should be doing a reference lookup and writing back to the same hashed file in a single transformer.

You'll need to pre-create the hashed file empty for this to work: use a transformer as the source for that and mark the hashed file to be cleared.

Code: Select all

seq.file ---------> Transformer ---------> hashed write ----> seq.file
(id,value)               |
                         |
                         |
tx --------------> hashed file (read)

Make sure you do not cache the lookup. Once the hashed file has been built, the last link selects from it to build the actual output file.
-craig

"You can never have too many knives" -- Logan Nine Fingers
pandeesh
Premium Member
Posts: 1399
Joined: Sun Oct 24, 2010 5:15 am
Location: CHENNAI, TAMIL NADU

Post by pandeesh »

Thanks Craig.

I have tried to create an empty hashed file in a separate job.

Code: Select all

transformer ---------> hashed file

But it threw a "no primary link" error.

As per the job design you mentioned, the primary source seq file will contain 3 records and the reference hashed file contains no records.

So what condition do we need to give in the transformer?

Is it ref_link.NOTFOUND?

I need the values of the value column comma-separated, so I need to make that column a varchar.

Please help me.

Thanks
pandeeswaran
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

To source from a transformer, you need to do two things:

1) Define a stage variable. Note that I did not say "use" although you certainly could, it just needs to exist to allow the job to compile.

2) Set your output link constraint to let out only the number of records that you want; without one it will output rows... well, forever. Something like @OUTROWNUM=1 would work for one record, but here you don't actually need (or want) a record to flow down the link. You just need the "clear" to fire when the stage initializes, so a constraint of @FALSE would be appropriate here. And there's really no reason to separate that out into its own job.
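For example, a minimal sketch of that source transformer under those assumptions (the stage variable name is illustrative and its value is never used):

Code: Select all

* Stage variable -- exists only so the job compiles:
svDummy = 1

* Output link constraint -- no row ever flows down the link,
* but the hashed file is still created and cleared when the
* stage initializes:
@FALSE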

Yes, you could use the NOTFOUND link status variable to check the lookup: ref_link.NOTFOUND would be true when the lookup 'fails' and Not(ref_link.NOTFOUND) would be true when the lookup succeeds. I would typically set up both conditions as well-named boolean stage variables and then reference them in any dependent derivations.
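Something along these lines (the sv* names are illustrative):

Code: Select all

* Boolean stage variables wrapping the link status:
svLookupFailed = ref_link.NOTFOUND
svLookupHit    = Not(ref_link.NOTFOUND)

* values column derivation using them:
If svLookupFailed Then in.value Else ref_link.values : "," : in.value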
-craig

"You can never have too many knives" -- Logan Nine Fingers
maks475
Participant
Posts: 3
Joined: Wed Oct 28, 2009 12:59 am

Re: multiple rows to single row

Post by maks475 »

Hi, I guess you have only two fields, col1 and col2. Then do the following:

source -> Sort -> Transformer -> Remove Duplicates -> target
(in the Transformer, take stage variables and define them as below)

stg_var1 = col1
stg_var2 = col2
stg_var3 = stg_var4 (this is the final string to populate in the target)
stg_var4 = if stg_var1 = stg_var5 then stg_var3 : "," : stg_var2 else stg_var2
stg_var5 = stg_var1
Note: define all stage variables and their default values.

In the Remove Duplicates stage, retain the last record.

Correct me if I am wrong.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

OK... you are wrong. And welcome. :wink:

Keep in mind the Job Type being discussed when replying. This is the Server forum so no "Remove Duplicates" stage and hence the recommendations to use a Transformer.
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

In Server jobs the Aggregator stage has a "keep first/last" capability.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.