Subrecord to Varchar

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.


DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

I like sud's footer note. Nice ....
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
sud
Premium Member
Posts: 366
Joined: Fri Dec 02, 2005 5:00 am
Location: Here I Am

Post by sud »

Whew ... found my keyboard at last, now I can type. DSGuru .... "15 years" ... ahem ... don't tell anyone the real number.

Initial input :

Key1,Key2,Key3,Col1,Col2
-------------------------------
1,1,2,23,name1
1,2,3,44,name2
1,1,2,34,name3
1,2,1,45,name6
1,2,3,33,name4

This goes to a Combine Records stage, where I defined the subrecord column as SUB_REC and the keys as Key1, Key2, Key3, to get the output:


Key1:1 Key2:1 Key3:2 SUB_REC:[0:(SUB_REC.Col1:23 SUB_REC.Col2:name1) 1:(SUB_REC.Col1:34 SUB_REC.Col2:name3)]

Key1:1 Key2:2 Key3:1 SUB_REC:[0:(SUB_REC.Col1:45 SUB_REC.Col2:name6)]

Key1:1 Key2:2 Key3:3 SUB_REC:[0:(SUB_REC.Col1:44 SUB_REC.Col2:name2) 1:(SUB_REC.Col1:33 SUB_REC.Col2:name4)]

Three records for three unique key combinations.
Next I use a Split Subrecord stage:

Key1:1 Key2:1 Key3:2 Col1:[0:23 1:34] Col2:[0:name1 1:name3]

Key1:1 Key2:2 Key3:1 Col1:[0:45] Col2:[0:name6]

Key1:1 Key2:2 Key3:3 Col1:[0:44 1:33] Col2:[0:name2 1:name4]

Now I want to pass the top-level subrecords Col1 and Col2 to C routines for processing, but I am unable to pass subrecords. So I want to pass them as comma-delimited values in a varchar, but I don't know how to convert a subrecord into a varchar.

I could instead use just a Transformer with stage variables to achieve the same effect, producing Col1 and Col2 as concatenated values, but I suspect it will be less efficient (tell me if that's not true).
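For what it's worth, once the values do arrive in a C routine as a comma-delimited varchar, splitting them back into individual values on the C++ side is straightforward. A minimal sketch (the helper name `splitCsv` is mine for illustration, not any DataStage API):

```cpp
#include <sstream>
#include <string>
#include <vector>

// Split a comma-delimited varchar (e.g. "23,34") back into its
// individual values, as a parallel routine might do internally.
std::vector<std::string> splitCsv(const std::string& csv) {
    std::vector<std::string> values;
    std::stringstream ss(csv);
    std::string item;
    // std::getline with a delimiter pulls out one field at a time.
    while (std::getline(ss, item, ',')) {
        values.push_back(item);
    }
    return values;
}
```

The routine itself would still receive the varchar through whatever C-string signature the job exposes; this only shows the parsing step.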

Man, do I feel relieved after being able to say the whole thing!!
It took me fifteen years to discover I had no talent for ETL, but I couldn't give it up because by that time I was too famous.
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

Well, before I burn my brain cells further: even if you use routines, you will have to call them from within the Transformer, so the use of a Transformer is inevitable. To reduce complexity, you can use stage variables. Boy, feels like we've arrived back where we started. :?
sud
Premium Member
Posts: 366
Joined: Fri Dec 02, 2005 5:00 am
Location: Here I Am

Post by sud »

Yes, but using a Transformer does not slow things down for me. What we need to determine is how combining the records through stage variables compares with the Combine Records stage: I could not find any documentation on which one is faster, so I am assuming the built-in stage will be. Of course, if we cannot make things work otherwise, the Transformer is THE way to go. And it's too bad that DataStage will not let me pass the subrecord to a routine, which tells me it is again some internal format and not just an array or vector type in C++.
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

I haven't used the Combine Records stage, so I cannot comment on its throughput versus a Transformer. Build the entire job using your stage variables and run it with your sample data; if the run time is acceptable, stick with it.
In the meantime, maybe someone can shed light on how to get this working using the Combine Records stage or some other stage.
That's why I wanted to know your initial data and your final data, so that I could suggest a different approach. You gave the initial data, but I still don't know what your final data should look like.
sud
Premium Member
Posts: 366
Joined: Fri Dec 02, 2005 5:00 am
Location: Here I Am

Post by sud »

ray.wurlod wrote:Promote Subrecord stage?
I used the Promote Subrecord stage and gave Col1 as the input ... but I got the error: "main_program: Fatal Error: Not an aggregate field.". My guess is that the subrecord promotion has already been done, since my subrecords Col1 and Col2 are at the top level.

Now, what I need to do is convert the subrecord Col1:[0:23 1:34] to a varchar with comma-delimited values: 23,34. So I need to know whether it is possible to convert a subrecord into a varchar, and if so, how.
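In plain C++ terms, the conversion being asked for is just a join. A minimal sketch, assuming the element values were somehow available to a routine as a vector (which, as noted above, DataStage does not expose directly; `joinCsv` is a hypothetical helper):

```cpp
#include <string>
#include <vector>

// Join subrecord element values (e.g. "23" and "34") into the
// comma-delimited varchar form "23,34".
std::string joinCsv(const std::vector<std::string>& values) {
    std::string out;
    for (std::size_t i = 0; i < values.size(); ++i) {
        if (i > 0) out += ",";  // delimiter between values, not before the first
        out += values[i];
    }
    return out;
}
```

Inside DataStage the same effect is what the stage-variable concatenation in a Transformer produces, one element per incoming row.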
nvuradi
Participant
Posts: 54
Joined: Tue Dec 12, 2006 11:03 pm

Post by nvuradi »

sud wrote:
ray.wurlod wrote:Promote Subrecord stage?
I used the Promote Subrecord stage and gave Col1 as the input ... but I got the error: "main_program: Fatal Error: Not an aggregate field.". My guess is that the subrecord promotion has already been done, since my subrecords Col1 and Col2 are at the top level.

Now, what I need to do is convert the subrecord Col1:[0:23 1:34] to a varchar with comma-delimited values: 23,34. So I need to know whether it is possible to convert a subrecord into a varchar, and if so, how.
Hey Sud,
Did you find any solution for this? I have the same requirement, where I have to pass all the subrecords on for further processing as normal columns. But when I looked into the data generated after the Combine Subrecords stage, I found it to be some junk value :( . After combining all the records I need to pass the data to the next job for some more ETL processing.
sud
Premium Member
Posts: 366
Joined: Fri Dec 02, 2005 5:00 am
Location: Here I Am

Post by sud »

Terribly sorry for not posting a solution. In fact that work got done a long time back; I just came across this old post again. Anyway: I had to use a Transformer with stage variables, and then for further processing gave the concatenated list to a parallel routine and did all the processing within it, to do away with the need to toggle between subrecord and delimited list.

However, I came across some interesting issues. For example, imagine you have a C routine that returns a string, i.e. a pointer, so DataStage gets a dynamically allocated string back from the routine. DataStage has no mechanism for freeing those dynamic pointers once it is done processing the data for that record. This results in an ugly buildup of memory usage until your box goes boom. I contacted support, and after heckling for a long time (the moment they hear you are doing something custom, they freak out and say it's your fault) I proved to them what was wrong, and they accepted that DataStage does not free that memory. What they suggest instead is using a build-op stage, where you can control memory usage and everything else through code (since you can free memory before you exit the stage, or after processing each record). So I ended up using the build-op in the end, though the performance of the parallel routine was much better than the build-op.
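The leak pattern described above, in miniature (a hypothetical routine shape, not the actual DataStage interface): the routine hands back a freshly allocated buffer per record, and if the engine never frees the result, memory grows with every record processed.

```cpp
#include <cstdlib>
#include <cstring>

// A routine that returns a freshly allocated C string, the way a
// parallel routine returning char* would. Each call allocates a new
// buffer; if the caller never free()s it, every record leaks one.
char* makeResult(const char* input) {
    char* out = static_cast<char*>(std::malloc(std::strlen(input) + 1));
    std::strcpy(out, input);
    return out;  // ownership passes to the caller -- the leak happens
                 // when nothing on the engine side ever calls free()
}

// In a build-op you control this yourself: free (or reuse) the buffer
// after emitting each record, so memory usage stays bounded.
```

This is why the build-op, despite being slower here, was the safe choice: its per-record code path gives you a place to release what you allocated.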
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

sud wrote:Though, the performance of the parallel routine was much better than the build op.
Up until the 'Boom!' part, of course. :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers