Subrecord to Varchar
Moderators: chulett, rschirm, roy
Whew ... found my keyboard at last, now I can type. DsGuru .... "15 years" ... ahem ... don't tell anyone the real number.
Initial input :
Key1,Key2,Key3,Col1,Col2
-------------------------------
1,1,2,23,name1
1,2,3,44,name2
1,1,2,34,name3
1,2,1,45,name6
1,2,3,33,name4
This goes to a Combine Records stage, where I defined the subrecord column as SUB_REC and the keys as Key1,Key2,Key3, to get the output:
Key1:1 Key2:1 Key3:2 SUB_REC:[0:(SUB_REC.Col1:23 SUB_REC.Col2:name1) 1:(SUB_REC.Col1:34 SUB_REC.Col2:name3)]
Key1:1 Key2:2 Key3:1 SUB_REC:[0:(SUB_REC.Col1:45 SUB_REC.Col2:name6)]
Key1:1 Key2:2 Key3:3 SUB_REC:[0:(SUB_REC.Col1:44 SUB_REC.Col2:name2) 1:(SUB_REC.Col1:33 SUB_REC.Col2:name4)]
Three records for three unique keys.
Next I use a Split Subrecord stage:
Key1:1 Key2:1 Key3:2 Col1:[0:23 1:34] Col2:[0:name1 1:name3]
Key1:1 Key2:2 Key3:1 Col1:[0:45] Col2:[0:name6]
Key1:1 Key2:2 Key3:3 Col1:[0:44 1:33] Col2:[0:name2 1:name4]
Now I want to pass the top-level subrecords Col1 and Col2 to C routines for processing, but I am unable to pass subrecords. So I want to pass them as comma-delimited values in a varchar, but I don't know how to convert the subrecord into a varchar.
I could get the same effect with just a transformer, using stage variables to build Col1 and Col2 as concatenated values, but I suspect that would be less efficient (tell me if that's not true).
Man, do I feel relieved after being able to say the whole thing!!
It took me fifteen years to discover I had no talent for ETL, but I couldn't give it up because by that time I was too famous.
Well, before I burn my brain cells further: even if you use routines, you will have to call them from within a transformer, so the use of a transformer is inevitable. To reduce complexity, you can use stage variables. Boy, feels like we've arrived back at the point where we originally started. :?
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
Yes, but it's not that using a transformer slows things down for me; what we need to look at is combining the records through stage variables versus through the Combine Records stage. I could not find any documentation on which one is faster, so I'm assuming the built-in stage will be. Of course, if we cannot make things move, the transformer is THE way to go. And it's too sad that DataStage will not let me pass the subrecord to a routine, which tells me it is again some internal format and not just an array or vector type in C++.
It took me fifteen years to discover I had no talent for ETL, but I couldn't give it up because by that time I was too famous.
I haven't used the Combine Records stage, so I cannot comment on its throughput against a transformer. Build the entire job using your stage variables and run it with your sample data. If the time is acceptable, stick with it.
In the meantime, maybe someone can throw light on how to get this working using the Combine Records stage or some other stage.
That's why I wanted to know your initial data and your final data, so that I could advise a different approach. You gave the initial data, but I still don't know what your final data will look like.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
ray.wurlod wrote: Promote Subrecord stage?
I used the Promote Subrecord stage and gave Col1 as the input, but I got the error: "main_program: Fatal Error: Not an aggregate field.". My guess is that the promotion of the subrecord has already been done, since my subrecords Col1 and Col2 are at the top level.
Now, what I need to do is convert the subrecord Col1:[0:23 1:34] to a varchar with comma-delimited values, as: 23,34. So I need to know how to convert a subrecord into a varchar, and whether that is possible.
It took me fifteen years to discover I had no talent for ETL, but I couldn't give it up because by that time I was too famous.
Hey Sud,
sud wrote: I used the promote subrecord ... So, I need to know how to convert a subrecord into varchar and whether it is possible.
Did you find any solution for this? I have the same requirement, where I have to pass all the subrecords on for further processing as normal columns, but when I looked into the data generated after the Combine Subrecords stage I found it to be some junk value. :(
Terribly sorry for not posting a solution. In fact, that work got done a long time back; I just came across this old post again. So, anyway: I had to use a transformer with stage variables, and then for some further processing gave the concatenated list to a parallel routine and did all the processing within it, to do away with the requirement of toggling between subrecord and delimited list.
However, I came across some interesting issues. For example, imagine you have a C routine that passes out a string, which is a pointer, so you get a dynamically allocated string back into DataStage from the routine. DataStage has no mechanism for freeing those dynamic pointers once it is done processing the data for that record. The result is an ugly buildup of memory usage until your box goes boom. I contacted support and, after heckling for a long time (the moment they hear you are doing something custom, they freak out and say it's your fault), proved to them what was wrong, and they then accepted that DataStage does not free that memory. What they suggest is using a build-op stage instead, where you can control memory usage and everything through code (since you can free up memory before you exit the stage, or after processing each record). So I ended up using the build-op in the end, though the performance of the parallel routine was much better than that of the build-op.
It took me fifteen years to discover I had no talent for ETL, but I couldn't give it up because by that time I was too famous.