Data Manipulation
Moderators: chulett, rschirm, roy
Data Manipulation
Hello Gurus ,
My design includes a seq file stage - an aggregator - seq_file_output file..
My input looks like
ActNo , posno , phone_1, phone_2 , address_1 , address_2
1 , 1 , 1234 , , 12 abc ,
1 , 2 , , 4567 , , 23 xyz
11 , 1 , 6789 , , 15 mef ,
Output should be (idea is to merge based on key column(here actno ) - 1 key column should have one entry)
ActNo , phone_1, phone_2 , address_1 , address_2
1 , 1234 , 4567 ,12 abc , 23 xyz
11 , 6789 , , 15 mef ,
I tried using simple aggregation and using the max condition by grouping by actno
However that works well with the getting the phone number but since address has both numbers and alphabets , the output is 12 instead of 12 abc for address_1 and 23 instead of 23 xyz . Is there a way to fix this ? Thank a lot for reading my post
My design includes a seq file stage - an aggregator - seq_file_output file..
My input looks like
ActNo , posno , phone_1, phone_2 , address_1 , address_2
1 , 1 , 1234 , , 12 abc ,
1 , 2 , , 4567 , , 23 xyz
11 , 1 , 6789 , , 15 mef ,
Output should be (idea is to merge based on key column(here actno ) - 1 key column should have one entry)
ActNo , phone_1, phone_2 , address_1 , address_2
1 , 1234 , 4567 ,12 abc , 23 xyz
11 , 6789 , , 15 mef ,
I tried using simple aggregation and using the max condition by grouping by actno
However that works well with the getting the phone number but since address has both numbers and alphabets , the output is 12 instead of 12 abc for address_1 and 23 instead of 23 xyz . Is there a way to fix this ? Thank a lot for reading my post
Thank you craig - I think I was referring to something like seq() function that converts the first char into (i believe) Ascii , I am aware of the datatype conversion function you mentioned above, but as you mentioned since the aggregater is only giving out "23" between "23 xyz" and "empty string" for max value , I was thinking if I can convert the whole "23 xyz" into numeric and back to "23 xyz". If I cannot do that is there a way to get my desired output from a different method?
Is a string sort acceptable for that data? If so, you should be able to leave the field as a string and override the default Aggregator behavior that converts everything to double in the stage. There should be a 'Preserve Type' option in there somewhere that will accomplish that.
-craig
"You can never have too many knives" -- Logan Nine Fingers
"You can never have too many knives" -- Logan Nine Fingers
Not sure I understand what you mean by "string sort" in this context, here we are comparing empty string with data (which is a combination of numbers and alphabets) . In this scenario its always "empty string" vs some data (address data) .
There is a preserve partitioning drop down which has two options clear and set in the advanced tab . I don't think i can see a preserve type option .
If you can let me know where it is , I can test the job with this option.
There is a preserve partitioning drop down which has two options clear and set in the advanced tab . I don't think i can see a preserve type option .
If you can let me know where it is , I can test the job with this option.
Ok , I get it now , thanks for the clarification , String sort is allowed . However the maximum value as I mentioned above needs to be calculated only based on a non empty string (address data with numbers and alphabets like "240 abc" vs "empty" fields.
I am still trying to find the solution for the required output
I am still trying to find the solution for the required output
I did a quick search and found one discussion here which should help. There were others.devsonali wrote:I don't think i can see a preserve type option. If you can let me know where it is, I can test the job with this option.
And I know it's already been answered but just in case, a 'string sort' means numbers would sort like this:
1
11
12
2
21
22
3
Etc.
-craig
"You can never have too many knives" -- Logan Nine Fingers
"You can never have too many knives" -- Logan Nine Fingers