Hi,
I have a data file with data like this:
1,a
1-10,t
1-11,u
1-12,v
1-10-20,x
1-10-25,y
1-11-26,z
I need to convert this data into:
1,a,1-10,t,1-10-20,x
1,a,1-10,t,1-10-25,y
1,a,1-11,u,1-11-26,z
1,a,1-12,v,,
For this, I created a job: Sequential -> Sort -> Transformer -> Sequential.
In the Transformer stage, I use stage variables to store the incoming data in different variables based on the number of hyphens (-) in the key, and I write only the final-level data, taken from the stage variables, to the output file.
My logic works fine in a server job, but since this is a parallel job, I am not getting the desired output. If I change the partitioning method to 'Entire', I get the proper output, but the results are duplicated because each node processes its own full copy of the data.
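To see why the result depends on the partitioning, here is a small Python model, offered as an illustration rather than a description of DataStage internals. The hypothetical round_robin() and entire() helpers are simplified stand-ins for the Round Robin and Entire partitioners (round robin is only an assumed example of what the job's default partitioning might do), and flatten() is a rough re-creation of the carry-forward stage-variable logic, with each partition sorted on its own as a partitioned Sort would do.

```python
# Illustration only: round_robin() and entire() are simplified stand-ins
# for DataStage partitioners, and flatten() approximates the carry-forward
# stage-variable logic. None of this is DataStage code.

# Sample rows in the original file order.
ROWS = [("1", "a"), ("1-10", "t"), ("1-11", "u"), ("1-12", "v"),
        ("1-10-20", "x"), ("1-10-25", "y"), ("1-11-26", "z")]


def flatten(rows):
    """Sort one partition, carry levels forward, emit two-hyphen rows."""
    keys, descs = ["", "", ""], ["", "", ""]
    out = []
    for key, desc in sorted(rows):
        depth = key.count("-")
        keys[depth], descs[depth] = key, desc
        for d in range(depth + 1, 3):   # a new parent clears deeper levels
            keys[d] = descs[d] = ""
        if depth == 2:
            out.append(",".join(f for pair in zip(keys, descs) for f in pair))
    return out


def round_robin(rows, nodes):
    """Deal rows out to the nodes in turn."""
    return [rows[i::nodes] for i in range(nodes)]


def entire(rows, nodes):
    """Give every node a full copy of the data."""
    return [list(rows) for _ in range(nodes)]


results = {}
for name, parts in [("round robin", round_robin(ROWS, 2)),
                    ("entire", entire(ROWS, 2))]:
    results[name] = [line for part in parts for line in flatten(part)]
    print(name, results[name])
```

With two round-robin partitions, rows arrive without their ancestors: 1-10-20 is written without its level-2 parent and 1-10-25 without its root. With Entire, each node gets the whole file, so every correct row is written once per node.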
The other approach we are considering is to use the data file as a lookup as well and build the hierarchy that way. It would work, but it is a little complex.
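The lookup idea can be sketched in Python as follows, assuming every parent key is present in the file and the hierarchy is at most three levels deep (both are assumptions; parent_key() and build_paths() are hypothetical names). Each leaf finds its ancestors by repeatedly stripping the last "-segment" from its key and looking the result up, so no row depends on the order in which rows arrive.

```python
# Sketch of the lookup approach. parent_key() and build_paths() are
# hypothetical; the code assumes every parent key exists in the file and
# that the hierarchy is at most MAX_LEVELS deep.

MAX_LEVELS = 3

ROWS = [("1", "a"), ("1-10", "t"), ("1-11", "u"), ("1-12", "v"),
        ("1-10-20", "x"), ("1-10-25", "y"), ("1-11-26", "z")]


def parent_key(key):
    """'1-10-20' -> '1-10'; a top-level key has no parent."""
    return key.rsplit("-", 1)[0] if "-" in key else None


def build_paths(rows):
    lookup = dict(rows)                               # reference data: key -> desc
    has_children = {parent_key(k) for k, _ in rows}   # keys that are someone's parent
    out = []
    for key, _ in rows:                               # row order does not matter
        if key in has_children:
            continue                                  # only leaves start an output row
        path, k = [], key
        while k is not None:                          # walk up one lookup at a time
            path.insert(0, k)
            k = parent_key(k)
        fields = []
        for k in path:
            fields += [k, lookup[k]]
        fields += [""] * (2 * MAX_LEVELS - len(fields))  # pad shallow leaves
        out.append(",".join(fields))
    return sorted(out)                                # sorted only for readability


for line in build_paths(ROWS):
    print(line)
```

Because each output row needs only the lookup table, the main input could in principle be split across nodes in any way, provided each node sees the whole reference data (including the full set of keys used to decide which rows are leaves). This version also emits the 1-12,v leaf, which has no third level.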
Is there any way to get the result using the first method without changing the number of nodes?
Thanks in advance.
Need best partitioning method for hierarchy mgmt
- Participant
- Posts: 28
- Joined: Mon Jan 09, 2006 1:31 pm
With regards,
Thiru
- Participant
- Posts: 3593
- Joined: Thu Jan 23, 2003 5:25 pm
- Location: Australia, Melbourne
Your input and your output are both sequential files, so try making your parallel job run in sequential mode; it won't run that much slower. The easiest way is to create a config file with just one node in it, add the $APT_CONFIG_FILE environment variable to your job, and set it to the one-node config file. You should then get only one instance of your Sort and Transformer stages.
It will run faster than your Entire option because, instead of copying all the data to every node, it processes the data on one node only.
Certus Solutions
Blog: Tooling Around in the InfoSphere
Twitter: @vmcburney
LinkedIn:Vincent McBurney LinkedIn
Hi,
As Vincent suggested, use the environment variable.
Otherwise, since the input is a sequential file, maintain the same partitioning for the Sort and Transformer stages, so that they in turn act in sequential fashion.
Can you explain the logic you handle in the Transformer, so that we can find an optimal partitioning for that scenario?
-Kumar
Hi Vincent & Kumar,
Thanks for your suggestions. I had actually already done what Vincent suggested. But the job I described is for testing purposes only; in reality I need to write all my final data to a table, and I may have to read a million records.
OK, this is my logic.
First I sort my data, so it becomes:
1,a
1-10,t
1-10-20,x
1-10-25,y
1-11,u
1-11-26,z
1-12,v
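Sorting the key as a plain string is enough here, assuming the Sort stage compares the key as a string rather than numerically segment by segment. A parent is a prefix of its children, so it sorts first, and "-" sorts below every digit, so a subtree such as 1-1-* stays together ahead of a sibling like 1-10. A quick Python check:

```python
# Keys from the sample file, in their original order.
keys = ["1", "1-10", "1-11", "1-12", "1-10-20", "1-10-25", "1-11-26"]
sorted_keys = sorted(keys)       # plain string comparison
print(sorted_keys)
# ['1', '1-10', '1-10-20', '1-10-25', '1-11', '1-11-26', '1-12']

# Hypothetical keys where the '-' rule matters: 1-1-5 must stay under 1-1,
# ahead of 1-10, because '-' (0x2D) sorts below '0' (0x30).
tricky = sorted(["1-10", "1-1-5", "1-1"])
print(tricky)
# ['1-1', '1-1-5', '1-10']
```

The order is not numeric (1-9 would sort after 1-10), but the stage-variable logic only needs each subtree to be contiguous and headed by its parent.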
Then, in the Transformer stage, I have defined stage variables for each level, such as lvl1, lvl1desc, lvl2, etc. When reading each row, I check the number of hyphens (-): if it is 0, the row goes to lvl1; if it is 1, it goes to lvl2; and so on. I write all my stage variables to the output. I also added a constraint that the hyphen count must be 2 (i.e., my final level). So, according to my data, the output will be:
1,a,,,,
1,a,1-10,t,,
1,a,1-10,t,1-10-20,x (Constraint True)
1,a,1-10,t,1-10-25,y (Constraint True)
1,a,1-11,u,,
1,a,1-11,u,1-11-26,z (Constraint True)
1,a,1-12,v,,
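A minimal Python sketch of the stage-variable logic as described, where local variables stand in for stage variables (which keep their values from row to row). The names follow the lvl1/lvl1desc pattern from the post; lvl3 and lvl3desc, and the reset of deeper levels when a higher level changes, are assumptions read off the sample output above.

```python
# Sketch only: Python locals stand in for Transformer stage variables.
# lvl1..lvl3desc follow the naming in the post; resetting deeper levels
# when a higher level changes is an assumption based on the sample output.

def transform(sorted_rows):
    lvl1 = lvl1desc = lvl2 = lvl2desc = lvl3 = lvl3desc = ""
    output = []
    for key, desc in sorted_rows:
        hyphens = key.count("-")
        if hyphens == 0:
            lvl1, lvl1desc = key, desc
            lvl2 = lvl2desc = lvl3 = lvl3desc = ""
        elif hyphens == 1:
            lvl2, lvl2desc = key, desc
            lvl3 = lvl3desc = ""
        elif hyphens == 2:
            lvl3, lvl3desc = key, desc
            # Output link constraint: only final-level rows are written.
            output.append(",".join([lvl1, lvl1desc, lvl2, lvl2desc, lvl3, lvl3desc]))
    return output


sorted_rows = [("1", "a"), ("1-10", "t"), ("1-10-20", "x"), ("1-10-25", "y"),
               ("1-11", "u"), ("1-11-26", "z"), ("1-12", "v")]
for line in transform(sorted_rows):
    print(line)
```

Run against the sorted sample, this writes the three two-hyphen rows. With only the hyphen-count constraint, the 1-12,v row is never written, although the desired output includes a row for it; catching such shallower leaves would need to know whether the following row is a child.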
I hope I explained my logic well. As I mentioned earlier, I am getting the expected output; the only problem is that I could not achieve parallelism, and I am looking for the best solution from the gurus.
Thanks.
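One possible answer to the partitioning question, not something confirmed in this thread: hash-partition on the top-level key (the part before the first hyphen) and sort on the full key within each partition. Every row of a tree then lands on the same node, so the carry-forward logic sees complete, ordered trees and nothing is duplicated. In a job this would presumably mean deriving the root key in an upstream stage and using it as the hash key; the Python below only models the idea. root_key() and hash_partition() are hypothetical helpers, and the tree rooted at 2 is invented sample data so there is more than one root to distribute.

```python
import zlib

# Sample rows plus a hypothetical second tree ("2"), invented only so there
# is more than one top-level key to distribute.
ROWS = [("1", "a"), ("1-10", "t"), ("1-11", "u"), ("1-12", "v"),
        ("1-10-20", "x"), ("1-10-25", "y"), ("1-11-26", "z"),
        ("2", "b"), ("2-30", "w"), ("2-30-40", "q")]


def root_key(key):
    """Top-level key: the text before the first hyphen."""
    return key.split("-", 1)[0]


def hash_partition(rows, nodes):
    """Send every row with the same root key to the same node."""
    parts = [[] for _ in range(nodes)]
    for row in rows:
        # crc32 is deterministic across runs, unlike Python's salted hash().
        parts[zlib.crc32(root_key(row[0]).encode()) % nodes].append(row)
    return parts


def flatten(rows):
    """Sort one partition, carry levels forward, emit two-hyphen rows."""
    keys, descs = ["", "", ""], ["", "", ""]
    out = []
    for key, desc in sorted(rows):
        depth = key.count("-")
        keys[depth], descs[depth] = key, desc
        for d in range(depth + 1, 3):   # a new parent clears deeper levels
            keys[d] = descs[d] = ""
        if depth == 2:
            out.append(",".join(f for pair in zip(keys, descs) for f in pair))
    return out


parts = hash_partition(ROWS, 4)
parallel = sorted(line for part in parts for line in flatten(part))
sequential = sorted(flatten(ROWS))
print(parallel == sequential)  # True
```

The caveat is that parallelism is limited by the number of distinct top-level keys: with the sample file, whose only root is 1, every row would still go to a single node.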