Need best partitioning method for hierarchy mgmt

reachthiru
Participant
Posts: 28
Joined: Mon Jan 09, 2006 1:31 pm

Post by reachthiru »

Hi,

I have a data file containing data like this:

1,a
1-10,t
1-11,u
1-12,v
1-10-20,x
1-10-25,y
1-11-26,z

I need to convert this data into:

1,a,1-10,t,1-10-20,x
1,a,1-10,t,1-10-25,y
1,a,1-11,u,1-11-26,z
1,a,1-12,v,,

For this, I created a job with the flow Sequential -> Sort -> Transformer -> Sequential.

In the Transformer stage, I use stage variables to store the incoming data in different variables, based on the number of hyphens (-) in the key, and I write only the final-level rows, built from those stage variables, to the output file.

My logic would work fine in a server job, but since this is a parallel job, I am not getting the desired output. If I change the partitioning method to 'Entire', I get the proper output, but the results are duplicated because there is more than one node.
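Why partitioning breaks stage-variable logic can be sketched in Python. This is a rough emulation under stated assumptions, not DataStage itself: a plain round-robin split over two nodes stands in for whatever partitioner the job actually used, each node runs its own copy of the Transformer (so its stage variables only see that node's rows), and `stage_variable_flatten` is a hypothetical stand-in for the stage-variable logic.

```python
from typing import List, Tuple

Row = Tuple[str, str]

# Sorted sample data from the post: (key, description).
SORTED: List[Row] = [
    ("1", "a"), ("1-10", "t"), ("1-10-20", "x"), ("1-10-25", "y"),
    ("1-11", "u"), ("1-11-26", "z"), ("1-12", "v"),
]


def stage_variable_flatten(rows: List[Row]) -> List[str]:
    """Hypothetical emulation of the Transformer: one (key, desc) 'stage variable'
    pair per level, carried from row to row within a single partition only.
    Assumes at most 3 levels and emits a row only when the key has 2 hyphens."""
    levels = [("", "")] * 3
    out = []
    for code, desc in rows:
        depth = code.count("-")
        # Keep the ancestors, store this row at its level, blank the deeper levels.
        levels = levels[:depth] + [(code, desc)] + [("", "")] * (2 - depth)
        if depth == 2:
            out.append(",".join(field for pair in levels for field in pair))
    return out


def run_on_nodes(partitions: List[List[Row]]) -> List[str]:
    # Each node runs its own Transformer instance; outputs are simply collected.
    return [line for part in partitions for line in stage_variable_flatten(part)]


nodes = 2
sequential = stage_variable_flatten(SORTED)
round_robin = run_on_nodes([SORTED[i::nodes] for i in range(nodes)])
entire = run_on_nodes([SORTED for _ in range(nodes)])

print(sequential)   # the three expected level-3 rows
print(round_robin)  # parents missing or wrong: ancestors landed on the other node
print(len(entire))  # 6: every node sees every row, so each row appears twice
```

With round-robin, `1-11-26,z` ends up paired with `1-10,t` because `1-11,u` was sent to the other node; with Entire, the rows are correct but doubled, which matches the behaviour described in the post.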

The other approach we are considering is to use the data file as a lookup as well and build the hierarchy from it. That would work, but it is a little more complex.
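The lookup approach can also be sketched in Python, as an illustration under assumptions rather than a DataStage job. Every leaf row looks up its ancestors by repeatedly trimming the last `-segment` from its key, so rows no longer depend on their neighbours and can be spread across nodes in any order. The sketch assumes the whole file is available as the lookup reference on every node (for example, through Entire partitioning on the reference link) and that the set of keys that have children is known up front; `parent_of` and `flatten_leaves` are hypothetical names.

```python
from typing import Dict, List, Optional, Set, Tuple

Row = Tuple[str, str]
MAX_LEVELS = 3

# Unsorted sample data from the post: (key, description).
ROWS: List[Row] = [
    ("1", "a"), ("1-10", "t"), ("1-11", "u"), ("1-12", "v"),
    ("1-10-20", "x"), ("1-10-25", "y"), ("1-11-26", "z"),
]


def parent_of(code: str) -> Optional[str]:
    # Hypothetical helper: "1-10-20" -> "1-10", "1" -> None.
    return code.rsplit("-", 1)[0] if "-" in code else None


def flatten_leaves(rows: List[Row], lookup: Dict[str, str],
                   parents: Set[Optional[str]]) -> List[str]:
    out = []
    for code, _ in rows:
        if code in parents:
            continue  # has children, so it is written as an ancestor of some leaf
        chain = []
        cur: Optional[str] = code
        while cur is not None:
            chain.append((cur, lookup[cur]))  # the lookup replaces the stage variables
            cur = parent_of(cur)
        chain.reverse()  # root first
        fields = [field for pair in chain for field in pair]
        fields += [""] * (2 * MAX_LEVELS - len(fields))  # pad short branches
        out.append(",".join(fields))
    return out


lookup = dict(ROWS)                          # reference data, visible on every node
parents = {parent_of(code) for code, _ in ROWS}
nodes = 2
result: List[str] = []
for i in range(nodes):                       # round-robin split of the driving rows
    result.extend(flatten_leaves(ROWS[i::nodes], lookup, parents))

for line in sorted(result):
    print(line)
```

Unlike a constraint on the hyphen count, this also produces the `1,a,1-12,v` row, because `1-12` is a leaf even though it is only two levels deep.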

Is there any way to get the result with the first method without changing the number of nodes?

Thanks in advance.
With regards,
Thiru
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne

Post by vmcburney »

Your input and your output are both sequential files, so try keeping your parallel job in sequential mode; it won't run that much slower. The easiest way is to create a config file with just one node in it, add the $APT_CONFIG_FILE environment variable to your job, and set it to the one-node config file. You should then get only one instance of your Sort and Transformer stages.

It will also run faster than your Entire option because, instead of copying the data to every node, it processes it on just one node.
kumar_s
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

Hi,

As Vincent suggested, use the environment variable.
Otherwise, since the input is a sequential file, keep the same partitioning for the Sort and Transformer stages, which will in turn make them run in sequential fashion.
Can you explain the logic you handle in the Transformer, so that we can find an optimal partitioning for that scenario?

-Kumar
reachthiru
Participant
Posts: 28
Joined: Mon Jan 09, 2006 1:31 pm

Post by reachthiru »

Hi Vincent & Kumar,

Thanks for your suggestions. Actually, I have already done what Vincent described. But the job flow I mentioned is for testing purposes only; in reality I need to write all my final data to a table, and I may have to read a million records.

OK, this is my logic.

First, I sort my data, so it becomes:

1,a
1-10,t
1-10-20,x
1-10-25,y
1-11,u
1-11-26,z
1-12,v
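The stage-variable approach only works if every parent arrives before its children and each subtree stays contiguous. For keys made only of digits and hyphens, a plain string sort already guarantees this, because `-` sorts before every digit, but it orders siblings as text (`1-10` before `1-9`). A sketch of a numeric sort key in Python, assuming every key segment is an integer (`hierarchy_key` is a hypothetical name):

```python
from typing import List, Tuple


def hierarchy_key(code: str) -> Tuple[int, ...]:
    # "1-10-20" -> (1, 10, 20); a tuple sorts after its own prefix,
    # so a parent always precedes its children and subtrees stay together.
    return tuple(int(part) for part in code.split("-"))


rows: List[Tuple[str, str]] = [
    ("1", "a"), ("1-10", "t"), ("1-11", "u"), ("1-12", "v"),
    ("1-10-20", "x"), ("1-10-25", "y"), ("1-11-26", "z"),
]

sorted_rows = sorted(rows, key=lambda row: hierarchy_key(row[0]))
for code, desc in sorted_rows:
    print(f"{code},{desc}")

# Where the two orders differ: siblings with a different number of digits.
print(sorted(["1-9", "1-10"]))                     # ['1-10', '1-9']
print(sorted(["1-9", "1-10"], key=hierarchy_key))  # ['1-9', '1-10']
```

For the stage-variable logic itself, sibling order does not matter, so a string sort like the one above is sufficient; the numeric key only makes the output order more natural.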

Then, in the Transformer stage, I have defined stage variables for each level, such as lvl1, lvl1desc, lvl2, etc. When I read a row, I count the hyphens (-) in the key: if there are 0, the row goes to lvl1; if there is 1, it goes to lvl2; and so on. I write all my stage variables to the output, with a constraint that the hyphen count must be 2 (i.e. my final level). So, for my data, the output would be:

1,a,,,,
1,a,1-10,t,,
1,a,1-10,t,1-10-20,x (Constraint True)
1,a,1-10,t,1-10-25,y (Constraint True)
1,a,1-11,u,,
1,a,1-11,u,1-11-26,z (Constraint True)
1,a,1-12,v,,
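The stage-variable logic described above can be emulated in a few lines of Python, as a sketch under assumptions: the three levels are held in a list instead of named stage variables such as lvl1/lvl1desc, deeper levels are blanked whenever a shallower row arrives, and rows are padded to six fields (key and description for each of three levels). `transformer_rows` is a hypothetical name.

```python
from typing import Iterator, List, Tuple

Row = Tuple[str, str]

SORTED: List[Row] = [
    ("1", "a"), ("1-10", "t"), ("1-10-20", "x"), ("1-10-25", "y"),
    ("1-11", "u"), ("1-11-26", "z"), ("1-12", "v"),
]


def transformer_rows(rows: List[Row]) -> Iterator[Tuple[str, bool]]:
    """Yield (output row, constraint result) for every input row."""
    levels = [("", "")] * 3  # stand-ins for lvl1/lvl1desc, lvl2/lvl2desc, lvl3/lvl3desc
    for code, desc in rows:
        depth = code.count("-")  # 0 hyphens -> level 1, 1 -> level 2, 2 -> level 3
        levels = levels[:depth] + [(code, desc)] + [("", "")] * (2 - depth)
        line = ",".join(field for pair in levels for field in pair)
        yield line, depth == 2  # the constraint: only final-level rows pass


emitted = []
for line, passes in transformer_rows(SORTED):
    print(line + (" (Constraint True)" if passes else ""))
    if passes:
        emitted.append(line)
```

The `1-12,v` row never satisfies the constraint, because `1-12` has no children at level 3, so producing the `1,a,1-12,v` line from the desired output would need an extra rule, for example also emitting a row when the next sorted key is not a child of the current one.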

I hope I explained my logic clearly. As I mentioned earlier, I do get the expected output; the only problem is that I could not achieve parallelism, so I am looking for the best solution from the gurus.

Thanks.
With regards,
Thiru
gpatton
Premium Member
Posts: 47
Joined: Mon Jan 05, 2004 8:21 am

Post by gpatton »

How many "root" nodes will you have in your hierarchy (in your example, 1)?

You could partition your data based upon values of the "root" level and then run subsets in parallel.
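A sketch of that idea in Python, under assumptions: rows are hash-partitioned on the root segment of the key (a stand-in for key-based partitioning in the job), so each node receives whole subtrees and can run the stage-variable logic on its own. The rows with root `2` are hypothetical, added only so that the split is visible; `root_of`, `flatten`, and `hash_partition` are hypothetical names.

```python
import zlib
from typing import List, Tuple

Row = Tuple[str, str]

# Sample rows from the post plus a HYPOTHETICAL second root ("2").
ROWS: List[Row] = [
    ("1", "a"), ("1-10", "t"), ("1-11", "u"), ("1-12", "v"),
    ("1-10-20", "x"), ("1-10-25", "y"), ("1-11-26", "z"),
    ("2", "b"), ("2-30", "w"), ("2-30-31", "q"),
]


def root_of(code: str) -> str:
    return code.split("-", 1)[0]


def flatten(rows: List[Row]) -> List[str]:
    # Emulates the Transformer's stage variables within one partition:
    # sort, carry one (key, desc) pair per level, emit only level-3 rows.
    levels = [("", "")] * 3
    out = []
    for code, desc in sorted(rows):
        depth = code.count("-")
        levels = levels[:depth] + [(code, desc)] + [("", "")] * (2 - depth)
        if depth == 2:
            out.append(",".join(field for pair in levels for field in pair))
    return out


def hash_partition(rows: List[Row], nodes: int) -> List[List[Row]]:
    parts: List[List[Row]] = [[] for _ in range(nodes)]
    for row in rows:
        # Key-based split: every row of one root lands on the same node.
        parts[zlib.crc32(root_of(row[0]).encode()) % nodes].append(row)
    return parts


nodes = 2
parallel = [line for part in hash_partition(ROWS, nodes) for line in flatten(part)]
sequential = flatten(ROWS)
print(sorted(parallel) == sorted(sequential))  # True
```

With a single root, as in the original data, every row hashes to the same node, so this only spreads the work when the hierarchy has many roots.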
reachthiru
Participant
Posts: 28
Joined: Mon Jan 09, 2006 1:31 pm

Post by reachthiru »

Hi gpatton,

I have only one root node.
With regards,
Thiru