Question on correlation between partitions and nodes

Posted: Mon Jan 21, 2013 12:45 pm
by kaps
I am trying to understand the correlation between Partitions and Nodes.

The job design is two sequential files feeding a Funnel and a Remove Duplicates stage, followed by a Transformer and another sequential file.

I hash partitioned on a key in the Remove Duplicates stage and ran with an 8-node config file, and the job ran in 8 partitions (from the monitor). When I ran it through the debugger I noticed that two records with the same key value went to two different nodes (node1 and node4), but it deduped correctly.

1. Does this mean that those two nodes were in the same partition? If so, how do we know which node goes into which partition? My understanding is that records with the same key value need to go to the same partition during hash partitioning.
2. How is it going to distribute the records if I had 10 records with the same key value?

What's strange is that if I run just 1 record (same key) in both input files instead of all the records, then both of them go to the same node.

I am confused. Please let me know if I am not making any sense...

Posted: Mon Jan 21, 2013 1:18 pm
by ArndW
In your case the sequential files are being read sequentially, and then most likely round-robin partitioning is being done going into the funnel. Where are you doing your explicit partitioning?

In this context the nodes and partitions are the same thing.
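For illustration only, here is a minimal sketch (in Python, with a made-up hash function, not DataStage's actual APT_HashPartitioner) of how keyed hash partitioning differs from round-robin. It also answers the 10-records question above: identical keys all hash to the same partition, while round-robin spreads them across all partitions.

Code:

# Illustrative sketch of hash vs. round-robin partitioning.
# The hash function is invented for this example; DataStage uses
# its own internal algorithm.
NUM_PARTITIONS = 8  # matches an 8-node configuration file

def hash_partition(key: str) -> int:
    # Deterministic: the same key always yields the same partition.
    return sum(key.encode()) % NUM_PARTITIONS

records = ["POL123"] * 10  # ten records sharing one key value

# Hash partitioning: every copy lands on the same partition.
print({hash_partition(k) for k in records})   # {1} -- one partition

# Round-robin: the same ten records are spread across partitions.
print({i % NUM_PARTITIONS for i in range(len(records))})  # {0..7}

So upstream of an explicit hash partition (for example, after a round-robin read going into the funnel), the same key can legitimately sit on different nodes; only after the hash partitioner must all copies of a key be on one partition.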

Posted: Mon Jan 21, 2013 4:12 pm
by ray.wurlod
A node is a logical subdivision of processing resources.

A partition is that subset of the data that are processed on a node.

Posted: Tue Jan 22, 2013 1:36 pm
by kaps
ArndW - Explicit partitioning is done in the Remove Duplicates stage. If nodes and partitions are the same, then I would not get the expected result, as the two records go to different partitions. Correct me if I am wrong here.

Ray - If a partition is the subset of data processed on a node, then I am not sure why both records went to two different nodes here.

Thanks for your time... Any input is appreciated.

Posted: Tue Jan 22, 2013 2:32 pm
by ray.wurlod
You need to look at the job score and determine which hashing algorithm was actually used.
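If the score is not already in the job log, it can usually be obtained by adding the parallel-engine reporting variable APT_DUMP_SCORE to the job parameters and rerunning:

Code:

APT_DUMP_SCORE=True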

Posted: Wed Jan 23, 2013 1:00 pm
by kaps
Looking at the job score, I believe hash partitioning happened (see ds4 and ds5 below) when the data went to the Remove Duplicates stage, which should have sent records with the same key to the same partition.

Following is the score:

Code:

main_program: This step has 9 datasets:
ds0: {op0[1p] (sequential sf_Case360_PolStubFields)
      eAny<>eCollectAny
      op2[8p] (parallel buffer(0))}
ds1: {op1[1p] (sequential sf_ISS_PolStubFields)
      eAny<>eCollectAny
      op3[8p] (parallel buffer(1))}
ds2: {op2[8p] (parallel buffer(0))
      eSame=>eCollectAny
      op4[8p] (parallel fnl_merge)}
ds3: {op3[8p] (parallel buffer(1))
      eSame=>eCollectAny
      op4[8p] (parallel fnl_merge)}
ds4: {op4[8p] (parallel fnl_merge)
      eOther(APT_HashPartitioner { key={ value=POL_NB, 
        subArgs={ cs }
      }
})#>eCollectAny
      op5[8p] (parallel RmDp_dedup.lnk_to_agg_Sort)}
ds5: {op5[8p] (parallel RmDp_dedup.lnk_to_agg_Sort)
      [pp] eSame=>eCollectAny
      op6[8p] (parallel RmDp_dedup)}
ds6: {op6[8p] (parallel RmDp_dedup)
      eAny=>eCollectAny
      op7[8p] (parallel APT_TransformOperatorImplV0S79_BigE_common_2035_Pol_Stub_Fields_ins_initial_load_debug_xfm_pass in xfm_pass)}
ds7: {op7[8p] (parallel APT_TransformOperatorImplV0S79_BigE_common_2035_Pol_Stub_Fields_ins_initial_load_debug_xfm_pass in xfm_pass)
      [pp] eSame=>eCollectAny
      op8[8p] (parallel buffer(2))}
ds8: {op8[8p] (parallel buffer(2))
      [pp] >>eCollectOther(APT_SortedMergeCollector { key={ value=POL_NB, 
        subArgs={ cs, asc }
      }
})

Posted: Wed Jan 23, 2013 1:26 pm
by ray.wurlod
So you have proven that the data are being hash partitioned on POL_NB. Are you claiming that there is one value of POL_NB that is found on more than one node? If so, please figure out a way to demonstrate that and get in touch with your official support provider.
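One possible way to demonstrate it (a suggestion, not something from this thread): add a column in the Transformer derived from the system variable @PARTITIONNUM, write POL_NB plus that column to a text file, and check for keys that appear with more than one partition number. A minimal Python sketch of that check, assuming a two-column file named key_partition.txt:

Code:

# Flag key values observed on more than one partition.
# Assumes key_partition.txt holds "POL_NB partition" pairs
# (the file name and layout are just for this example).
from collections import defaultdict

seen = defaultdict(set)
with open("key_partition.txt") as f:
    for line in f:
        key, part = line.split()
        seen[key].add(part)

for key, parts in sorted(seen.items()):
    if len(parts) > 1:
        print(f"{key} found on partitions {sorted(parts)}")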