PART and PARTCOUNT

Vasanth · Post by **Vasanth** » Tue Nov 27, 2007 3:04 am

Can anyone explain how PART and PARTCOUNT in the Row Generator stage actually work at the node level (4 nodes, 8 nodes, etc.)?

Also, please explain what PART and PARTCOUNT are.

Thanks in advance,
Vasanth

devidotcom · Post by **devidotcom** » Tue Nov 27, 2007 4:29 am

I guess PART would be the partition the data is in, and PARTCOUNT is the number of partitions on which the job is running.

ray.wurlod · Post by **ray.wurlod** » Tue Nov 27, 2007 5:46 am

By default the Row Generator stage operates in sequential mode. You would have to set it to execute in parallel mode for these values to make any sense. PART is the number of the partition on which a particular process is executing (starting from zero), while PARTCOUNT is the number of partitions presently being used, which is governed by the configuration file selected through the APT_CONFIG_FILE environment variable.

Therefore, if you had four nodes, PART would be 0, 1, 2 or 3 and PARTCOUNT would be 4.

If you had eight nodes, PART would be 0, 1, 2, 3, 4, 5, 6 or 7 and PARTCOUNT would be 8.
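As a rough illustration only (this is plain Python, not DataStage code, and the function name `partition_values` is made up for the example), the values each partition would see can be listed like this:

```python
def partition_values(node_count):
    """Return the (PART, PARTCOUNT) pair seen by each partition.

    PART runs from 0 to node_count - 1; PARTCOUNT is the same on every partition.
    """
    return [(part, node_count) for part in range(node_count)]


if __name__ == "__main__":
    for nodes in (4, 8):
        for part, partcount in partition_values(nodes):
            print(f"{nodes}-node config: PART={part}, PARTCOUNT={partcount}")
```

With a four-node configuration file this lists PART values 0 through 3, each paired with PARTCOUNT 4.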

Vasanth · Post by **Vasanth** » Wed Nov 28, 2007 12:17 am

Ray,

May I know how a node works?

Are nodes used only to carry data, or do they perform actual calculations (addition, for example)?

Suppose I have a Row Generator stage generating a sequence with initial value = 0 and increment = 15, and I run the job with a 2-node configuration. How does the first node take the value 0 while the second node generates 15? Where does the actual increment processing happen?

If the nodes are only used to carry data, then how is parallelism achieved here?

Thanks in advance,
Vasanth

ray.wurlod · Post by **ray.wurlod** » Wed Nov 28, 2007 3:51 am

If you hard-code the initial value as 0 and the increment as 15, then each of your processing nodes (as defined in the configuration file) will generate the same sequence, namely 0,15,30,45,...

This is probably not what you want. If you set the initial value to PART and the increment to PARTCOUNT and you have four nodes, then you will get the following sequences generated in parallel execution mode:
node #0: 0,4,8,12,...
node #1: 1,5,9,13,...
node #2: 2,6,10,14,...
node #3: 3,7,11,15,...
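To make the difference concrete, here is a small Python simulation of the arithmetic (a sketch only, not how the engine is implemented; `node_sequence` and `simulate` are hypothetical helper names). Each node computes its own values independently from its initial value and increment:

```python
from itertools import count, islice


def node_sequence(initial, increment, rows):
    """Values one node would generate: initial, initial + increment, ..."""
    return list(islice(count(initial, increment), rows))


def simulate(partcount, rows, use_part=True, initial=0, increment=15):
    """Return {PART: sequence} for every partition.

    use_part=True  -> initial value = PART, increment = PARTCOUNT
    use_part=False -> the hard-coded initial value and increment on every node
    """
    result = {}
    for part in range(partcount):
        if use_part:
            result[part] = node_sequence(part, partcount, rows)
        else:
            result[part] = node_sequence(initial, increment, rows)
    return result


if __name__ == "__main__":
    # Hard-coded 0 and 15: every node repeats 0, 15, 30, 45
    print(simulate(4, 4, use_part=False))
    # PART and PARTCOUNT: node 0 gets 0, 4, 8, 12; node 1 gets 1, 5, 9, 13; ...
    print(simulate(4, 4, use_part=True))
```

With PART and PARTCOUNT the four sequences never overlap, and together they cover 0 to 15 exactly once. With the hard-coded values, every node produces the same numbers.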