DB2 Connector - enable partitioning

Marley777 · Post by **Marley777** » Tue Oct 26, 2010 3:15 pm

Reading from a partitioned db2 table. When we read the table using a 3 node config file and don't use enable partitioning...it works. But if we enable partitioning we get the same results 3 times one for each node in the config file, data duplicated. Anyone ever experienced this, what did you do to fix? How do we get the db2 connector stage to truly run in parallel? Thanks for your help.

mhester · Post by **mhester** » Tue Oct 26, 2010 3:35 pm

in your sql statement (if user defined) do you have the

Code: Select all

NODENUMBER(partition_key)=CURRENT NODE

dbpartitionnum will work too

The stage is running parallel in the sense of ETL - one db2 connector pid for each node in the configuration. The sql statement gets handed off to an instane on the dpf server and then is handed off to the admin/coordinator node to execute in parallel in the DB. If you choose direct connect then true partitioned reads take place bypassing the admin/coordinator node.

I think what might be happening with your query is that you have defined the stage as enabled for partitioned reads, but the query is a straight select ... from the db without the above clause which is what will allow it to only bring data back for that particular node and partition.

Make sure you choose parition type of DB2 unless you want another kind, but that would be unlikely.

If you let the stage generate the sql then the stage will prepare a correct query with the above clause already present.

Marley777 · Post by **Marley777** » Wed Oct 27, 2010 7:01 am

thanks

where in the sql do we put this?

NODENUMBER(partition_key)=CURRENT NODE

Also want to mention we have a dpf environment, didn't know if that makes a difference.

mhester · Post by **mhester** » Wed Oct 27, 2010 8:44 am

Marley777 wrote:thanks

where in the sql do we put this?

NODENUMBER(partition_key)=CURRENT NODE

Also want to mention we have a dpf environment, didn't know if that makes a difference.

Code: Select all

SELECT COL1, COL2 FROM TABLE WHERE NODENUMBER(COL1)=CURRENT_NODE

Choose Uncommitted Read in the stage and yes this is syntax specifically related to partitioned tables.

Keep in mind that COL1 is the column/key which would have been used in the DDL for the hash key for the table in question.