DB2 Connector - enable partitioning
Moderators: chulett, rschirm, roy
DB2 Connector - enable partitioning
Reading from a partitioned db2 table. When we read the table using a 3 node config file and don't use enable partitioning...it works. But if we enable partitioning we get the same results 3 times one for each node in the config file, data duplicated. Anyone ever experienced this, what did you do to fix? How do we get the db2 connector stage to truly run in parallel? Thanks for your help.
in your sql statement (if user defined) do you have the
dbpartitionnum will work too
The stage is running parallel in the sense of ETL - one db2 connector pid for each node in the configuration. The sql statement gets handed off to an instane on the dpf server and then is handed off to the admin/coordinator node to execute in parallel in the DB. If you choose direct connect then true partitioned reads take place bypassing the admin/coordinator node.
I think what might be happening with your query is that you have defined the stage as enabled for partitioned reads, but the query is a straight select ... from the db without the above clause which is what will allow it to only bring data back for that particular node and partition.
Make sure you choose parition type of DB2 unless you want another kind, but that would be unlikely.
If you let the stage generate the sql then the stage will prepare a correct query with the above clause already present.
Code: Select all
NODENUMBER(partition_key)=CURRENT NODE
The stage is running parallel in the sense of ETL - one db2 connector pid for each node in the configuration. The sql statement gets handed off to an instane on the dpf server and then is handed off to the admin/coordinator node to execute in parallel in the DB. If you choose direct connect then true partitioned reads take place bypassing the admin/coordinator node.
I think what might be happening with your query is that you have defined the stage as enabled for partitioned reads, but the query is a straight select ... from the db without the above clause which is what will allow it to only bring data back for that particular node and partition.
Make sure you choose parition type of DB2 unless you want another kind, but that would be unlikely.
If you let the stage generate the sql then the stage will prepare a correct query with the above clause already present.
Mike Hester
mhester@petra-ps.com
mhester@petra-ps.com
Marley777 wrote:thanks
where in the sql do we put this?
NODENUMBER(partition_key)=CURRENT NODE
Also want to mention we have a dpf environment, didn't know if that makes a difference.
Code: Select all
SELECT COL1, COL2 FROM TABLE WHERE NODENUMBER(COL1)=CURRENT_NODE
Keep in mind that COL1 is the column/key which would have been used in the DDL for the hash key for the table in question.
Mike Hester
mhester@petra-ps.com
mhester@petra-ps.com