DB2 Connector - enable partitioning

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
Marley777
Premium Member
Premium Member
Posts: 316
Joined: Tue Jan 27, 2004 3:26 pm

DB2 Connector - enable partitioning

Post by Marley777 »

Reading from a partitioned db2 table. When we read the table using a 3 node config file and don't use enable partitioning...it works. But if we enable partitioning we get the same results 3 times one for each node in the config file, data duplicated. Anyone ever experienced this, what did you do to fix? How do we get the db2 connector stage to truly run in parallel? Thanks for your help.
mhester
Participant
Posts: 622
Joined: Tue Mar 04, 2003 5:26 am
Location: Phoenix, AZ
Contact:

Post by mhester »

in your sql statement (if user defined) do you have the

Code: Select all

NODENUMBER(partition_key)=CURRENT NODE
dbpartitionnum will work too

The stage is running parallel in the sense of ETL - one db2 connector pid for each node in the configuration. The sql statement gets handed off to an instane on the dpf server and then is handed off to the admin/coordinator node to execute in parallel in the DB. If you choose direct connect then true partitioned reads take place bypassing the admin/coordinator node.

I think what might be happening with your query is that you have defined the stage as enabled for partitioned reads, but the query is a straight select ... from the db without the above clause which is what will allow it to only bring data back for that particular node and partition.

Make sure you choose parition type of DB2 unless you want another kind, but that would be unlikely.

If you let the stage generate the sql then the stage will prepare a correct query with the above clause already present.
Marley777
Premium Member
Premium Member
Posts: 316
Joined: Tue Jan 27, 2004 3:26 pm

Post by Marley777 »

thanks

where in the sql do we put this?

NODENUMBER(partition_key)=CURRENT NODE

Also want to mention we have a dpf environment, didn't know if that makes a difference.
mhester
Participant
Posts: 622
Joined: Tue Mar 04, 2003 5:26 am
Location: Phoenix, AZ
Contact:

Post by mhester »

Marley777 wrote:thanks

where in the sql do we put this?

NODENUMBER(partition_key)=CURRENT NODE

Also want to mention we have a dpf environment, didn't know if that makes a difference.

Code: Select all

SELECT COL1, COL2 FROM TABLE WHERE NODENUMBER(COL1)=CURRENT_NODE
Choose Uncommitted Read in the stage and yes this is syntax specifically related to partitioned tables.

Keep in mind that COL1 is the column/key which would have been used in the DDL for the hash key for the table in question.
Post Reply