Node Number clause in SQL queries

srilaxmi · Post by **srilaxmi** » Tue Jul 28, 2009 1:21 am

Hi All,

I need information on how to use the Node number in SQL queries running in the DB2 Enterprise stages of ETL Data Stage jobs when we pull data from a partitioned table (DB2 table).

TIA,
Sri

Sainath.Srinivasan · Post by **Sainath.Srinivasan** » Tue Jul 28, 2009 2:21 am

DB2 nodes or DataStage nodes?

miwinter · Post by **miwinter** » Tue Jul 28, 2009 4:53 am

As it's a DB2 function... it's better to check the vendor's product documentation generally, as ever:

http://publib.boulder.ibm.com/infocente ... 000832.htm

srilaxmi · Post by **srilaxmi** » Tue Jul 28, 2009 10:34 pm

Sainath.Srinivasan,

It is Datastage nodes..

chulett · Post by **chulett** » Tue Jul 28, 2009 10:57 pm

A DataStage "node" is a logical concept and not something you can include in a SQL query. What exactly are you trying to accomplish?

miwinter · Post by **miwinter** » Wed Jul 29, 2009 2:37 am

I think this is all very confused... or the OP doesn't fully understand what "node number" is in relation to the DB2EE stage.

The original comment of

"how to use the Node number in SQL queries running in the DB2 Enterprise stages of ETL Data Stage jobs when we pull data from a partitioned table (DB2 table)."

surely suggests that they are looking to run extracts in parallel from a partitioned DB2 table?! Quite how this turns into being about Datastage nodes makes no sense to me at all

chulett · Post by **chulett** » Wed Jul 29, 2009 6:49 am

Exactly, hence my question.

srilaxmi · Post by **srilaxmi** » Mon Aug 03, 2009 5:04 am

here is some more information reg. the node number clause:
When we pull from a partitioned table, we use the nodenumber function and current node option in DB2:

The partitioning table option tells DataStage to retrieve the data in parallel, but if you do not include the noe number clause in your where clause, you will (as you have seen) pull 100% of the data over each partition. That is, if you have 100 rows and 10 partitions, your output will end up being num_partitions * num_rows = 10,000 rows.

If you run the query with no nodenumber clause and no partitioning table specified your output will accurate, but it will take longer to retrieve the data. DB2 will run your query in parallel within the database, but will pool all the data into one stream to return to DataStage.

I just want to know how can we use this node number clause in SQL queries of DB2 entreprise stages of Datastage, to improve performance and to make use of DB2 partitioning.

Thanks,
Sri

ray.wurlod · Post by **ray.wurlod** » Mon Aug 03, 2009 4:52 pm

Don't bother, is my advice. The DB2 Enterprise stage should automatically (if you've told it the partition table name) use the inbuilt DB2 partitioning.

andjames · Post by **andjames** » Fri Nov 20, 2009 2:09 am

[quote="ray.wurlod"]Don't bother, is my advice. The DB2 Enterprise stage should automatically (if you've told it the partition table name) use the inbuilt DB2 partitioning. ...[/quote]

This is true if you just select the table and let DataStage handle the input -- BUT if you specifiy your own SQL query to select the date you must ensure that DB2 returns only rows from the partition your datastage node has connected to. You do this by adding a " NODENUMBER(ANYCOL) = CURRENT NODE " to the where clause.

In addition if you have a joined query on two or mode partitoned tables (assumming they have similar partitioning keys!) you should tack
" NODENUMBER(FIRSTTAB.KEYCOL) = NODENUMBER(SECONDTAB.KEYCOL)" to each JOIN ON clause.

This is well worth doing if you can as DB2 can satisify the query without any cross node traffic and each DS node can work completely independently with no repartioning.

ray.wurlod · Post by **ray.wurlod** » Fri Nov 20, 2009 6:33 pm

True but, for some reason, the OP specified that he/she want's to use the DataStage (processing) node number. In our various ways we have been trying to advise that the two (DataStage and DB2 node numbers) are unrelated.

If you know the number of partitions of the table and specify the DB2 node number, then you can certainly perform some arithmetic manipulation to force particular DB2 nodes onto particular DataStage nodes but, in general, I believe that the effort is likely to be wasted. What will you be doing with these data? The only value will obtain if you will be putting them back into an identically partitioned DB2 table.