Node Number clause in SQL queries

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
srilaxmi
Participant
Posts: 22
Joined: Thu Feb 28, 2008 5:21 am

Node Number clause in SQL queries

Post by srilaxmi »

Hi All,

I need information on how to use the Node number in SQL queries running in the DB2 Enterprise stages of ETL Data Stage jobs when we pull data from a partitioned table (DB2 table).

TIA,
Sri
Sri
Sainath.Srinivasan
Participant
Posts: 3337
Joined: Mon Jan 17, 2005 4:49 am
Location: United Kingdom

Post by Sainath.Srinivasan »

DB2 nodes or DataStage nodes?
miwinter
Participant
Posts: 396
Joined: Thu Jun 22, 2006 7:00 am
Location: England, UK

Post by miwinter »

As it's a DB2 function... it's better to check the vendor's product documentation generally, as ever:

http://publib.boulder.ibm.com/infocente ... 000832.htm
Mark Winter
<i>Nothing appeases a troubled mind more than <b>good</b> music</i>
srilaxmi
Participant
Posts: 22
Joined: Thu Feb 28, 2008 5:21 am

Post by srilaxmi »

Sainath.Srinivasan,

It is Datastage nodes..
Sri
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

A DataStage "node" is a logical concept and not something you can include in a SQL query. What exactly are you trying to accomplish? :?
-craig

"You can never have too many knives" -- Logan Nine Fingers
miwinter
Participant
Posts: 396
Joined: Thu Jun 22, 2006 7:00 am
Location: England, UK

Post by miwinter »

I think this is all very confused... or the OP doesn't fully understand what "node number" is in relation to the DB2EE stage.

The original comment of
"how to use the Node number in SQL queries running in the DB2 Enterprise stages of ETL Data Stage jobs when we pull data from a partitioned table (DB2 table)."
surely suggests that they are looking to run extracts in parallel from a partitioned DB2 table?! Quite how this turns into being about Datastage nodes makes no sense to me at all :?
Mark Winter
<i>Nothing appeases a troubled mind more than <b>good</b> music</i>
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Exactly, hence my question.
-craig

"You can never have too many knives" -- Logan Nine Fingers
srilaxmi
Participant
Posts: 22
Joined: Thu Feb 28, 2008 5:21 am

Post by srilaxmi »

here is some more information reg. the node number clause:
When we pull from a partitioned table, we use the nodenumber function and current node option in DB2:

The partitioning table option tells DataStage to retrieve the data in parallel, but if you do not include the noe number clause in your where clause, you will (as you have seen) pull 100% of the data over each partition. That is, if you have 100 rows and 10 partitions, your output will end up being num_partitions * num_rows = 10,000 rows.

If you run the query with no nodenumber clause and no partitioning table specified your output will accurate, but it will take longer to retrieve the data. DB2 will run your query in parallel within the database, but will pool all the data into one stream to return to DataStage.

I just want to know how can we use this node number clause in SQL queries of DB2 entreprise stages of Datastage, to improve performance and to make use of DB2 partitioning.

Thanks,
Sri
Sri
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Don't bother, is my advice. The DB2 Enterprise stage should automatically (if you've told it the partition table name) use the inbuilt DB2 partitioning.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
andjames
Participant
Posts: 1
Joined: Thu May 28, 2009 6:20 pm

Post by andjames »

[quote="ray.wurlod"]Don't bother, is my advice. The DB2 Enterprise stage should automatically (if you've told it the partition table name) use the inbuilt DB2 partitioning. ...[/quote]

This is true if you just select the table and let DataStage handle the input -- BUT if you specifiy your own SQL query to select the date you must ensure that DB2 returns only rows from the partition your datastage node has connected to. You do this by adding a " NODENUMBER(ANYCOL) = CURRENT NODE " to the where clause.

In addition if you have a joined query on two or mode partitoned tables (assumming they have similar partitioning keys!) you should tack
" NODENUMBER(FIRSTTAB.KEYCOL) = NODENUMBER(SECONDTAB.KEYCOL)" to each JOIN ON clause.


This is well worth doing if you can as DB2 can satisify the query without any cross node traffic and each DS node can work completely independently with no repartioning.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

True but, for some reason, the OP specified that he/she want's to use the DataStage (processing) node number. In our various ways we have been trying to advise that the two (DataStage and DB2 node numbers) are unrelated.

If you know the number of partitions of the table and specify the DB2 node number, then you can certainly perform some arithmetic manipulation to force particular DB2 nodes onto particular DataStage nodes but, in general, I believe that the effort is likely to be wasted. What will you be doing with these data? The only value will obtain if you will be putting them back into an identically partitioned DB2 table.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Post Reply