Reading Table through DB2 Enterprise Stage

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
opdas
Participant
Posts: 115
Joined: Wed Feb 01, 2006 7:25 am

Reading Table through DB2 Enterprise Stage

Post by opdas »

Hi,
I want to use DB2 Enterprise Stage to read data from a specified table and use Partiton Table Property so that the execution is parallel.
Can anyone tell me what shall I specify for the property?
Om Prakash


"There are things that are known, and there are things that are unknown, and in between there are doors"
kumar_s
Charter Member
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

You mean Preserve partition flag or the type of partition?
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

The Partition Table property allows you to specify a table that provides the partitioning algorithm. If you are selecting from a partitioned table, you can use the same table name as the Partition Table property as the table name from which you are selecting.

You can always click on Help to learn more about individual properties; they've gone to a great deal of work to provide this on-line help.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
opdas
Participant
Posts: 115
Joined: Wed Feb 01, 2006 7:25 am

Post by opdas »

I'm using the same table name as Partition Table and it gives me multiple records for that.In my case 8 ties the data(we are working on 8 nodes)
ray.wurlod wrote:The Partition Table property allows you to specify a table that provides the partitioning algorithm. If you are selecting from a partitioned table, you can use the same table name as the Partition Ta ...
Om Prakash


"There are things that are known, and there are things that are unknown, and in between there are doors"
kumar_s
Charter Member
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

If you specify a table name, the partition methond is derived from the given table. By default if it is not mentiond it will follow sequential mode.
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Looks like you're selecting from a non-partitioned table using a partitioning algorithm, hence you're getting all records on each node - effectively Entire partitioning.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
kumar_s
Charter Member
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

I guess selecting the same table for Partitioning table forces to Entire partition.
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
bcarlson
Premium Member
Premium Member
Posts: 772
Joined: Fri Oct 01, 2004 3:06 pm
Location: Minnesota

Post by bcarlson »

When we pull from a partitioned table, we use the nodenumber function and current node option in DB2:

Query

Code: Select all

select customer_id, cust_name
from mydb.customer_tb
where nodenumber(customer_id) = current node
Partitioning Table option: mydb.customer_tb

The partitioning table option tells DataStage to retrieve the data in parallel, but if you do not include the 'nodenumber() = current node' clause in your where clause, you will (as you have seen) pull 100% of the data over each partition. That is, if you have 100 rows and 10 partitions, your output will end up being num_partitions * num_rows = 10,000 rows.

I think the field used in the nodenumber() function can be any field, but we usually use the partitioning key field.

If you run the query with no nodenumber clause and no partitioning table specified your output will accurate, but it will take longer to retrieve the data. DB2 will run your query in parallel within the database, but will pool all the data into one stream to return to DataStage.

The other option to try is using the Table query instead. You can specify the select list and a where clause. When executed, DataStage automatically creates a query just like my example above with the nodenumber() = current node and it will also automatically use a partitioning table to achieve a parallel read from the database.

By the way, if you are running user-defined SQL with a join, you can still use the nodenumber() clause. We usually pick the table and partitioning key that'drives' the query. For example, if you are retrieving customer addresses, you may have a query like the following:

Code: Select all

select 
    c.customer_id, 
    c.cust_name, 
    a.street_addr,
    a.city_nm,
    a.state_cd,
    a.zip_9_cd
from mydb.customer_tb c,
    mydb.cust_address_tb a
where c.customer_id = a.customer_id
and nodenumber(c.customer_id) = current node
Partitioning table option: mydb.customer_tb

In this example, The customer table is driving the query, so we use that to partition the data retrieval.

Hope this helps!

Brad.
opdas
Participant
Posts: 115
Joined: Wed Feb 01, 2006 7:25 am

Post by opdas »

Brad,
That was great..,it helped me fetching data 4 times as fast.
I would like to add here that you also can use :

Code: Select all

where dbpartitionnum(partition_column_name) = current dbpartitionnum for fetch only
in place of

Code: Select all

where nodenumber(partition_column_name) = current node
it works better with DB2v8
opdas
Participant
Posts: 115
Joined: Wed Feb 01, 2006 7:25 am

Post by opdas »

Hi All,

How to achieve the same thing in Oracle Enterprise Stage??
skcwong
Participant
Posts: 2
Joined: Fri Apr 15, 2005 3:56 pm

Post by skcwong »

bcarlson wrote:When we pull from a partitioned table, we use the nodenumber function and current node option in DB2:

Query

Code: Select all

select customer_id, cust_name
from mydb.customer_tb
where nodenumber(customer_id) = current node

Code: Select all

select 
    c.customer_id, 
    c.cust_name, 
    a.street_addr,
    a.city_nm,
    a.state_cd,
    a.zip_9_cd
from mydb.customer_tb c,
    mydb.cust_address_tb a
where c.customer_id = a.customer_id
and nodenumber(c.customer_id) = current node

Brad.
I wonder why the DS documents do not mention this? (ie, using nodenumber or dbpartitionnum)
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Perhaps because it's not a DataStage feature, but rather a database feature?
-craig

"You can never have too many knives" -- Logan Nine Fingers
miwinter
Participant
Posts: 396
Joined: Thu Jun 22, 2006 7:00 am
Location: England, UK

Post by miwinter »

A couple of queries on this topic
=====================

What if a table is partitioned on more than one column....?

You don't seem to be able to issue:

where dbpartitionnum(partition_column_name1,partition_column_name2) = current dbpartitionnum

:?:

*************

And also..... what if you have a SQL query joining two or more tables in the DB2EE stage? Is it still possible to use the above in this instance, in some way?

Cheers guys
bcarlson
Premium Member
Premium Member
Posts: 772
Joined: Fri Oct 01, 2004 3:06 pm
Location: Minnesota

Post by bcarlson »

Not sure about the dbpartitionnum question. Will have to look into that.

As for the join, we do that all the time - but the tables joined in the SQL are partitioned the same. We just pick one table to use when specifying the partitioning out the output.

If the partitioning is different, we usually have 2 SQL reads and join within DataStage where we can repartition at will.

Brad.
Post Reply