Page 1 of 1

Reading Table through DB2 Enterprise Stage

Posted: Mon Apr 03, 2006 2:43 am
by opdas
Hi,
I want to use DB2 Enterprise Stage to read data from a specified table and use Partiton Table Property so that the execution is parallel.
Can anyone tell me what shall I specify for the property?

Posted: Mon Apr 03, 2006 6:59 am
by kumar_s
You mean Preserve partition flag or the type of partition?

Posted: Mon Apr 03, 2006 4:02 pm
by ray.wurlod
The Partition Table property allows you to specify a table that provides the partitioning algorithm. If you are selecting from a partitioned table, you can use the same table name as the Partition Table property as the table name from which you are selecting.

You can always click on Help to learn more about individual properties; they've gone to a great deal of work to provide this on-line help.

Posted: Mon Apr 03, 2006 10:16 pm
by opdas
I'm using the same table name as Partition Table and it gives me multiple records for that.In my case 8 ties the data(we are working on 8 nodes)
ray.wurlod wrote:The Partition Table property allows you to specify a table that provides the partitioning algorithm. If you are selecting from a partitioned table, you can use the same table name as the Partition Ta ...

Posted: Mon Apr 03, 2006 10:30 pm
by kumar_s
If you specify a table name, the partition methond is derived from the given table. By default if it is not mentiond it will follow sequential mode.

Posted: Tue Apr 04, 2006 1:42 am
by ray.wurlod
Looks like you're selecting from a non-partitioned table using a partitioning algorithm, hence you're getting all records on each node - effectively Entire partitioning.

Posted: Tue Apr 04, 2006 3:38 am
by kumar_s
I guess selecting the same table for Partitioning table forces to Entire partition.

Posted: Wed Apr 05, 2006 4:43 pm
by bcarlson
When we pull from a partitioned table, we use the nodenumber function and current node option in DB2:

Query

Code: Select all

select customer_id, cust_name
from mydb.customer_tb
where nodenumber(customer_id) = current node
Partitioning Table option: mydb.customer_tb

The partitioning table option tells DataStage to retrieve the data in parallel, but if you do not include the 'nodenumber() = current node' clause in your where clause, you will (as you have seen) pull 100% of the data over each partition. That is, if you have 100 rows and 10 partitions, your output will end up being num_partitions * num_rows = 10,000 rows.

I think the field used in the nodenumber() function can be any field, but we usually use the partitioning key field.

If you run the query with no nodenumber clause and no partitioning table specified your output will accurate, but it will take longer to retrieve the data. DB2 will run your query in parallel within the database, but will pool all the data into one stream to return to DataStage.

The other option to try is using the Table query instead. You can specify the select list and a where clause. When executed, DataStage automatically creates a query just like my example above with the nodenumber() = current node and it will also automatically use a partitioning table to achieve a parallel read from the database.

By the way, if you are running user-defined SQL with a join, you can still use the nodenumber() clause. We usually pick the table and partitioning key that'drives' the query. For example, if you are retrieving customer addresses, you may have a query like the following:

Code: Select all

select 
    c.customer_id, 
    c.cust_name, 
    a.street_addr,
    a.city_nm,
    a.state_cd,
    a.zip_9_cd
from mydb.customer_tb c,
    mydb.cust_address_tb a
where c.customer_id = a.customer_id
and nodenumber(c.customer_id) = current node
Partitioning table option: mydb.customer_tb

In this example, The customer table is driving the query, so we use that to partition the data retrieval.

Hope this helps!

Brad.

Posted: Thu Apr 13, 2006 1:12 am
by opdas
Brad,
That was great..,it helped me fetching data 4 times as fast.
I would like to add here that you also can use :

Code: Select all

where dbpartitionnum(partition_column_name) = current dbpartitionnum for fetch only
in place of

Code: Select all

where nodenumber(partition_column_name) = current node
it works better with DB2v8

Posted: Thu Apr 13, 2006 2:55 am
by opdas
Hi All,

How to achieve the same thing in Oracle Enterprise Stage??

Posted: Mon Jun 26, 2006 5:09 pm
by skcwong
bcarlson wrote:When we pull from a partitioned table, we use the nodenumber function and current node option in DB2:

Query

Code: Select all

select customer_id, cust_name
from mydb.customer_tb
where nodenumber(customer_id) = current node

Code: Select all

select 
    c.customer_id, 
    c.cust_name, 
    a.street_addr,
    a.city_nm,
    a.state_cd,
    a.zip_9_cd
from mydb.customer_tb c,
    mydb.cust_address_tb a
where c.customer_id = a.customer_id
and nodenumber(c.customer_id) = current node

Brad.
I wonder why the DS documents do not mention this? (ie, using nodenumber or dbpartitionnum)

Posted: Mon Jun 26, 2006 5:50 pm
by chulett
Perhaps because it's not a DataStage feature, but rather a database feature?

Posted: Wed Jan 17, 2007 8:37 am
by miwinter
A couple of queries on this topic
=====================

What if a table is partitioned on more than one column....?

You don't seem to be able to issue:

where dbpartitionnum(partition_column_name1,partition_column_name2) = current dbpartitionnum

:?:

*************

And also..... what if you have a SQL query joining two or more tables in the DB2EE stage? Is it still possible to use the above in this instance, in some way?

Cheers guys

Posted: Wed Jan 17, 2007 9:12 am
by bcarlson
Not sure about the dbpartitionnum question. Will have to look into that.

As for the join, we do that all the time - but the tables joined in the SQL are partitioned the same. We just pick one table to use when specifying the partitioning out the output.

If the partitioning is different, we usually have 2 SQL reads and join within DataStage where we can repartition at will.

Brad.