Db2 DPF and DS

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
gagan8877
Premium Member
Premium Member
Posts: 77
Joined: Mon Jun 19, 2006 1:30 pm

Db2 DPF and DS

Post by gagan8877 »

ENVR:
We have a mixture of physical hosts and logical partitions on DB2 DB Server with DPF. 1 coordinator host (Partition #0) and 3 data hosts, each with 4 logical partitions (Partitions #1-#4, #4-#8 and #8-#12)
Most of the data is on partitions 1 through 12, we usually connect to partition #0 (which is why it's called the coordinator).

I read some posts that DB2 Enterprise stage can read/write in parallel in a single job.

My questions might be lil vague, as I am not a DB2 expert and I have never worked with DPF before, so pls excuse me for that and le me know and I will try to rephrase them.

The questions are:

1. If we use a single job and DB2 Enterprise Stage - will multiple threads execute inside DB2 and each reading in parallel from each partition 1-12? Or will it be one thread @ the co-ordinator?

2. Does DB2 Enterprise Stage require both DB2 and DS to be on same platforms?

3. If we were to connect separately to each of the 12 data partitions (with 12 jobs) and connect directly to each one to only process the data on that logical partition and not "speak" to any other partitions. The point of all this is to make DB2 process 12 jobs in parallel, in their own threads and we would therefore get 12 threads instead of a single thread on the coordinator that's doing that. Is it possible to connect directly to a specific partition without the co-ordinator node?

4. If #2 is correct and they are on different platforms and we run 12 different jobs to execute 12 threads (call stored procs or read/write) in parallel? Can we connect to specific partitions. I am assuming we cant use DB2 Enterprise Stage here.

Thanks for your help.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Oooo, oooo - I don't do DB2 but I can answer one of the questions! :wink:

2. Nope!
-craig

"You can never have too many knives" -- Logan Nine Fingers
stefanfrost1
Premium Member
Premium Member
Posts: 99
Joined: Mon Sep 03, 2007 7:49 am
Location: Stockholm, Sweden

Post by stefanfrost1 »

If you use the DB2 enterprise stage properly it will ask the database how data is spread over the 12 partitions (eg partition key) and if that key is present in your in/out stream it will start as many processes as there are data nodes (12 if your data is spread over all partititions. Your tablespace for your particular table might only be spread on 4 nodes and in that case ds will only start 4 processes).

DB2EE-stage can only do a partition read if you use read method = table or auto-sql because as soon as you make for instance a join the partition key algorithm will fail (for obvious reasons).

If partition algorithms fail due to above cases it will send a single node request (regular sql) and operate using a single process instead.
-------------------------------------
http://it.toolbox.com/blogs/bi-aj
my blog on delivering business intelligence using agile principles
gagan8877
Premium Member
Premium Member
Posts: 77
Joined: Mon Jun 19, 2006 1:30 pm

#2

Post by gagan8877 »

chulett wrote:Oooo, oooo - I don't do DB2 but I can answer one of the questions! :wink:

2. Nope!
I found this post: http://dsxchange.com/viewtopic.php?t=11 ... f2a8ccc550

Where Ray said the answer to # 2 is Yes. Now I am confused which one is true. But I guess thats for 7.x, has this changed for version 8?
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

I don't see him saying any such thing. Can you point out the part you are referring to, please?
-craig

"You can never have too many knives" -- Logan Nine Fingers
gagan8877
Premium Member
Premium Member
Posts: 77
Joined: Mon Jun 19, 2006 1:30 pm

Db2 DPF

Post by gagan8877 »

chulett wrote:I don't see him saying any such thing. Can you point out the part you are referring to, please?
May be my question is incomplete instead. I wanted to know if DB2 Enterprise Stage requires both DS and DB2 to be on same platform inorder to utilize DB2 parallelism or DPF?
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

No clue how the "DPF" part affects this answer but outside of that it's still a "no". There's no database that needs to be "co-resident" to work with DataStage that I am aware of. Or to utilize DB2 parallelism.
-craig

"You can never have too many knives" -- Logan Nine Fingers
gagan8877
Premium Member
Premium Member
Posts: 77
Joined: Mon Jun 19, 2006 1:30 pm

Post by gagan8877 »

chulett wrote:No clue how the "DPF" part affects this answer but outside of that it's still a "no". There's no database that needs to be "co-resident" to work with DataStage that I am aware of. Or to utilize DB2 parallelism.
Hi Chullet

The ques in the above mentioned post is:

I have learnt that Datastage(DSEE) would not be able to utilize DB2's partitioning and parallelism if DataStage and DB2 are not on the same box. Is this true? Should DataStage and DB2 need to be on the same box?

And Ray replied:

Not "different box", only "different platform". If both boxes are AIX, for example, all should be well. If DB2 is on AS/400 or mainframe you have to use the DB2 API stage (and therefore can't use DB2 partitioning).


In my case DS is on Windows and DB2 on Linux.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

OK. That answer is specific to the difference in the AS/400 and mainframe platforms... so it's only an issue if your database is on one of those.
-craig

"You can never have too many knives" -- Logan Nine Fingers
miwinter
Participant
Posts: 396
Joined: Thu Jun 22, 2006 7:00 am
Location: England, UK

Post by miwinter »

As an observer to this one only... looks to me like we're talking at cross-purposes here over 'platform' (i.e. operating system) and physical server.

In essence, DB2 can reside on another server, but I think the issue with trying to use DPF will come, as Ray pointed out, when you go cross-platform (e.g. Datastage on Windows and DB2 on AIX or similar)
Mark Winter
<i>Nothing appeases a troubled mind more than <b>good</b> music</i>
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Well... in general, 'cross platform' is not an issue. Again, from what I'm reading, this is a restriction put in place by the AS/400 and mainframe environments only. And again, no clue how "DPF" affects this answer, knowing less than nothing about it.
-craig

"You can never have too many knives" -- Logan Nine Fingers
gagan8877
Premium Member
Premium Member
Posts: 77
Joined: Mon Jun 19, 2006 1:30 pm

Testing

Post by gagan8877 »

Hi Miwinter

As mentioned, "generally" cross-platform is not an issue but I was wandering have you actually tested this?

And regarding the other questions:

Can we connect to a specific partitions on DB2, using any DB2 API/EE/PLUG-IN, ODBC or DRS?

Thanks for your help
stefanfrost1
Premium Member
Premium Member
Posts: 99
Joined: Mon Sep 03, 2007 7:49 am
Location: Stockholm, Sweden

Post by stefanfrost1 »

Ok, cross-platform is in general not an issue even if your database is on another plattform than your datastage installation, regardless of your db installation, SMP,MPP,DPF etc.

However there is a "special" feature in datastage enterprise edition in combination with DB2 DPF (eg DB2s MPP solution) that let's datastage use DB2s hasing algorithm and connect to a single node in the dpf setup to operate locally. In this case we by-pass (so to say) the conductor node in the DB2 installation and execute in parallell using datastage. This way we can have an ETL-process that may be correctly partitioned from stage 1 to stage x optimizing the flow. In order to fullt utilize this feature you need datastage and db2 to be installed on similar plattforms with the same OS and version and that datastage runtime library is available on each db2 node (this does not mean a new datastage installation).

Using datastage and connecting to a remote DB2 DPF which is installed on a completely different plattform will force you to connect through a single process to the DB2 conductor node and leave the parrallism to DB2 DPF feature and wait for the result... A option that propably works better for you many times anyway.
-------------------------------------
http://it.toolbox.com/blogs/bi-aj
my blog on delivering business intelligence using agile principles
gagan8877
Premium Member
Premium Member
Posts: 77
Joined: Mon Jun 19, 2006 1:30 pm

Mutilple Jobs

Post by gagan8877 »

Hi Stefan

Thanks for the answer, it was helpful.
Post Reply