problem with job design

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

nsm
Premium Member
Posts: 139
Joined: Mon Feb 09, 2004 8:58 am

problem with job design

Post by nsm »

Hello All,

I have a relational data model in DB2, and I have to transfer the data from 3 tables in DB2 into 4 dimensions and 2 facts in my warehouse in UDB.
The source tables have 6, 2.5 and 4 million rows.

No validations and very few derivations; they are pretty much all straight mappings. After the initial bulk load it will be daily delta processing.

Please suggest whether designing the job in Server or Parallel is better performance-wise.

In Server:
1) I can read the data from the 3 tables at once with a join query (see the sketch after this list)
2) Do the look-ups to the dimension tables and write through multiple Transformers
3) Generate the keys using KeyMgmt for inserts
4) Keep the keys in memory and then insert/update the records in the facts
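
A minimal sketch of the kind of three-table join query step 1 describes; the schema, table and column names here (src.orders, src.customer, src.product and their keys) are illustrative assumptions, not the poster's actual model:

    SELECT o.order_id, o.order_date, c.cust_name, p.prod_name, o.qty
    FROM   src.orders   o
    JOIN   src.customer c ON c.cust_id = o.cust_id
    JOIN   src.product  p ON p.prod_id = o.prod_id;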

In Parallel:
1) I can read the data from the 3 tables at once with the same join query
2) Do the look-ups to the dimension tables and write through multiple Transformers
3) Generate the keys using Cnt+1 whenever I have inserts, where Cnt is a job parameter holding the max key from the table (see the sketch after this list)
4) Keep the keys in memory and then insert/update the records in the facts
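
A sketch of how the Cnt job parameter from step 3 could be seeded before each run; the dimension table and key column (dw.customer_dim, cust_key) are assumed names:

    -- Run before the job and pass the result in as the Cnt job parameter.
    -- COALESCE covers the empty-table case on the very first bulk load.
    SELECT COALESCE(MAX(cust_key), 0) AS max_key
    FROM   dw.customer_dim;

Note that Cnt+1 by itself is only safe if the stage generating it runs sequentially; the replies below cover the multi-partition case.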

I couldn't decide which option is better; I am new to DS/EE development.
Since most of the work is either reading from or writing to the database, I feel that doing it in PX wouldn't help much.

Please advise, and let me know if what I thought I could do in PX doesn't work.

Thanks
nsm.
Last edited by nsm on Fri Aug 03, 2007 1:51 pm, edited 1 time in total.
dsdev750
Charter Member
Posts: 16
Joined: Sat Jun 04, 2005 10:19 am

Post by dsdev750 »

How do you plan to take care of the count in a PX job (with multiple partitions)?
nsm
Premium Member
Posts: 139
Joined: Mon Feb 09, 2004 8:58 am

Post by nsm »

When I did the look-up using the Lookup stage, even though it ran in parallel execution mode it worked fine in determining inserts versus updates for the whole table. (At first I thought the Lookup stage also works in parallel; then I figured that reading from or writing to the database is always sequential.)

So what I thought was: after I get the rows out of the lookup, filter the rows in the Transformer to identify the ones to be inserted, and do the Cnt+1 there.

Please let me know if my thinking is okay.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

It ONLY works if execution is sequential.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
nsm
Premium Member
Posts: 139
Joined: Mon Feb 09, 2004 8:58 am

Post by nsm »

Ray,

Do you mean the look-up or the count?
If the look-up doesn't work, please explain.

Another question: in Server jobs, when I populate the dimension tables I maintain a hashed file and write every record that goes into the DB table to the hashed file as well, so that I don't insert duplicate rows.

To achieve the same in Parallel: if I set the array size and transaction size to 1, does that take care of the above situation?
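
One way to get the same insert-or-update behavior in a single statement is a DB2 MERGE; this is a sketch of an alternative, not the approach described above, and the table and column names (dw.customer_dim, cust_id, cust_name) are assumptions:

    MERGE INTO dw.customer_dim AS tgt
    USING (VALUES (?, ?)) AS src (cust_id, cust_name)
    ON tgt.cust_id = src.cust_id
    WHEN MATCHED THEN
        UPDATE SET tgt.cust_name = src.cust_name
    WHEN NOT MATCHED THEN
        INSERT (cust_id, cust_name) VALUES (src.cust_id, src.cust_name);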

Thanks
nsm.
Luciana
Participant
Posts: 60
Joined: Fri Jun 10, 2005 7:22 am
Location: Brasil

Post by Luciana »

Use the expression below when you need the counter to start from a specific value (Parallel jobs):

Initial value + (@PARTITIONNUM + (@NUMPARTITIONS * (@INROWNUM - 1)))

@PARTITIONNUM + (@NUMPARTITIONS * (@INROWNUM - 1)) => this expression returns a count beginning at 0, so if the initial value = 20:

20 + 0 = 20
20 + 1 = 21
....

With @NUMPARTITIONS = 2 and 5 rows per partition, the counts interleave across the partitions:

@PARTITIONNUM + (@NUMPARTITIONS * (@INROWNUM - 1)) = RESULT
0 + (2 * (1 - 1)) = 0
0 + (2 * (2 - 1)) = 2
0 + (2 * (3 - 1)) = 4
0 + (2 * (4 - 1)) = 6
0 + (2 * (5 - 1)) = 8
1 + (2 * (1 - 1)) = 1
1 + (2 * (2 - 1)) = 3
1 + (2 * (3 - 1)) = 5
1 + (2 * (4 - 1)) = 7
1 + (2 * (5 - 1)) = 9