
problem with job design

Posted: Fri Aug 03, 2007 1:38 pm
by nsm
Hello All,

I have a relational data model in DB2. I have to transfer the data from 3 tables in DB2 to 4 dimensions and 2 facts in my warehouse in UDB.
The source tables have 6, 2.5, and 4 million rows.

No validations, very few derivations; all are pretty much straight mappings. Once I do the initial bulk load, it will be daily delta processing.

Please suggest whether designing the job in SERVER or PARALLEL is better performance-wise.

In Server:
1) I can read the data from the 3 tables at once with a join query
2) Do the look-ups to the dimension tables and write to multiple transformers
3) Generate the keys using KeyMgmt for inserts
4) Keep the keys in memory and then insert/update the records in the facts

In Parallel:
1) I can read the data from the 3 tables at once with a join query
2) Do the look-ups to the dimension tables and write to multiple transformers
3) Generate the keys using Cnt+1 whenever I have inserts (where Cnt is a job parameter that holds the max key from the table)
4) Keep the keys in memory and then insert/update the records in the facts
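A caveat with step 3 of the Parallel design: if Cnt is read once as a job parameter and each partition then computes Cnt+1 on its own, the partitions hand out the same keys. A hypothetical Python sketch (the starting max key and two-partition setup are illustrative, not from the original post):

```python
# Cnt is the job parameter holding the table's current max key,
# captured once before the job starts.
Cnt = 100

def naive_next_key(local_row_counter):
    # Each partition keeps its own 1-based row counter and adds it to Cnt,
    # with no knowledge of what the other partitions are doing.
    return Cnt + local_row_counter

# Two partitions each process their first insert row:
partition_0_key = naive_next_key(1)
partition_1_key = naive_next_key(1)
print(partition_0_key, partition_1_key)  # both 101: a duplicate surrogate key
```

This is exactly the problem raised in the next reply: the counter works only when execution is sequential, or when the expression is made partition-aware.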

I couldn't decide which option is better. I am new to DS EE development.
Since most of the work is either reading from or writing to the database, I feel like doing it in PX wouldn't help much.

Please advise, and let me know if what I thought I could do in PX doesn't work.

Thanks
nsm.

Posted: Fri Aug 03, 2007 1:47 pm
by dsdev750
How do you plan to take care of the count in a PX job (with multiple partitions)?

Posted: Fri Aug 03, 2007 2:00 pm
by nsm
When I did the look-up using the Lookup stage, even though it ran in parallel execution mode, it worked fine in determining inserts or updates for the whole table. (At first I thought the Lookup stage only works in parallel.)
Then I figured that reads from and writes to the DB are always sequential.

So what I thought was: after I get the rows out of the look-up, filter them in the transformer to identify the rows to be inserted, and do the Cnt+1 there.

Please let me know if my thinking is okay.

Posted: Fri Aug 03, 2007 2:22 pm
by ray.wurlod
It ONLY works if execution is sequential.

Posted: Fri Aug 03, 2007 3:06 pm
by nsm
Ray,

You mean the look-up or the count?
If the look-up doesn't work, please explain.

Another question I had: in Server jobs, when I have to populate the dimension tables, I maintain a hashed file and write the records that I insert into the DB table to the hashed file as well, so that I don't insert duplicate rows.

To achieve the same in Parallel:
If I set the array size and transaction size to 1, does that take care of the above situation?

Thanks
nsm.

Posted: Fri Aug 03, 2007 3:09 pm
by Luciana
Use the expression below when you need the keys to start from a specific value (Parallel jobs):

Initial value + @PARTITIONNUM + (@NUMPARTITIONS * (@INROWNUM - 1))

@PARTITIONNUM + (@NUMPARTITIONS * (@INROWNUM - 1)) => this expression returns a count beginning with 0, so if initial value = 20:

20 + 0 = 20
20 + 1 = 21
....

@PARTITIONNUM + (@NUMPARTITIONS * (@INROWNUM - 1)) = RESULT
0 + ( 2 * ( 1 -1)) = 0
0 + ( 2 * ( 2 -1)) = 2
0 + ( 2 * ( 3 -1)) = 4
0 + ( 2 * ( 4 -1)) = 6
0 + ( 2 * ( 5 -1)) = 8
1 + ( 2 * ( 1 -1)) = 1
1 + ( 2 * ( 2 -1)) = 3
1 + ( 2 * ( 3 -1)) = 5
1 + ( 2 * ( 4 -1)) = 7
1 + ( 2 * ( 5 -1)) = 9
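The expression above can be simulated outside DataStage to check that the interleaved keys come out unique. A minimal Python sketch, assuming 2 partitions with 5 rows each (the partition count, row counts, and initial value of 20 are illustrative):

```python
def surrogate_key(initial_value, partition_num, num_partitions, in_row_num):
    """Mirror of: Initial value + @PARTITIONNUM + (@NUMPARTITIONS * (@INROWNUM - 1)).

    in_row_num is 1-based, as @INROWNUM is in DataStage; partition_num
    runs from 0 to num_partitions - 1, as @PARTITIONNUM does.
    """
    return initial_value + partition_num + num_partitions * (in_row_num - 1)

# Two partitions, five rows each, starting the sequence at 20.
keys = sorted(
    surrogate_key(20, p, 2, r)
    for p in range(2)       # partitions 0 and 1
    for r in range(1, 6)    # rows 1..5 within each partition
)
print(keys)  # [20, 21, 22, ..., 29]: contiguous, no duplicates
```

Each partition generates every num_partitions-th key starting from its own offset, so the partitions can never collide, which is what makes this safe where the plain Cnt+1 approach is not.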