Row count differs between execution modes

Post questions here relating to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

KadetG
Participant
Posts: 30
Joined: Mon Nov 06, 2006 12:43 pm


Post by KadetG »

Hi All

So... if I set the execution mode to "Default (Sequential)", the DB2_UDB_API stage extracts the data correctly. But if I set the execution mode to "Parallel", the stage extracts more rows than the table contains. The extraction is faster, though.

Why is that possible? What am I doing wrong?
Alex
kumar_s
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

Can you explain in more detail? How many rows were in the table, and how many rows were output? How many stages are in the job? What partitioning is used in each stage? What is the SELECT query, if any?
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
KadetG
Participant
Posts: 30
Joined: Mon Nov 06, 2006 12:43 pm

Post by KadetG »

Hi

I have 294886 rows for PRCSS_WK_ENDING_DT 10.11.2006. But if I set parallel, DataStage extracts 1474430 rows. The job consists of only 3 stages (a DB2_UDB_API stage for the extract, a Copy stage that is parallel by default, and a DB2_UDB_API stage in parallel mode for the load).

This is the SQL query sent to DB2:
SELECT UTILDM_EMP_GEO_SPRCSS.CMPNY_CD,UTILDM_EMP_GEO_SPRCSS.CTRY_CD,UTILDM_EMP_GEO_SPRCSS.PRCSS_WK_ENDING_DT FROM UTILDM_EMP_GEO_SPRCSS WHERE PRCSS_WK_ENDING_DT = '2006-11-10';
Alex
kumar_s
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

Is 1474430 what the job statistics show, or the count in your target table? If it's the latter, you might be counting previously inserted data as well. If not, do you see any duplicates in the extracted data for any specific key?
Make sure the partitioning method in the Copy stage is Same, not Entire.
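One way to follow up on the duplicates question is to compare row counts per column combination between source and target, rather than only flagging COUNT(*) > 1, because the three extracted columns may legitimately repeat in the source. The sketch below is a minimal illustration that uses Python's built-in sqlite3 as a stand-in for DB2; SOURCE_TBL and TARGET_TBL are hypothetical table names, the columns come from the SELECT in this thread, and the data is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical source and target tables with the columns from the thread's SELECT.
for table in ("SOURCE_TBL", "TARGET_TBL"):
    conn.execute(f"CREATE TABLE {table} (CMPNY_CD TEXT, CTRY_CD TEXT, PRCSS_WK_ENDING_DT TEXT)")

source_rows = [
    ("C1", "US", "2006-11-10"),
    ("C1", "US", "2006-11-10"),  # a legitimate repeat in the source
    ("C2", "DE", "2006-11-10"),
]
conn.executemany("INSERT INTO SOURCE_TBL VALUES (?, ?, ?)", source_rows)
# Simulate a load in which each of 5 partitions (assumed) ran the full SELECT.
conn.executemany("INSERT INTO TARGET_TBL VALUES (?, ?, ?)", source_rows * 5)

# For combinations present in both tables, list those whose row counts differ.
mismatch_sql = """
WITH s AS (
    SELECT CMPNY_CD, CTRY_CD, COUNT(*) AS n FROM SOURCE_TBL
    WHERE PRCSS_WK_ENDING_DT = '2006-11-10' GROUP BY CMPNY_CD, CTRY_CD
), t AS (
    SELECT CMPNY_CD, CTRY_CD, COUNT(*) AS n FROM TARGET_TBL
    WHERE PRCSS_WK_ENDING_DT = '2006-11-10' GROUP BY CMPNY_CD, CTRY_CD
)
SELECT s.CMPNY_CD, s.CTRY_CD, s.n AS source_n, t.n AS target_n
FROM s JOIN t ON s.CMPNY_CD = t.CMPNY_CD AND s.CTRY_CD = t.CTRY_CD
WHERE s.n <> t.n
ORDER BY s.CMPNY_CD
"""
mismatches = conn.execute(mismatch_sql).fetchall()
for row in mismatches:
    print(row)  # e.g. ('C1', 'US', 2, 10)
```

If every target count is the same exact multiple of its source count, that points to the whole result set being read once per partition rather than to stray duplicates in the source.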
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

If you're executing the same SELECT on each partition you're effectively getting Entire partitioning. If that's not what you want, you're going to have to constrain the query appropriately and/or select in sequential mode and choose an appropriate partitioning algorithm.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
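The counts reported in this thread differ by an exact factor: 1474430 = 5 × 294886. That is consistent with five partitions each running the full query, although the thread never states how many nodes the configuration file defines, so five is an inference. The Python sketch below is a toy model, not DataStage code, of the three options described above. The MOD-style per-partition constraint is a hypothetical illustration; whether and how the DB2_UDB_API stage can pass a partition number into user-defined SQL is not covered here.

```python
from typing import List, Sequence, Tuple

Row = Tuple[str, str, str]
Partitions = List[List[Row]]


def read_same_query_per_partition(rows: Sequence[Row], n: int) -> Partitions:
    # Every partition runs the identical, unconstrained SELECT,
    # so every partition receives the full result set (effectively Entire).
    return [list(rows) for _ in range(n)]


def read_sequential_then_round_robin(rows: Sequence[Row], n: int) -> Partitions:
    # A single sequential read, then round-robin partitioning:
    # each row lands in exactly one partition.
    parts: Partitions = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        parts[i % n].append(row)
    return parts


def read_constrained_per_partition(rows: Sequence[Row], n: int) -> Partitions:
    # Partition p only reads its own slice, as a hypothetical
    # "... AND MOD(ROW_KEY, n) = p" would; the row position stands in for ROW_KEY.
    return [[row for i, row in enumerate(rows) if i % n == p] for p in range(n)]


def total(parts: Partitions) -> int:
    # Total rows across all partitions, i.e. what the job statistics would report.
    return sum(len(p) for p in parts)


if __name__ == "__main__":
    table = [("C1", "US", "2006-11-10")] * 294886  # row count from the thread
    nodes = 5  # assumed; the thread does not say how many nodes are configured
    print(total(read_same_query_per_partition(table, nodes)))     # 1474430
    print(total(read_sequential_then_round_robin(table, nodes)))  # 294886
    print(total(read_constrained_per_partition(table, nodes)))    # 294886
```

Only the first strategy multiplies the row count; the other two keep the total equal to the table's row count while still spreading the rows across partitions.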
KadetG
Participant
Posts: 30
Joined: Mon Nov 06, 2006 12:43 pm

Post by KadetG »

Hi

@kumar_s
It's both: the job statistics and the row count in the target after the transfer show it.
Hmm, I think those are duplicates...
But why are they extracted and copied to the target table?
I'm sure of this much: it happens when both the source DB2_UDB_API stage and the Copy stage have their execution mode set to "Parallel". Everything is OK if I set the DB2_UDB_API stage to sequential and leave the Copy stage parallel.
Alex