Oracle to Oracle performance
Moderators: chulett, rschirm, roy
I have a DataStage 7.5.2 parallel job on AIX that reads 10 million rows from one Oracle database, applies minor transformations, and does a truncate and load into another Oracle database. It takes 15 hours.
I have split the job into two jobs. The first job is Oracle -> transformer -> sequential file, and it reads and writes the 10 million rows in 1 hour. The second job reads the sequential file and does the truncate and load into Oracle, handling the 10 million rows in 8 minutes, which is over 20,000 rows/sec.
All the Oracle stages are Oracle Enterprise. Has anyone else run into this? What causes such a difference? Is there a better solution?
Thanks,
Eric
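For reference, the figures quoted above can be sanity-checked with a little arithmetic (a quick sketch; the row count and timings are taken from the post):

```python
# Throughput check for the figures quoted in the post.
rows = 10_000_000

combined_hours = 15        # single Oracle-to-Oracle job
split_read_hours = 1       # Oracle -> transformer -> sequential file
split_load_minutes = 8     # sequential file -> truncate/load into Oracle

load_rate = rows / (split_load_minutes * 60)   # rows per second
print(round(load_rate))                        # ~20,833, i.e. "over 20,000 rows/sec"

speedup = combined_hours / (split_read_hours + split_load_minutes / 60)
print(round(speedup, 1))                       # split jobs are ~13x faster end to end
```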
The job queries one Oracle database and loads into a separate Oracle database. The query is against a view:
SELECT SDATE, LEVEL1, LEVEL2, LEVEL3, LEVEL4, LEVEL5, LEVEL6, CAST(C_PRED AS DECIMAL(38,10)) C_PRED
FROM BIEO_SL_BASELINE_FCST
The view definition has some grouping and joins:
Code:
CREATE OR REPLACE VIEW bieo_sl_baseline_fcst (sdate,
level1,
level2,
level3,
level4,
level5,
level6,
c_pred,
record_type
)
AS
SELECT sdate, level1, level2, level3, level4, level5, level6, c_pred,
record_type
FROM (SELECT MIN (t_dates_list.from_sales_date) AS sdate,
                (CASE
                    WHEN MIN (t_dates_list.from_sales_date)
                         - TO_DATE ('05-16-2010', 'MM-DD-YYYY') <= 0
                    THEN (CASE
                             WHEN MAX (t_dates_list.from_sales_date)
                                  - TO_DATE ('05-16-2010', 'MM-DD-YYYY') <= 0
                             THEN 1
                             ELSE 3
                          END)
                    ELSE 2
                 END) AS record_type,
MAX (comb_id) AS comb_id, t_comb_list.level_id1,
t_comb_list.level_id2, t_comb_list.level_id3,
t_comb_list.level_id4, t_comb_list.level_id5,
t_comb_list.level_id6,
                NVL (SUM (DECODE (branch_data.enter_fore,
                                  NULL, NVL (branch_data.col_for_over,
                                             (NVL (branch_data.manual_stat,
                                                   NVL (branch_data.sim_val_1,
                                                        branch_data.fore_2 * 1))
                                              * (1.00 + NVL (branch_data.manual_fact, 0))
                                              + NVL (branch_data.fixed_prom, 0))
                                             * (1 - NVL (branch_data.rule_coll, 0))
                                             + NVL (branch_data.int_cost, 0)
                                               * NVL (branch_data.rule_coll, 0)),
                                  branch_data.ff)
                          * 1),
                     0) AS c_pred,
(CASE
WHEN MAX (branch_data.note_id) > 0
THEN 1
ELSE 0
END
) AS note_id,
(CASE
WHEN MAX (branch_data.is_promotion) > 0
THEN 1
ELSE 0
END
) AS is_promotion,
(CASE
WHEN MAX (branch_data.is_supply_plan) > 0
THEN 1
ELSE 0
END
) AS is_supply_plan,
(CASE
WHEN MAX (branch_data.is_scenario_resource) > 0
THEN 1
ELSE 0
END
) AS is_scenario_resource,
1 AS is_base_0
FROM sales_data branch_data,
mdp_matrix,
t_ep_item,
t_ep_e1_parent_ad_num,
t_ep_ebs_tp_zone,
t_ep_e1_cust_cat_5,
t_ep_ebs_account,
t_ep_site,
int_comb_602_389 t_comb_list,
int_date_602_389 t_dates_list
WHERE branch_data.sales_date >= t_dates_list.from_sales_date
AND branch_data.sales_date <= t_dates_list.to_sales_date
AND branch_data.item_id = mdp_matrix.item_id
AND branch_data.location_id = mdp_matrix.location_id
AND mdp_matrix.t_ep_item_ep_id = t_ep_item.t_ep_item_ep_id
AND mdp_matrix.t_ep_e1_parent_ad_num_ep_id =
t_ep_e1_parent_ad_num.t_ep_e1_parent_ad_num_ep_id
AND mdp_matrix.t_ep_ebs_tp_zone_ep_id =
t_ep_ebs_tp_zone.t_ep_ebs_tp_zone_ep_id
AND mdp_matrix.t_ep_e1_cust_cat_5_ep_id =
t_ep_e1_cust_cat_5.t_ep_e1_cust_cat_5_ep_id
AND mdp_matrix.t_ep_ebs_account_ep_id =
t_ep_ebs_account.t_ep_ebs_account_ep_id
AND mdp_matrix.t_ep_site_ep_id = t_ep_site.t_ep_site_ep_id
AND t_comb_list.level_id1 = mdp_matrix.t_ep_item_ep_id
AND t_comb_list.level_id2 =
mdp_matrix.t_ep_e1_parent_ad_num_ep_id
AND t_comb_list.level_id3 = mdp_matrix.t_ep_ebs_tp_zone_ep_id
AND t_comb_list.level_id4 =
mdp_matrix.t_ep_e1_cust_cat_5_ep_id
AND t_comb_list.level_id5 = mdp_matrix.t_ep_ebs_account_ep_id
AND t_comb_list.level_id6 = mdp_matrix.t_ep_site_ep_id
GROUP BY t_comb_list.level_id1,
t_comb_list.level_id2,
t_comb_list.level_id3,
t_comb_list.level_id4,
t_comb_list.level_id5,
t_comb_list.level_id6,
t_dates_list.from_sales_date) int_table,
int_comb_602_389
WHERE int_comb_602_389.level_id1 = int_table.level_id1
AND int_comb_602_389.level_id2 = int_table.level_id2
AND int_comb_602_389.level_id3 = int_table.level_id3
AND int_comb_602_389.level_id4 = int_table.level_id4
AND int_comb_602_389.level_id5 = int_table.level_id5
AND int_comb_602_389.level_id6 = int_table.level_id6;
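Unwound into procedural form, the nested DECODE/NVL chain that produces c_pred is easier to follow. The sketch below is one interpretation of that expression in Python (column names kept, NVL modelled as a coalesce); note that DECODE treats two NULLs as equal, so the computed forecast branch is taken only when enter_fore is NULL, and ff is used otherwise:

```python
def nvl(value, fallback):
    """Oracle NVL: return fallback when value is NULL (None)."""
    return fallback if value is None else value

def c_pred_term(r):
    """Per-row contribution to the SUM(...) feeding the c_pred column.

    DECODE(enter_fore, NULL, <computed>, ff): DECODE considers NULL equal
    to NULL, so a NULL enter_fore selects the computed forecast branch.
    """
    if r["enter_fore"] is not None:
        return r["ff"]
    base = nvl(r["manual_stat"], nvl(r["sim_val_1"], r["fore_2"] * 1))
    computed = (
        (base * (1.00 + nvl(r["manual_fact"], 0)) + nvl(r["fixed_prom"], 0))
        * (1 - nvl(r["rule_coll"], 0))
        + nvl(r["int_cost"], 0) * nvl(r["rule_coll"], 0)
    )
    return nvl(r["col_for_over"], computed)

def c_pred(rows):
    """c_pred itself is NVL(SUM(term * 1), 0) over each group."""
    return sum(c_pred_term(r) * 1 for r in rows) if rows else 0
```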
I didn't check for any locks. I have run the 'ora query -> xfm -> seq' job several times, and the query above consistently takes around 45 to 50 minutes to execute. The seq file is 1.5 GB.
The transformer stage straight-copies the 8 columns, adds 1 column set to @NULL, and adds 2 columns set to DSJobStartTimestamp.
Thanks,
Eric
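The Transformer derivations described above are trivial per-row work, so they are unlikely to be the bottleneck on their own. As a rough sketch of what the stage does per row (the output column names are made up for illustration, and a constant captured at job start stands in for DSJobStartTimestamp):

```python
from datetime import datetime

# Stand-in for DSJobStartTimestamp: fixed once for the whole run.
JOB_START = datetime.now()

def transform(row):
    """Mimic the Transformer derivations described in the post."""
    out = dict(row)               # straight copy of the 8 input columns
    out["filler"] = None          # derivation @NULL
    out["created_ts"] = JOB_START # derivations set to DSJobStartTimestamp
    out["updated_ts"] = JOB_START
    return out
```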
I tried a job using a stage variable to set DSJobStartTimestamp. I don't know why, but it aborts after 4 million rows; I ran it again and got the same abort. Then I re-ran the previous query -> xfm -> seq file job and it ran all 10 million rows fine, so it must be a bug in DS 7.5.2. It sounds like a good tip, but I think the main overall issue must have something to do with landing the data to a text file vs. not landing it. I wonder where DS stores the data when it's not landed using a seq file stage.
Thanks,
Eric
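For what it's worth, the stage-variable tip is the standard way to avoid re-evaluating a derivation once per row: the variable is computed once and reused. The effect can be sketched like this (pure illustration in Python, not DataStage code; the timestamp string is a made-up value):

```python
calls = 0

def job_start_timestamp():
    """Stand-in for a per-call derivation such as DSJobStartTimestamp."""
    global calls
    calls += 1
    return "2010-05-16 08:00:00"

rows = range(1000)

# Column derivation calls the function for every row: 1000 evaluations.
per_row = [job_start_timestamp() for _ in rows]

# Stage-variable style: evaluate once, reuse the value for every row.
calls = 0
ts = job_start_timestamp()
hoisted = [ts for _ in rows]

print(calls)  # 1 evaluation instead of 1000, same output rows
```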
DiscGolfer5000 wrote: ...I think the main overall issue must have something to do with landing the data to a text file vs. not landing it. I wonder where DS stores the data when it's not landed using a seq file stage.
I was wondering along that same line: is it possible that the CPU/memory being used is maxing out its resources? The data has to "live" somewhere when it's not being landed to a file, and buffer-to-file is normally going to be faster than buffer-to-Oracle, or so I assume. Maybe this is a hardware performance issue.
A number of years ago, Ken Bland posted several items here about landing data, including this thread. If you search for "pressure" with "kcbland" as the author, you'll also find a number of discussions on the concept of "back-pressure" and why landing between the extract and load addresses the issue that you are seeing.
Ken is one of our Premium Posters here, so most of those threads won't be accessible, I'm afraid, but I did want to let you know they are out there. A search may turn up more; I don't recall off the top of my head.
-craig
"You can never have too many knives" -- Logan Nine Fingers
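The "back-pressure" idea mentioned above can be illustrated with a bounded buffer: when the downstream stage (the load) is slow, the buffer fills and the upstream stage (the extract) blocks, so the whole coupled pipeline runs at the speed of its slowest link. Landing to a file removes that coupling, which is one reason the split jobs can be so much faster. A minimal sketch using plain Python threads, nothing DataStage-specific:

```python
import queue
import threading

rows = list(range(1000))
buf = queue.Queue(maxsize=8)   # small buffer: the producer feels back-pressure
loaded = []

def extract():
    for row in rows:
        buf.put(row)           # blocks whenever the "load" falls behind
    buf.put(None)              # end-of-data marker

def load():
    while True:
        row = buf.get()
        if row is None:
            break
        loaded.append(row)     # a slow sink here throttles extract() too

producer = threading.Thread(target=extract)
consumer = threading.Thread(target=load)
producer.start(); consumer.start()
producer.join(); consumer.join()

print(len(loaded))  # 1000: everything arrives, but at the consumer's pace
```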