Processing Large Amounts of data
Posted: Fri Aug 29, 2008 2:41 am
Hi All,
We are using Server jobs in our project and we got a requirement in which we need to process large amounts of data. The jobs need to run on the monthly basis and the data files are received on daily basis. We are loading the data files into our Source tables (Oracle 10g) on daily basis and at the end of the month we will be fetching all the closed and active accounts data from our source, aggregating and applying the business logic and then populating into flat files. On average we are getting 10 million records for 3 days so effectively we will be getting nearly 100 million records at the end of the month. We have already designed the job but for 1 day of data the job is taking nearly 1 hour. So for the month end run it will be taking many many hours. We tried a no of design options which are as below:
Option1: (Execution time nearly 1 hour for 2 million records, going to DB twice for reading and once for writing)
Job1:
We are taking all the distinct Transaction Ids and loading them into a hash file. So we hit the Database for the first time.
Job2:
In this job we will getting all the records from the source and aggregate the data on the basis of distinct Transaction Ids and load them into a hash file.
Job3: In this job we will be using the hash file with distinct Transaction Ids and do a lookup on the hash file which is having distinct Transaction Ids and the aggregate amounts, get the amounts related to the distinct Transaction Ids and load them into the table.
Option2: (Execution time nearly 35 minutes for 2 million records, going to DB once for reading and once for writing)
Job1:
We are taking all the distinct Transaction Ids and aggregating them and loading them into a hash file.
Job2: In this job we will be using the hash file with distinct Transaction Ids and the aggregate amounts and load them into the table.
The problem with both the above designs is that they are taking very long time to run. Please guide if we can improve/change the design to boost the performance of jobs.
Thank You in advance.
Punar
We are using Server jobs in our project and we got a requirement in which we need to process large amounts of data. The jobs need to run on the monthly basis and the data files are received on daily basis. We are loading the data files into our Source tables (Oracle 10g) on daily basis and at the end of the month we will be fetching all the closed and active accounts data from our source, aggregating and applying the business logic and then populating into flat files. On average we are getting 10 million records for 3 days so effectively we will be getting nearly 100 million records at the end of the month. We have already designed the job but for 1 day of data the job is taking nearly 1 hour. So for the month end run it will be taking many many hours. We tried a no of design options which are as below:
Option1: (Execution time nearly 1 hour for 2 million records, going to DB twice for reading and once for writing)
Job1:
We are taking all the distinct Transaction Ids and loading them into a hash file. So we hit the Database for the first time.
Job2:
In this job we will getting all the records from the source and aggregate the data on the basis of distinct Transaction Ids and load them into a hash file.
Job3: In this job we will be using the hash file with distinct Transaction Ids and do a lookup on the hash file which is having distinct Transaction Ids and the aggregate amounts, get the amounts related to the distinct Transaction Ids and load them into the table.
Option2: (Execution time nearly 35 minutes for 2 million records, going to DB once for reading and once for writing)
Job1:
We are taking all the distinct Transaction Ids and aggregating them and loading them into a hash file.
Job2: In this job we will be using the hash file with distinct Transaction Ids and the aggregate amounts and load them into the table.
The problem with both the above designs is that they are taking very long time to run. Please guide if we can improve/change the design to boost the performance of jobs.
Thank You in advance.
Punar