Hi All,
We are using Server jobs in our project and have a requirement to process large amounts of data. The jobs run on a monthly basis, but the data files arrive daily. We load the daily files into our source tables (Oracle 10g), and at the end of the month we fetch all the closed and active accounts from the source, aggregate the data, apply the business logic, and write the results to flat files. On average we receive 10 million records per 3 days, so effectively we have nearly 100 million records at month end. We have already designed the job, but for one day of data it takes nearly 1 hour, so the month-end run would take many, many hours. We tried a number of design options, described below:
Option1: (Execution time nearly 1 hour for 2 million records, going to DB twice for reading and once for writing)
Job1:
We are taking all the distinct Transaction Ids and loading them into a hash file. So we hit the Database for the first time.
Job2:
In this job we get all the records from the source, aggregate the data by distinct Transaction Id, and load the results into a hash file.
Job3: In this job we take the hash file of distinct Transaction Ids and do a lookup against the hash file holding the distinct Transaction Ids and their aggregated amounts, retrieve the amount for each Transaction Id, and load the results into the table.
Option2: (Execution time nearly 35 minutes for 2 million records, going to DB once for reading and once for writing)
Job1:
We are taking all the distinct Transaction Ids and aggregating them and loading them into a hash file.
Job2: In this job we will be using the hash file with distinct Transaction Ids and the aggregate amounts and load them into the table.
The problem with both of the above designs is that they take a very long time to run. Please advise whether we can improve or change the design to boost job performance.
Thank You in advance.
Punar
Processing Large Amounts of data
Consider ways in which you can build in some parallel processing, for example multi-instance jobs.
Can you create an Oracle table of the distinct transaction IDs and join to that, to minimize the number of rows that you need to extract from Oracle? Such a table is called a "bucket table" - it contains a "bucket of keys".
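A bucket table of this kind might be populated and joined along these lines; the table and column names here are assumptions for illustration, not from the original post:

```sql
-- Hypothetical names: TXN_BUCKET holds the bucket of keys,
-- TRANSACTIONS is the monthly source table.
CREATE TABLE txn_bucket AS
  SELECT DISTINCT transaction_id
  FROM   transactions;

-- Extraction query: the join restricts the rows pulled out of
-- Oracle to only those whose keys appear in the bucket, and the
-- aggregation is pushed down to the database as well.
SELECT   t.transaction_id,
         SUM(t.amount) AS total_amount
FROM     transactions t
JOIN     txn_bucket   b ON b.transaction_id = t.transaction_id
GROUP BY t.transaction_id;
```

Doing the join and GROUP BY inside Oracle means DataStage only has to move the already-reduced result set, rather than the full 100 million rows.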
Separate processing for active and closed accounts?
What does "applying the business logic" involve? Are there any opportunities for efficiency here?
Have you monitored your jobs to identify the "hot spots", both with the Monitor and with collection of statistics on the stage tracing tab when you issue a job run request through Director?
These are just some ideas quickly without much thought. I'm sure there are probably others if the actual application were known in more detail.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Here is one suggestion.
Since you receive the data daily, process it daily so the month-end run completes quickly.
You can process and aggregate the 10 million daily records as they arrive and store the summarized results. At the end of the month you then do a summary or rollup over the daily summaries only; by that point you will have comparatively few records to work with.
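The daily-summarize-then-rollup idea might look like this in SQL; the table and column names are assumed for the sketch:

```sql
-- Each day: pre-aggregate that day's ~10 million staged rows
-- into a compact per-transaction daily summary.
INSERT INTO daily_summary (load_date, transaction_id, day_amount)
SELECT   TRUNC(SYSDATE),
         transaction_id,
         SUM(amount)
FROM     transactions_staging
GROUP BY transaction_id;

-- Month end: roll up the small daily summaries only, instead of
-- re-reading 100 million detail rows.
SELECT   transaction_id,
         SUM(day_amount) AS month_amount
FROM     daily_summary
WHERE    load_date >= TRUNC(SYSDATE, 'MM')
GROUP BY transaction_id;
```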
If your DB write is what takes the time, then consider EXCHANGE PARTITION. This is faster than an SQL*Loader fast load.
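A sketch of the partition-exchange idea, with illustrative table and partition names: you load the month's output into a plain staging table, then swap that table into the partitioned target. The swap is a data-dictionary operation, so no rows are physically moved.

```sql
-- Assumes MONTHLY_TOTALS is range-partitioned by month and
-- MONTHLY_TOTALS_STAGE has an identical column layout.
ALTER TABLE monthly_totals
  EXCHANGE PARTITION p_2007_01
  WITH TABLE monthly_totals_stage
  INCLUDING INDEXES
  WITHOUT VALIDATION;
```

WITHOUT VALIDATION skips the per-row partition-key check, which is safe only if you are certain every staged row belongs in that partition.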
Thanks, BK
Building a little on the groundwork Ray and Bharani have already laid for you:
I had the same problem with very large amounts of data taking too much time to process. I removed the hashed file creation part of the process and switched to staging tables holding the same information (actually a little more, which removed another step later on). Also, if your process allows splitting Inserts and Updates into separate files, you may be able to take advantage of bulk loading for the inserts and whatever works best for the updates (sorry, I'm not Oracle compliant )
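If you keep the insert/update split on the database side instead of in separate files, an Oracle MERGE against the staging table achieves the same separation in one pass; all names here are assumed for illustration:

```sql
-- Apply staged rows to the target: update rows whose key already
-- exists, insert the rest.
MERGE INTO monthly_totals t
USING monthly_totals_stage s
ON (t.transaction_id = s.transaction_id)
WHEN MATCHED THEN
  UPDATE SET t.amount = s.amount
WHEN NOT MATCHED THEN
  INSERT (transaction_id, amount)
  VALUES (s.transaction_id, s.amount);
```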
Hope that helps!
Bestest!
John Miceli
System Specialist, MCP, MCDBA
Berkley Technology Services
"Good Morning. This is God. I will be handling all your problems today. I will not need your help. So have a great day!"