Loading 2.5 Billion Records in a Fact Table

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
stiwari
Participant
Posts: 64
Joined: Wed Apr 05, 2006 3:45 am

Loading 2.5 Billion Records in a Fact Table

Post by stiwari »

Hi

I have a fact table which is going to have around 2.5 billion records in the first load, and around 20 million records in each subsequent load. Can anyone give an estimate of the time it will take to load the 2.5 billion records? The source and target are both in DB2, and the mappings are not very complex. I don't need an accurate time, just an estimate.

The fact table is located on an IBM AIX/P5 server. The source and target are connected through the network; that is, the source and target tables are located on two different machines.
Umbix62
Participant
Posts: 79
Joined: Mon Jan 16, 2006 2:47 pm

Re: Loading 2.5 Billion Records in a Fact Table

Post by Umbix62 »

stiwari wrote:Hi

I have a fact table which is going to have around 2.5 billion records in the first load, and around 20 million records in each subsequent load. Can anyone give an estimate of the time it will take to load the 2.5 billion records? The source and target are both in DB2, and the mappings are not very complex. I don't need an accurate time, just an estimate.

The fact table is located on an IBM AIX/P5 server. The source and target are connected through the network; that is, the source and target tables are located on two different machines.
Hi

It is not easy to estimate the time of your load.
We had the same problem some years ago. Our jobs had to load 3.5 billion records every month, and the only way to do that quickly was to partition the target table and split the input data stream so the partitions could be loaded separately but in parallel.
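The split-and-load-in-parallel idea above can be sketched roughly in Python. This is a minimal illustration only: the hash bucketing, the key column, and the stand-in `load_partition` function are assumptions for demonstration, not the actual DB2 partitioning or load mechanics.

```python
from concurrent.futures import ThreadPoolExecutor

def partition_rows(rows, key_index, n_partitions):
    """Hash-split the input stream into one bucket per target partition."""
    buckets = [[] for _ in range(n_partitions)]
    for row in rows:
        buckets[hash(row[key_index]) % n_partitions].append(row)
    return buckets

def load_partition(bucket):
    """Stand-in for a per-partition bulk load; here it just counts rows."""
    return len(bucket)

def parallel_load(rows, key_index=0, n_partitions=4, workers=4):
    """Split the stream, then load all buckets concurrently."""
    buckets = partition_rows(rows, key_index, n_partitions)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        loaded = list(pool.map(load_partition, buckets))
    return sum(loaded)
```

In a real job each bucket would go to its own table partition via the database's bulk loader, so the loads don't contend for the same pages or indexes.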

Sorry for my English, I know it is horrible.

Umberto
raj_konig
Participant
Posts: 67
Joined: Thu Dec 22, 2005 12:27 am

Post by raj_konig »

Hi Stiwari,

If your DB is tuned and working fine, then your DS job should run at a minimum of 1000 records per second.

Thanks,
raj
stiwari
Participant
Posts: 64
Joined: Wed Apr 05, 2006 3:45 am

Post by stiwari »

Hi Umbix62,

Thanks for the reply. Just one quick question: how much time did it take to load the 3.5 billion records? I would be more than thankful if you could tell me what kind of partitioning you created and how many jobs you ran in parallel to load the data.


Thanks
Sumit
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

To put your question differently, you have just asked "I am driving from New York to Los Angeles. How long will it take me to get there?".

Well, are you using a truck or a Porsche, or walking? Are you going cross-country or using roads? The list of questions goes on and on. The same applies to your actual question. Is the DS server heavily loaded? Is the DB/2 instance heavily used? Is it partitioned? What extract and what load method are you using? What are the data volumes (i.e. how many bytes per row - one CHAR(1) field, or 500 columns of CHAR(1024))?

With a similar configuration I've seen server jobs run at ~60,000 rows per second using the normal load stage and a bit of tweaking. It would most likely go faster using the bulk loader. This is about the initial speed you would get with PX for this HW configuration and table layout as well, but that can also be tuned upwards.

The answer that stated you would get 1000 rows/second is (imho) not only wrong but also very slow - even for a heavily loaded system. At that rate it would take nearly 29 days to load.
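The back-of-the-envelope arithmetic behind those two figures is simple: divide the row count by the throughput. A quick check in Python (the rates are the ones quoted in this thread, not measurements):

```python
SECONDS_PER_DAY = 86_400

def load_time_days(rows, rows_per_second):
    """Days needed to load `rows` at a constant `rows_per_second`."""
    return rows / rows_per_second / SECONDS_PER_DAY

ROWS = 2_500_000_000

slow = load_time_days(ROWS, 1_000)    # the 1000 rows/s claim -> ~28.9 days
fast = load_time_days(ROWS, 60_000)   # ~60,000 rows/s -> ~0.48 days (~11.6 hours)
```

This is why the 1000 rows/second figure is dismissed above: at realistic server-job speeds the initial load is a matter of hours, not weeks.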
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

42 minutes
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
Umbix62
Participant
Posts: 79
Joined: Mon Jan 16, 2006 2:47 pm

Post by Umbix62 »

[quote="stiwari"]Hi Umbix62,

Thanks for the reply. Just one quick question: how much time did it take to load the 3.5 billion records? I would be more than thankful if you could tell me what kind of partitioning you created and how many jobs you ran in parallel to load the data.


Thanks
Sumit[/quote]

I have read ArndW's answer, and he said exactly what I meant. It is not easy to estimate the time of a load and compare it with another job developed in a different environment and with different tools; such a comparison would not be correct, in my view. Anyway, we worked on an IBM S/390 mainframe. The job was developed in COBOL, not with DataStage. The RDBMS was IBM DB2 V5.1 (the maximum degree of parallelism was 10). The fact table, one table for every month, had 32 partitions, and we ran 10 loads in parallel at a time. Every job ran in about four to six hours.
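Those numbers imply a rough total: 32 partitions loaded 10 at a time means 4 waves, each taking as long as its slowest job. A small Python sketch of that arithmetic (the wave model assumes each wave waits for all of its jobs to finish, which is an assumption about the scheduling, not something stated above):

```python
import math

def total_load_hours(n_partitions, parallel_degree, hours_lo, hours_hi):
    """Total elapsed time if partitions are loaded in fully synchronized waves."""
    waves = math.ceil(n_partitions / parallel_degree)
    return waves * hours_lo, waves * hours_hi

lo, hi = total_load_hours(32, 10, 4, 6)  # 4 waves of 4-6 hours each
```

Under that model the monthly load described here would take somewhere between 16 and 24 hours end to end.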

I can only give you a suggestion, an idea, but my situation was different from yours for the reasons explained by ArndW, so the two situations aren't comparable.

Take it into consideration, but try to work out whether it is the best way to load your data quickly, again for the reasons ArndW explained.

Umberto
Post Reply