DB2 Datawarehouse and Datastage best practice

A forum for discussing DataStage<sup>®</sup> basics. If you're not sure where your question goes, start here.

Moderators: chulett, rschirm, roy

Post Reply
clarcombe
Premium Member
Premium Member
Posts: 515
Joined: Wed Jun 08, 2005 9:54 am
Location: Europe

DB2 Datawarehouse and Datastage best practice

Post by clarcombe »

We have embarked on a Datawarehouse project with DB2 v9.5. We are experiencing issues with running out table space as we are using the DB2 Connector stage to write to the table.

One reason for this could be the fact we are using Transaction Record Count of 0 to ensure that all of the rows are written otherwise rolled back. But our volumes are not huge for each table (largest table : 620k rows per day)

My first thought was to switch to a DB2 UDB Load stage and bulkload the data. If it partially works then we will have to roll back by hand but at least it will be quicker and we wont run out of space.

For those who have many DWH projects behind you, are there any best practices for loading DB2 data?

Thanks
Colin Larcombe
-------------------

Certified IBM Infosphere Datastage Developer
MT
Premium Member
Premium Member
Posts: 198
Joined: Fri Mar 09, 2007 3:51 am

Re: DB2 Datawarehouse and Datastage best practice

Post by MT »

clarcombe wrote:We have embarked on a Datawarehouse project with DB2 v9.5. We are experiencing issues with running out table space as we are using the DB2 Connector stage to write to the table.
Hi clarcombe,

I am not sure what you try to tell us - running out of space in a DB2 table space has not much to do with the method you use for loading the data - it is simply a space problem.
Why do you think you are running out of table space?
Any error messages?
Could you please be a little more prcise and give us more details on your problem

Thanks
regards

Michael
clarcombe
Premium Member
Premium Member
Posts: 515
Joined: Wed Jun 08, 2005 9:54 am
Location: Europe

Post by clarcombe »

Apologies for the vagueness but I have not seen the problem first hand. I am assuming we are running out of space because the rollback log is getting too big as we do no commits before all the data has been loaded successfully.

What I am really after though is some information about best practices e.g for fact tables you always use Bulk Load and Reference tables UDB.

Thanks
Colin Larcombe
-------------------

Certified IBM Infosphere Datastage Developer
MT
Premium Member
Premium Member
Posts: 198
Joined: Fri Mar 09, 2007 3:51 am

Post by MT »

Hi clarcombe

ok - so you are running out of log space - most probably you will receive a SQL0964 telling you that the transaction log is full.

Well - you want best practices - and the answer is:
IT DEPENDS!

So while the answer is correct (although you might not like it) you might consider following points:
- Using Load is MUCH faster and load will not cause a logging problem so Load should always be considered when moving a lot of data
- DB2 Load can only append data or replace the data within a table so it might not be suited for building up histories in Data Warehosues.
- You should use Record Count in order to commit your work more frequently - this will avoid your logging problem when you import data.
- A important best practice point from my point of view is: talk to your DBA and work closely together with the database team. There a great DB2 features which can be facilitated with DataStage to build a great overall solution

hope this helps a little bit....
regards

Michael
clarcombe
Premium Member
Premium Member
Posts: 515
Joined: Wed Jun 08, 2005 9:54 am
Location: Europe

Post by clarcombe »

A important best practice point from my point of view is: talk to your DBA and work closely together with the database team. There a great DB2 features which can be facilitated with DataStage to build a great overall solution
I think this is going to be the best overall solution :) . I have scheduled a meeting for them in a couple of days.
Colin Larcombe
-------------------

Certified IBM Infosphere Datastage Developer
DSguru2B
Charter Member
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

You will run into this kind of issue if you commit at the end. Try committing more frequently and have a seperate rollback process if things go south.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
FranklinE
Premium Member
Premium Member
Posts: 739
Joined: Tue Nov 25, 2008 2:19 pm
Location: Malvern, PA

Post by FranklinE »

My direct DW experience is with 7x and Oracle (source data is DB2 or sequential datasets), so the following is a general comment.

1) To what use will the DW tables be put? If, for example, it's reporting of some kind, consider designing the fact tables with partitions that support the reporting periods. For example, a set of monthly partitions allows you to predict performance as you add rows for a given month, and start fresh with the next month. Bulk loads combined with CDC on the already-loaded rows works well here: the first load of the month becomes the before-data for the next cycle's CDC pass, then truncate and load again. Rinse and repeat. :wink:

2) How complex relationally are the source tables, and how does that effect denormalization? This is a complicated area in which I have mostly practical experience, but it's an important design point you didn't mention.

Best of luck to you.
Franklin Evans
"Shared pain is lessened, shared joy increased. Thus do we refute entropy." -- Spider Robinson

Using mainframe data FAQ: viewtopic.php?t=143596 Using CFF FAQ: viewtopic.php?t=157872
clarcombe
Premium Member
Premium Member
Posts: 515
Joined: Wed Jun 08, 2005 9:54 am
Location: Europe

Post by clarcombe »

Franklin,

In response to your points

1) The DW is purely to be used for reporting. I am not sure at this point whether we will have multiple same day loads so I don't know if CDC will be of any use. Partioning sounds like a good idea but I think DB2 is limited in this area

2) As far as I can see the table structure is very simple and in most cases from the staging tables to the fact tables its a one to one relationship.

Thanks for your comments
Colin Larcombe
-------------------

Certified IBM Infosphere Datastage Developer
MT
Premium Member
Premium Member
Posts: 198
Joined: Fri Mar 09, 2007 3:51 am

Post by MT »

clarcombe wrote:Partioning sounds like a good idea but I think DB2 is limited in this area
Hi clarcombe,

why do you think DB2 is not good in partitioning?
We do have database partitioning, table partition, and MDC-tables. These are all forms of partitioning and they can be combined in every way.

So this might be an other point for your aganda when you meet with the DBA guys ....
regards

Michael
Post Reply