Page 1 of 1

DataSets or Database tables for staging

Posted: Thu Feb 19, 2009 6:37 am
by Azzuri
Hi All, I'm at Client and we are building our Datastage architecture and there is a debate on whether to use Database tables for a staging or Datasets. I've always used and prefer to use Datasets. Just wanted to get different opinions.

Posted: Thu Feb 19, 2009 6:42 am
by Mike
Use whatever makes the transformation/integration from source(s) to target easiest. I usually end of with some combination of the two. If there is no advantage to using a database table, then use datasets for the superior performance that you get from working off of the file system.

Mike

Posted: Thu Feb 19, 2009 11:03 am
by uegodawa
This depends on couple of factors.

When ever you want to view a particular record from Table it's so easy to view data but it's so difficult from datasets when no.of records grows.

When you want to move staging data from one box to another; if you're using Tables it's easy to move; Since datasets contains some host specific information simply coping dataset from one box to another doesn't allow you to view data.

Chances are there Datasets can be erased by accidently by other users having same privileges. Tables , only authorised users can view/delete information.

When ever you want to sort staging records, ORDER BY clause will take care and it's done on DB , but datasets utilize DSServer all kind of such operations.

If you've only handful records and datasets it's worth to use datasets rather than tables.

Posted: Thu Feb 19, 2009 11:07 am
by ArndW
While I don't agree with some of the arguments that upul used, I do think that there is no clear-cut answer to this question. It depends upon what your requirements are, I'll just list 2 factors, there are many more:

- if speed is the highest priority, go for dataset or filesets
- if other applications need to access this interim data, use tables

Posted: Thu Feb 19, 2009 12:15 pm
by Azzuri
Other advantages for Datasets:

-Data can be partitioned during the intitial data pull rather than storing it in a table and then partiioning it later
- Also our Database is Teradata which is expensive if you need a lrage staging DB area. Our current ETL server has roughy 350 Gig we can use for staging datasets
- Don't need additonal processing to archive the staging area for example you may have a retention of 7 days for daily extracts

Thanks for your responses so far

Posted: Thu Feb 19, 2009 3:28 pm
by ray.wurlod
Not only partitioning. Sorted order is also preserved in Data Sets.

Posted: Fri Feb 20, 2009 4:59 am
by ajay.vaidyanathan
i would like to go with tables coz its easy to identify ane kind of data issue with the help of a query.......in datasets its difficult to identify these data issues........but finally it depends upon ones requirements to have a dataset or a table.........

Posted: Fri Feb 20, 2009 5:01 am
by ajay.vaidyanathan
i would like to go with tables coz its easy to identify ane kind of data issue with the help of a query.......in datasets its difficult to identify these data issues........but finally it depends upon ones requirements to have a dataset or a table.........

Posted: Mon Feb 23, 2009 7:58 am
by Azzuri
Hi All, we decided to go with DataSets :D