DataSets or Database tables for staging
Moderators: chulett, rschirm, roy
DataSets or Database tables for staging
Hi All, I'm at Client and we are building our Datastage architecture and there is a debate on whether to use Database tables for a staging or Datasets. I've always used and prefer to use Datasets. Just wanted to get different opinions.
This depends on couple of factors.
When ever you want to view a particular record from Table it's so easy to view data but it's so difficult from datasets when no.of records grows.
When you want to move staging data from one box to another; if you're using Tables it's easy to move; Since datasets contains some host specific information simply coping dataset from one box to another doesn't allow you to view data.
Chances are there Datasets can be erased by accidently by other users having same privileges. Tables , only authorised users can view/delete information.
When ever you want to sort staging records, ORDER BY clause will take care and it's done on DB , but datasets utilize DSServer all kind of such operations.
If you've only handful records and datasets it's worth to use datasets rather than tables.
When ever you want to view a particular record from Table it's so easy to view data but it's so difficult from datasets when no.of records grows.
When you want to move staging data from one box to another; if you're using Tables it's easy to move; Since datasets contains some host specific information simply coping dataset from one box to another doesn't allow you to view data.
Chances are there Datasets can be erased by accidently by other users having same privileges. Tables , only authorised users can view/delete information.
When ever you want to sort staging records, ORDER BY clause will take care and it's done on DB , but datasets utilize DSServer all kind of such operations.
If you've only handful records and datasets it's worth to use datasets rather than tables.
Thanks,
Upul
Upul
While I don't agree with some of the arguments that upul used, I do think that there is no clear-cut answer to this question. It depends upon what your requirements are, I'll just list 2 factors, there are many more:
- if speed is the highest priority, go for dataset or filesets
- if other applications need to access this interim data, use tables
- if speed is the highest priority, go for dataset or filesets
- if other applications need to access this interim data, use tables
<a href=http://www.worldcommunitygrid.org/team/ ... TZ9H4CGVP1 target="WCGWin">
</a>
</a>
Other advantages for Datasets:
-Data can be partitioned during the intitial data pull rather than storing it in a table and then partiioning it later
- Also our Database is Teradata which is expensive if you need a lrage staging DB area. Our current ETL server has roughy 350 Gig we can use for staging datasets
- Don't need additonal processing to archive the staging area for example you may have a retention of 7 days for daily extracts
Thanks for your responses so far
-Data can be partitioned during the intitial data pull rather than storing it in a table and then partiioning it later
- Also our Database is Teradata which is expensive if you need a lrage staging DB area. Our current ETL server has roughy 350 Gig we can use for staging datasets
- Don't need additonal processing to archive the staging area for example you may have a retention of 7 days for daily extracts
Thanks for your responses so far
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
-
- Participant
- Posts: 53
- Joined: Fri Apr 18, 2008 8:13 am
- Location: United States
-
- Participant
- Posts: 53
- Joined: Fri Apr 18, 2008 8:13 am
- Location: United States