DataSets or Database tables for staging

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
Azzuri
Premium Member
Premium Member
Posts: 122
Joined: Tue May 13, 2008 11:42 am

DataSets or Database tables for staging

Post by Azzuri »

Hi All, I'm at Client and we are building our Datastage architecture and there is a debate on whether to use Database tables for a staging or Datasets. I've always used and prefer to use Datasets. Just wanted to get different opinions.
Mike
Premium Member
Premium Member
Posts: 1021
Joined: Sun Mar 03, 2002 6:01 pm
Location: Tampa, FL

Post by Mike »

Use whatever makes the transformation/integration from source(s) to target easiest. I usually end of with some combination of the two. If there is no advantage to using a database table, then use datasets for the superior performance that you get from working off of the file system.

Mike
uegodawa
Participant
Posts: 71
Joined: Thu Apr 27, 2006 12:46 pm

Post by uegodawa »

This depends on couple of factors.

When ever you want to view a particular record from Table it's so easy to view data but it's so difficult from datasets when no.of records grows.

When you want to move staging data from one box to another; if you're using Tables it's easy to move; Since datasets contains some host specific information simply coping dataset from one box to another doesn't allow you to view data.

Chances are there Datasets can be erased by accidently by other users having same privileges. Tables , only authorised users can view/delete information.

When ever you want to sort staging records, ORDER BY clause will take care and it's done on DB , but datasets utilize DSServer all kind of such operations.

If you've only handful records and datasets it's worth to use datasets rather than tables.
Thanks,
Upul
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

While I don't agree with some of the arguments that upul used, I do think that there is no clear-cut answer to this question. It depends upon what your requirements are, I'll just list 2 factors, there are many more:

- if speed is the highest priority, go for dataset or filesets
- if other applications need to access this interim data, use tables
Azzuri
Premium Member
Premium Member
Posts: 122
Joined: Tue May 13, 2008 11:42 am

Post by Azzuri »

Other advantages for Datasets:

-Data can be partitioned during the intitial data pull rather than storing it in a table and then partiioning it later
- Also our Database is Teradata which is expensive if you need a lrage staging DB area. Our current ETL server has roughy 350 Gig we can use for staging datasets
- Don't need additonal processing to archive the staging area for example you may have a retention of 7 days for daily extracts

Thanks for your responses so far
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Not only partitioning. Sorted order is also preserved in Data Sets.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
ajay.vaidyanathan
Participant
Posts: 53
Joined: Fri Apr 18, 2008 8:13 am
Location: United States

Post by ajay.vaidyanathan »

i would like to go with tables coz its easy to identify ane kind of data issue with the help of a query.......in datasets its difficult to identify these data issues........but finally it depends upon ones requirements to have a dataset or a table.........
Regards
Ajay
ajay.vaidyanathan
Participant
Posts: 53
Joined: Fri Apr 18, 2008 8:13 am
Location: United States

Post by ajay.vaidyanathan »

i would like to go with tables coz its easy to identify ane kind of data issue with the help of a query.......in datasets its difficult to identify these data issues........but finally it depends upon ones requirements to have a dataset or a table.........
Regards
Ajay
Azzuri
Premium Member
Premium Member
Posts: 122
Joined: Tue May 13, 2008 11:42 am

Post by Azzuri »

Hi All, we decided to go with DataSets :D
Post Reply