Page 1 of 1

Staging Data In Sequential File

Posted: Mon Mar 12, 2007 2:04 pm
by speedsterrules
Is staging data in a Sequential File a good practice?

In other words, is it resonable to create a sequential file during processing and use it to load the table or should we directly load the table?

I guess the amount of data can be a factor but am not sure if we have any related standards.

Thanks

Posted: Mon Mar 12, 2007 2:11 pm
by DSguru2B
Its not against best practices. I use it all the time. But if the data size is huge, it might not be a good practice. Therefore it depends. But then again you should also have a cleanup process in place that cleans out files so that huge files dont just keep sitting there for months.

Posted: Mon Mar 12, 2007 2:13 pm
by kcbland
Ralph Kimball recommends it in his books.

Posted: Mon Mar 12, 2007 2:18 pm
by speedsterrules
Thanks for the prompt responses.

The file gets over-written every time the job is executed. I am doing it even for huge data size since it enhances the processing time.

Do you think it is advisable?

Posted: Mon Mar 12, 2007 2:25 pm
by chulett
Yes.

Posted: Mon Mar 12, 2007 2:29 pm
by DSguru2B
Its a three time 'Yes' from Craig. Its stone written now. :wink:

Posted: Mon Mar 12, 2007 2:30 pm
by chulett
Too slow. :wink:

Posted: Mon Mar 12, 2007 2:30 pm
by kcbland
It all depends on what you're doing. If you're building a data warehouse, with staging, storage, and presentation layers, your ETL coe will look differently for each aspect. If you're a DBA and just moving data around, you'll question even using an ETL tool.

In the straight performance question, copying data between two tables within the same database instance is faster to stay within the database and use parallel DML.

If you are going between two disparate databases, you're better served by extracting will as many parallel processes as possible, transforming with as many parallel processes as possible, and then loading as efficiently as possible. By building sequential "staging" files within that processing, you have the ability to shift parallel processing degrees, as well as have restart points and audit trails. You can also bulk load the results for fastest loading allowed. In addition, the database target server may be elsewhere, so that prepared load files can be moved to the database for localized loading.

Posted: Mon Mar 12, 2007 2:33 pm
by DSguru2B
chulett wrote:Too slow. :wink:
Alas, my slow fingers.

Posted: Mon Mar 12, 2007 2:37 pm
by speedsterrules
I got all the information I need through everyone's comments. Thanks everyone and especially Ken for the elaborate description.

Posted: Mon Mar 12, 2007 2:39 pm
by DSguru2B
You can rate everyone's replies depending upon how helpful the comments were.

Posted: Mon Mar 12, 2007 6:52 pm
by ray.wurlod
Yes.

:lol: