Staging Data In Sequential File

speedsterrules · Post by **speedsterrules** » Mon Mar 12, 2007 2:04 pm

Is staging data in a Sequential File a good practice?

In other words, is it resonable to create a sequential file during processing and use it to load the table or should we directly load the table?

I guess the amount of data can be a factor but am not sure if we have any related standards.

Thanks

DSguru2B · Post by **DSguru2B** » Mon Mar 12, 2007 2:11 pm

Its not against best practices. I use it all the time. But if the data size is huge, it might not be a good practice. Therefore it depends. But then again you should also have a cleanup process in place that cleans out files so that huge files dont just keep sitting there for months.

kcbland · Post by **kcbland** » Mon Mar 12, 2007 2:13 pm

Ralph Kimball recommends it in his books.

speedsterrules · Post by **speedsterrules** » Mon Mar 12, 2007 2:18 pm

Thanks for the prompt responses.

The file gets over-written every time the job is executed. I am doing it even for huge data size since it enhances the processing time.

Do you think it is advisable?

chulett · Post by **chulett** » Mon Mar 12, 2007 2:25 pm

DSguru2B · Post by **DSguru2B** » Mon Mar 12, 2007 2:29 pm

Its a three time 'Yes' from Craig. Its stone written now.

chulett · Post by **chulett** » Mon Mar 12, 2007 2:30 pm

Too slow.

kcbland · Post by **kcbland** » Mon Mar 12, 2007 2:30 pm

It all depends on what you're doing. If you're building a data warehouse, with staging, storage, and presentation layers, your ETL coe will look differently for each aspect. If you're a DBA and just moving data around, you'll question even using an ETL tool.

In the straight performance question, copying data between two tables within the same database instance is faster to stay within the database and use parallel DML.

If you are going between two disparate databases, you're better served by extracting will as many parallel processes as possible, transforming with as many parallel processes as possible, and then loading as efficiently as possible. By building sequential "staging" files within that processing, you have the ability to shift parallel processing degrees, as well as have restart points and audit trails. You can also bulk load the results for fastest loading allowed. In addition, the database target server may be elsewhere, so that prepared load files can be moved to the database for localized loading.

DSguru2B · Post by **DSguru2B** » Mon Mar 12, 2007 2:33 pm

chulett wrote:Too slow.

Alas, my slow fingers.

speedsterrules · Post by **speedsterrules** » Mon Mar 12, 2007 2:37 pm

I got all the information I need through everyone's comments. Thanks everyone and especially Ken for the elaborate description.

DSguru2B · Post by **DSguru2B** » Mon Mar 12, 2007 2:39 pm

You can rate everyone's replies depending upon how helpful the comments were.

ray.wurlod · Post by **ray.wurlod** » Mon Mar 12, 2007 6:52 pm

Yes.