Staging Data In Sequential File
Moderators: chulett, rschirm, roy
-
- Participant
- Posts: 28
- Joined: Thu Sep 29, 2005 1:50 pm
Staging Data In Sequential File
Is staging data in a Sequential File a good practice?
In other words, is it resonable to create a sequential file during processing and use it to load the table or should we directly load the table?
I guess the amount of data can be a factor but am not sure if we have any related standards.
Thanks
In other words, is it resonable to create a sequential file during processing and use it to load the table or should we directly load the table?
I guess the amount of data can be a factor but am not sure if we have any related standards.
Thanks
Its not against best practices. I use it all the time. But if the data size is huge, it might not be a good practice. Therefore it depends. But then again you should also have a cleanup process in place that cleans out files so that huge files dont just keep sitting there for months.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
-
- Participant
- Posts: 28
- Joined: Thu Sep 29, 2005 1:50 pm
It all depends on what you're doing. If you're building a data warehouse, with staging, storage, and presentation layers, your ETL coe will look differently for each aspect. If you're a DBA and just moving data around, you'll question even using an ETL tool.
In the straight performance question, copying data between two tables within the same database instance is faster to stay within the database and use parallel DML.
If you are going between two disparate databases, you're better served by extracting will as many parallel processes as possible, transforming with as many parallel processes as possible, and then loading as efficiently as possible. By building sequential "staging" files within that processing, you have the ability to shift parallel processing degrees, as well as have restart points and audit trails. You can also bulk load the results for fastest loading allowed. In addition, the database target server may be elsewhere, so that prepared load files can be moved to the database for localized loading.
In the straight performance question, copying data between two tables within the same database instance is faster to stay within the database and use parallel DML.
If you are going between two disparate databases, you're better served by extracting will as many parallel processes as possible, transforming with as many parallel processes as possible, and then loading as efficiently as possible. By building sequential "staging" files within that processing, you have the ability to shift parallel processing degrees, as well as have restart points and audit trails. You can also bulk load the results for fastest loading allowed. In addition, the database target server may be elsewhere, so that prepared load files can be moved to the database for localized loading.
Kenneth Bland
Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
-
- Participant
- Posts: 28
- Joined: Thu Sep 29, 2005 1:50 pm
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact: