Archiving of a Table: DTS Packages vs DataStage
Moderators: chulett, rschirm, roy
Archiving of a Table: DTS Packages vs DataStage
Hi,
I need some information on the following.
We have a table in production database wih millions of records and as it is volumninous, we are planning to go for archiving of that table. We would like to archive 4 years data which may be upto 65lakhs of records.
At present, I have two options, One being creation of DTS packages and other one is to create some DataStage jobs.
Which one would be the right choice for archiving.
Thanks,
Latha
I need some information on the following.
We have a table in production database wih millions of records and as it is volumninous, we are planning to go for archiving of that table. We would like to archive 4 years data which may be upto 65lakhs of records.
At present, I have two options, One being creation of DTS packages and other one is to create some DataStage jobs.
Which one would be the right choice for archiving.
Thanks,
Latha
It doesnt matter how 'volumninous' the data is, backup's of databases are always taken. Do you need additional copies of that data to be stored? Isnt that waste of space and resources? Or is your requirement due to some purging requirements?
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
You certainly can use DataStage to do this. Typically, however, if you are simply going to move large amounts of data between two tables with no 'transformations', you'd be better off doing it with the tools supplied by the database vendor. Or just a big parallel query.
Another 'however' however - if a DTS package is anything like an Oracle stored procedure, a well written DS job could beat the pants off a poorly implemented stored procedure.
You may also want to reconsider the 'deleting those records from the original table' statement if the number of records deleted is a significant portion of the total. You may be better off creating a new table with just the records you want to keep and swapping it around for the original table. This also keeps the original table around as a 'backup' in case Something Goes Horribly Wrong.
Another 'however' however - if a DTS package is anything like an Oracle stored procedure, a well written DS job could beat the pants off a poorly implemented stored procedure.
You may also want to reconsider the 'deleting those records from the original table' statement if the number of records deleted is a significant portion of the total. You may be better off creating a new table with just the records you want to keep and swapping it around for the original table. This also keeps the original table around as a 'backup' in case Something Goes Horribly Wrong.
-craig
"You can never have too many knives" -- Logan Nine Fingers
"You can never have too many knives" -- Logan Nine Fingers
I could sense purging
The simplest way is to maintain two tables. Like say if your table name is
myTable then second table would be the history table say, Hist_mytable. Once the records reach a certain limit, in your case 4 years, just rename your myTable to Hist_mytable. Thats what we do here. Saves us a lot of time and resouces and a clean purge.
My 2 cents.
The simplest way is to maintain two tables. Like say if your table name is
myTable then second table would be the history table say, Hist_mytable. Once the records reach a certain limit, in your case 4 years, just rename your myTable to Hist_mytable. Thats what we do here. Saves us a lot of time and resouces and a clean purge.
My 2 cents.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
Archiving of a Table every month: DTS Packages vs DataStage
I forgot to add one line here. For initial archive, we might be having 65Lakhs of records. We will be keeping only 24months data. And after every month, we wil be running the job for removing 25th month data. So, this job will be scheduled to run on first working day of every month.
Every month one lakh of records will be added.
Thanks,
Latha
Every month one lakh of records will be added.
Thanks,
Latha
Archiving of a Table every month: DTS Packages vs DataStage
I forgot to add one line here. For initial archive, we might be having 65Lakhs of records. We will be keeping only 24months data. And after every month, we wil be running the job for removing 25th month data. So, this job will be scheduled to run on first working day of every month.
Every month one lakh of records will be added.
Thanks,
Latha
Every month one lakh of records will be added.
Thanks,
Latha
Is it 24 month or 4 years?
No matter what, how big it is (65 lakhs) renaming method should hurt your process. If the space is the constraint and deletion is the only option left, opt for incremental deletion.
i.e., you can further divide the one month data into several stages, inorder to overcome transaction log issues. I could recal there already some post available regarding this.
No matter what, how big it is (65 lakhs) renaming method should hurt your process. If the space is the constraint and deletion is the only option left, opt for incremental deletion.
i.e., you can further divide the one month data into several stages, inorder to overcome transaction log issues. I could recal there already some post available regarding this.
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
-
- Participant
- Posts: 232
- Joined: Fri Sep 30, 2005 4:52 am
- Contact:
Thanks for all your advises.
May I Know what are the advantages of DTS over DataStage. If we have both the options to go for, which one we should prefer for archiving which is about moving the data into another database table simply. This should be scheduled every month, first working day. We will be keeping only 2yrs of data and so during initial archive we would be removing 3rd year to 6th year data.
Please suggest me as early as possible.
Thanks,
Latha
May I Know what are the advantages of DTS over DataStage. If we have both the options to go for, which one we should prefer for archiving which is about moving the data into another database table simply. This should be scheduled every month, first working day. We will be keeping only 2yrs of data and so during initial archive we would be removing 3rd year to 6th year data.
Please suggest me as early as possible.
Thanks,
Latha
Here are some other ideas...
If you are familiar with how partition tables work in SQL Server, you could create a partition view with monthly partitions (which in SQL Server equate to physical tables) for both the "current" and "archive" tables.
This would allow you some flexibility in how to handle the archiving:
1. Load both the "current" and "archive" tables at the same time. Then purge the appropriate monthly partitition in the "current" table each month. This would save time in the purging b/c you can do a truncate on the monthly partition
2. Recreate the partition views each month, including only the current 24 months in the "current" partition view and the rest of history in the "archive" partition view. That way no data has to move.
As for your question of DataStage vs DTS, there are a number of factors to consider. For example, who will be responsible for the archiving? If you have a DBA, this could be a DBA task in which case they probably won't use DataStage or DTS. If the DW team is responsible DataStage could make more sense for standardization and scheduling purposes. Your developers probably know DS better than DTS, besides why maintain code in two tools. Also, it will probably be easier to link a DS archive job to others that populate the "current" table.
Archiving is probably best served as a DBA task, however, if you must choose between DTS and DataStage, I would go with DataStage for the reasons I described above.
If you are familiar with how partition tables work in SQL Server, you could create a partition view with monthly partitions (which in SQL Server equate to physical tables) for both the "current" and "archive" tables.
This would allow you some flexibility in how to handle the archiving:
1. Load both the "current" and "archive" tables at the same time. Then purge the appropriate monthly partitition in the "current" table each month. This would save time in the purging b/c you can do a truncate on the monthly partition
2. Recreate the partition views each month, including only the current 24 months in the "current" partition view and the rest of history in the "archive" partition view. That way no data has to move.
As for your question of DataStage vs DTS, there are a number of factors to consider. For example, who will be responsible for the archiving? If you have a DBA, this could be a DBA task in which case they probably won't use DataStage or DTS. If the DW team is responsible DataStage could make more sense for standardization and scheduling purposes. Your developers probably know DS better than DTS, besides why maintain code in two tools. Also, it will probably be easier to link a DS archive job to others that populate the "current" table.
Archiving is probably best served as a DBA task, however, if you must choose between DTS and DataStage, I would go with DataStage for the reasons I described above.