
Archiving of a Table: DTS Packages vs DataStage

Posted: Mon Jun 19, 2006 6:22 am
by Latha1919
Hi,

I need some information on the following.

We have a table in the production database with millions of records, and as it is voluminous, we are planning to archive that table. We would like to archive 4 years of data, which may be up to 65 lakh (about 6.5 million) records.

At present, I have two options: one is to create DTS packages, and the other is to create DataStage jobs.

Which one would be the right choice for archiving?

Thanks,
Latha

Posted: Mon Jun 19, 2006 6:24 am
by chulett
Define what 'archiving' means in this case.

Posted: Mon Jun 19, 2006 6:31 am
by DSguru2B
It doesn't matter how 'voluminous' the data is; backups of databases are always taken. Do you need additional copies of that data to be stored? Isn't that a waste of space and resources? Or is your requirement driven by some purging requirement?

Posted: Mon Jun 19, 2006 6:32 am
by Latha1919
Here, archiving means copying the data into a history table in the archive database (to be created) and deleting those records from the original table.
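
For what it's worth, a minimal T-SQL sketch of that copy-then-delete approach might look like the following. All the names (myTable, created_date, ArchiveDB, Hist_myTable) are placeholders, not your actual schema, and the 4-year cutoff is only illustrative.

Code:
-- Placeholder names throughout; adjust to the real schema.
DECLARE @cutoff_date DATETIME;
SET @cutoff_date = DATEADD(year, -4, GETDATE());   -- rows older than this get archived

BEGIN TRANSACTION;

    -- copy the old rows into the history table in the archive database
    INSERT INTO ArchiveDB.dbo.Hist_myTable
    SELECT *
    FROM   dbo.myTable
    WHERE  created_date < @cutoff_date;

    -- then remove them from the original table
    DELETE FROM dbo.myTable
    WHERE  created_date < @cutoff_date;

COMMIT TRANSACTION;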

Thanks,
Latha

Posted: Mon Jun 19, 2006 6:46 am
by chulett
You certainly can use DataStage to do this. Typically, however, if you are simply going to move large amounts of data between two tables with no 'transformations', you'd be better off doing it with the tools supplied by the database vendor. Or just a big parallel query.

Another 'however' however - if a DTS package is anything like an Oracle stored procedure, a well written DS job could beat the pants off a poorly implemented stored procedure. :wink:

You may also want to reconsider the 'deleting those records from the original table' statement if the number of records deleted is a significant portion of the total. You may be better off creating a new table with just the records you want to keep and swapping it around for the original table. This also keeps the original table around as a 'backup' in case Something Goes Horribly Wrong.
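
As an illustration only, that swap could look something like this in T-SQL. The names and the cutoff are invented, and you would still need to recreate indexes, constraints, and permissions on the new table.

Code:
-- Copy just the rows you want to keep into a new table
SELECT *
INTO   dbo.myTable_keep
FROM   dbo.myTable
WHERE  created_date >= DATEADD(month, -24, GETDATE());

-- Swap names; the original sticks around as a backup until you're sure
EXEC sp_rename 'dbo.myTable', 'myTable_old';
EXEC sp_rename 'dbo.myTable_keep', 'myTable';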

Posted: Mon Jun 19, 2006 7:08 am
by DSguru2B
I could sense purging. :lol:
The simplest way is to maintain two tables. Say your table name is myTable; the second table would be the history table, say Hist_mytable. Once the records reach a certain limit, in your case 4 years, just rename your myTable to Hist_mytable. That's what we do here. It saves us a lot of time and resources and gives a clean purge.
My 2 cents.

Archiving of a Table every month: DTS Packages vs DataStage

Posted: Mon Jun 19, 2006 7:36 am
by Latha1919
I forgot to add one line here. For the initial archive, we might have 65 lakh records. We will be keeping only 24 months of data, and every month we will run the job to remove the 25th month's data. So, this job will be scheduled to run on the first working day of every month.
Every month, about one lakh records will be added.

Thanks,
Latha

Posted: Mon Jun 19, 2006 8:01 am
by kumar_s
Is it 24 months or 4 years?
No matter how big it is (65 lakh), the renaming method shouldn't hurt your process. If space is the constraint and deletion is the only option left, opt for incremental deletion.
That is, you can further divide one month's data into several batches in order to overcome transaction log issues. I recall there are already some posts available regarding this.
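
A rough sketch of that kind of batched delete, assuming SQL Server and the same placeholder names as above. SET ROWCOUNT works on SQL Server 2000; newer versions also offer DELETE TOP (n).

Code:
-- Delete in chunks of 10,000 rows so each transaction (and the log growth) stays small
SET ROWCOUNT 10000;

DELETE FROM dbo.myTable
WHERE  created_date < DATEADD(month, -24, GETDATE());

WHILE @@ROWCOUNT > 0
BEGIN
    DELETE FROM dbo.myTable
    WHERE  created_date < DATEADD(month, -24, GETDATE());
END;

SET ROWCOUNT 0;   -- reset so later statements affect all matching rows again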

Posted: Mon Jun 19, 2006 4:55 pm
by ray.wurlod
Doesn't the database itself have archiving tools?

Posted: Tue Jun 20, 2006 5:32 am
by sb_akarmarkar
I think a DTS package is the better option if it is a straight transfer without any cleansing or transformation operations, as it comes with the database.

Thanks,
Anupam

Posted: Tue Jun 20, 2006 5:59 am
by Latha1919
Thanks for all your advice.

May I know what the advantages of DTS over DataStage are? If we have both options available, which one should we prefer for archiving, which here simply means moving the data into another database table? This will be scheduled for the first working day of every month. We will be keeping only 2 years of data, so during the initial archive we would be removing the 3rd-year through 6th-year data.

Please advise as early as possible.

Thanks,
Latha

Posted: Tue Jun 20, 2006 12:02 pm
by cconley
Here are some other ideas...

If you are familiar with how partitioned tables work in SQL Server, you could create a partitioned view with monthly partitions (which in SQL Server equate to physical tables) for both the "current" and "archive" tables; a rough sketch follows the list below.

This would allow you some flexibility in how to handle the archiving:
1. Load both the "current" and "archive" tables at the same time. Then purge the appropriate monthly partition in the "current" table each month. This would save time in the purging because you can do a TRUNCATE on the monthly partition.
2. Recreate the partition views each month, including only the current 24 months in the "current" partition view and the rest of history in the "archive" partition view. That way no data has to move.
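
A very rough sketch of what such a partitioned view could look like (SQL Server 2000 style, with invented table and column names; each member table carries a CHECK constraint on the date column so the optimizer can eliminate partitions):

Code:
CREATE TABLE dbo.myTable_200605 (
    id           INT      NOT NULL,
    created_date DATETIME NOT NULL
        CHECK (created_date >= '20060501' AND created_date < '20060601'),
    -- ...remaining columns...
    CONSTRAINT PK_myTable_200605 PRIMARY KEY (id, created_date)
);

CREATE TABLE dbo.myTable_200606 (
    id           INT      NOT NULL,
    created_date DATETIME NOT NULL
        CHECK (created_date >= '20060601' AND created_date < '20060701'),
    CONSTRAINT PK_myTable_200606 PRIMARY KEY (id, created_date)
);
GO

-- The "current" view is just a UNION ALL over the monthly member tables
CREATE VIEW dbo.v_myTable_current
AS
SELECT * FROM dbo.myTable_200605
UNION ALL
SELECT * FROM dbo.myTable_200606;
GO

-- Each month: TRUNCATE the expired member table (option 1 above), or move it
-- out of the "current" view definition and into the "archive" view (option 2).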

As for your question of DataStage vs DTS, there are a number of factors to consider. For example, who will be responsible for the archiving? If you have a DBA, this could be a DBA task, in which case they probably won't use DataStage or DTS. If the DW team is responsible, DataStage could make more sense for standardization and scheduling purposes. Your developers probably know DS better than DTS; besides, why maintain code in two tools? Also, it will probably be easier to link a DS archive job to the others that populate the "current" table.

Archiving is probably best served as a DBA task; however, if you must choose between DTS and DataStage, I would go with DataStage for the reasons described above.