Archiving of a Table: DTS Packages vs DataStage

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
Latha1919
Premium Member
Premium Member
Posts: 178
Joined: Mon May 22, 2006 2:32 pm

Archiving of a Table: DTS Packages vs DataStage

Post by Latha1919 »

Hi,

I need some information on the following.

We have a table in production database wih millions of records and as it is volumninous, we are planning to go for archiving of that table. We would like to archive 4 years data which may be upto 65lakhs of records.

At present, I have two options, One being creation of DTS packages and other one is to create some DataStage jobs.

Which one would be the right choice for archiving.

Thanks,
Latha
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Define what 'archiving' means in this case.
-craig

"You can never have too many knives" -- Logan Nine Fingers
DSguru2B
Charter Member
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

It doesnt matter how 'volumninous' the data is, backup's of databases are always taken. Do you need additional copies of that data to be stored? Isnt that waste of space and resources? Or is your requirement due to some purging requirements?
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
Latha1919
Premium Member
Premium Member
Posts: 178
Joined: Mon May 22, 2006 2:32 pm

Post by Latha1919 »

Here, archiving is about copying the data into history table in the archive database (to be created) and deleting those records from the original table.

Thanks,
Latha
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

You certainly can use DataStage to do this. Typically, however, if you are simply going to move large amounts of data between two tables with no 'transformations', you'd be better off doing it with the tools supplied by the database vendor. Or just a big parallel query.

Another 'however' however - if a DTS package is anything like an Oracle stored procedure, a well written DS job could beat the pants off a poorly implemented stored procedure. :wink:

You may also want to reconsider the 'deleting those records from the original table' statement if the number of records deleted is a significant portion of the total. You may be better off creating a new table with just the records you want to keep and swapping it around for the original table. This also keeps the original table around as a 'backup' in case Something Goes Horribly Wrong.
-craig

"You can never have too many knives" -- Logan Nine Fingers
DSguru2B
Charter Member
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

I could sense purging :lol:
The simplest way is to maintain two tables. Like say if your table name is
myTable then second table would be the history table say, Hist_mytable. Once the records reach a certain limit, in your case 4 years, just rename your myTable to Hist_mytable. Thats what we do here. Saves us a lot of time and resouces and a clean purge.
My 2 cents.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
Latha1919
Premium Member
Premium Member
Posts: 178
Joined: Mon May 22, 2006 2:32 pm

Archiving of a Table every month: DTS Packages vs DataStage

Post by Latha1919 »

I forgot to add one line here. For initial archive, we might be having 65Lakhs of records. We will be keeping only 24months data. And after every month, we wil be running the job for removing 25th month data. So, this job will be scheduled to run on first working day of every month.
Every month one lakh of records will be added.

Thanks,
Latha
Latha1919
Premium Member
Premium Member
Posts: 178
Joined: Mon May 22, 2006 2:32 pm

Archiving of a Table every month: DTS Packages vs DataStage

Post by Latha1919 »

I forgot to add one line here. For initial archive, we might be having 65Lakhs of records. We will be keeping only 24months data. And after every month, we wil be running the job for removing 25th month data. So, this job will be scheduled to run on first working day of every month.
Every month one lakh of records will be added.

Thanks,
Latha
kumar_s
Charter Member
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

Is it 24 month or 4 years?
No matter what, how big it is (65 lakhs) renaming method should hurt your process. If the space is the constraint and deletion is the only option left, opt for incremental deletion.
i.e., you can further divide the one month data into several stages, inorder to overcome transaction log issues. I could recal there already some post available regarding this.
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Doesn't the database itself have archiving tools?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
sb_akarmarkar
Participant
Posts: 232
Joined: Fri Sep 30, 2005 4:52 am
Contact:

Post by sb_akarmarkar »

I think DTS package is better option if there is direct tranformation without any cleansing operation as it comes with database.....

Thanks,
Anupam
Latha1919
Premium Member
Premium Member
Posts: 178
Joined: Mon May 22, 2006 2:32 pm

Post by Latha1919 »

Thanks for all your advises.

May I Know what are the advantages of DTS over DataStage. If we have both the options to go for, which one we should prefer for archiving which is about moving the data into another database table simply. This should be scheduled every month, first working day. We will be keeping only 2yrs of data and so during initial archive we would be removing 3rd year to 6th year data.

Please suggest me as early as possible.

Thanks,
Latha
cconley
Premium Member
Premium Member
Posts: 4
Joined: Fri Nov 12, 2004 12:21 am
Contact:

Post by cconley »

Here are some other ideas...

If you are familiar with how partition tables work in SQL Server, you could create a partition view with monthly partitions (which in SQL Server equate to physical tables) for both the "current" and "archive" tables.

This would allow you some flexibility in how to handle the archiving:
1. Load both the "current" and "archive" tables at the same time. Then purge the appropriate monthly partitition in the "current" table each month. This would save time in the purging b/c you can do a truncate on the monthly partition
2. Recreate the partition views each month, including only the current 24 months in the "current" partition view and the rest of history in the "archive" partition view. That way no data has to move.

As for your question of DataStage vs DTS, there are a number of factors to consider. For example, who will be responsible for the archiving? If you have a DBA, this could be a DBA task in which case they probably won't use DataStage or DTS. If the DW team is responsible DataStage could make more sense for standardization and scheduling purposes. Your developers probably know DS better than DTS, besides why maintain code in two tools. Also, it will probably be easier to link a DS archive job to others that populate the "current" table.

Archiving is probably best served as a DBA task, however, if you must choose between DTS and DataStage, I would go with DataStage for the reasons I described above.
Post Reply