checkpoint/restart safeguards

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

srikie
Participant
Posts: 58
Joined: Thu Oct 14, 2004 4:19 pm

checkpoint/restart safeguards

Post by srikie »

Hi,
Can anyone please tell me about the features Ascential DataStage MVS provides that can help with the following, for very large data loads that will require initial loads plus incremental updates:
Describe the checkpoint/restart safeguards we are putting in place in the conversion that will ensure that we can recover. State any assumptions that may limit the effectiveness of the solution.
It's kinda urgent, so any help would be greatly appreciated.
Thanks in advance
Mike
Premium Member
Posts: 1021
Joined: Sun Mar 03, 2002 6:01 pm
Location: Tampa, FL

Post by Mike »

There are a number of techniques that you can deploy to implement checkpoint/restart in mainframe jobs. There is no magic. You (the job designer) have to design the jobs to be restartable.

I normally try to design an incremental load job that is scalable enough to handle initial load volumes (avoids having two sets of jobs).

If you can tell us a bit about your sources, targets, and data volumes, we may be able to give you some specific suggestions for a restart/recovery design.

Mike
srikie
Participant
Posts: 58
Joined: Thu Oct 14, 2004 4:19 pm

VSAM source -> DB2 target

Post by srikie »

Hi Mike,
Thanks for the fast reply. My sources are VSAM files and my target is DB2.
I just want to know a few of the mechanisms that DataStage provides. The data volume is about 200 GB.
Thanks
Mike
Premium Member
Posts: 1021
Joined: Sun Mar 03, 2002 6:01 pm
Location: Tampa, FL

Post by Mike »

I'm not sure I have enough to be very specific with my suggestions, but ...

Here's the general idea:
1) As the job is processing, "status information" needs to be saved off to disk somewhere (DB2 table or flat file).
2) This "status information" then needs to be used to set job parameters to reposition the job to where it left off (a minimal sketch of both steps follows below).
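
Neither step is automatic in DataStage MVS, so here is a minimal Python sketch of the logic, purely for illustration; a real job would do this in generated COBOL against a DB2 table or flat file, and the file name and commit interval here are invented:

```python
# Minimal sketch of the two steps above, in Python purely to illustrate the
# logic. The checkpoint file name and record layout are invented.

CHECKPOINT_FILE = "load_job.checkpoint"   # hypothetical "status information" file

def save_checkpoint(last_record_number):
    """Step 1: persist status information at each commit point."""
    with open(CHECKPOINT_FILE, "a") as f:
        f.write(f"{last_record_number}\n")

def restart_position():
    """Step 2: read the status information back, as a 'job parameter' would."""
    try:
        with open(CHECKPOINT_FILE) as f:
            lines = f.read().split()
        return int(lines[-1]) if lines else 0
    except FileNotFoundError:
        return 0   # first run: nothing processed yet

def process(source_rows, commit_interval=3):
    """Skip rows already committed, then resume and checkpoint as we go."""
    start_after = restart_position()
    for record_number, row in enumerate(source_rows, start=1):
        if record_number <= start_after:
            continue                          # reposition: committed on the last run
        # ... transform and write the row to the target here ...
        if record_number % commit_interval == 0:
            save_checkpoint(record_number)    # commit point reached
```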

Here are some additional things to think about:

Do your VSAM sources have an audit column that can be used to extract incremental changes? Something like a "last update timestamp" or "create date". Columns like these make for a good job parameter.

With DB2 as a target, try to design your process so that it is "insert only". Then use the DB2 load utility to load the target table (DB2 Load Ready stage). This will be orders of magnitude faster than individual SQL insert statements.

If updates cannot be avoided because of requirements, then split the updates from the inserts. Apply the updates with the relational stage and bulk load the inserts.
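
As an illustration of that split (in Python rather than DataStage; the file names and the source of the existing-key set are assumptions), the routing logic is essentially:

```python
# Route incoming rows to an "inserts" file (destined for the DB2 load utility)
# or an "updates" file (applied later via SQL), based on whether the key
# already exists in the target.

def split_changes(rows, existing_keys, insert_path="inserts.dat",
                  update_path="updates.dat"):
    """rows: iterable of (key, record) pairs; existing_keys: set of target keys."""
    with open(insert_path, "w") as ins, open(update_path, "w") as upd:
        for key, record in rows:
            if key in existing_keys:
                upd.write(record + "\n")   # apply via the relational (SQL) stage
            else:
                ins.write(record + "\n")   # bulk load via a DB2 Load Ready file

# Toy example:
split_changes([("A1", "A1|new"), ("B2", "B2|changed")], existing_keys={"B2"})
```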

When using the relational stage, your "status information" needs to account for database commit points. You may want to consider controlling database commits with a business rule stage.

It sounds like you have multiple VSAM and/or DB2 sources. Will you be using VSAM or DB2 sources as the source for a reference link in a lookup stage?

Mike
srikie
Participant
Posts: 58
Joined: Thu Oct 14, 2004 4:19 pm

Saving status information

Post by srikie »

Hi Mike,
Can you please elaborate a little more on how "saving status information" is achieved? Yeah, I work with a lot of VSAM source files, but I believe my work would exclusively have DB2 as a target, not as a source.
And regarding the lookups, I am still not sure about them. I also want to know how you can achieve checkpoints using columns like create date or last update timestamp; is it through constraint checking? I didn't get the commit part when using the relational stage.
Thanks
Srikanth
peternolan9
Participant
Posts: 214
Joined: Mon Feb 23, 2004 2:10 am
Location: Dublin, Ireland
Contact:

Re: checkpoint/restart safeguards

Post by peternolan9 »

srikie wrote:Hi,
Can anyone please tell me about the features Ascential DataStage MVS provides that can help with the following, for very large data loads that will require initial loads plus incremental updates:
Describe the checkpoint/restart safeguards we are putting in place in the conversion that will ensure that we can recover. State any assumptions that may limit the effectiveness of the solution.
It's kinda urgent, so any help would be greatly appreciated.
Thanks in advance
Hi Srikie,
I am 'master' level at restartability in MVS (or at least the older versions of this stuff). And though I have never used DS/MVS, I do believe there is no restartability built into it as I would call 'restartability'. It used to be you could define a GSAM database over the top of a VSAM file such that when the restart was performed under a database manager like IMS or DB2, the GSAM 'database' (VSAM file) would have the read pointer repositioned to the last checkpoint.

This is quite complex code to write even in languages like COBOL/PL/I, so I am guessing it is not present in DS-generated COBOL. You must know the job failed, and you must know how to issue a step restart and reposition yourself in the source 'database'.

I'd suggest you split the files up, process them in smaller batches, and write your jobs so that they can be restarted from the beginning. This is what I do on Unix because it's too hard to save 'state' information and do a sensible positioned restart.

Just my 2cents worth...
Best Regards
Peter Nolan
www.peternolan.com
srikie
Participant
Posts: 58
Joined: Thu Oct 14, 2004 4:19 pm

Re: checkpoint/restart safeguards

Post by srikie »

Hi peternolan9 ,
I thought so too. But there should still be a way to at least put in checkpoints, because it is the most important thing in whatever DataStage version you are working with. I am not very particular about MVS; I just want to know whether it has it or not, because I think if there is a facility in one version it should be there in the others too.
Thanks
Mike
Premium Member
Posts: 1021
Joined: Sun Mar 03, 2002 6:01 pm
Location: Tampa, FL

Post by Mike »

Srikanth,

If you are using a relational stage as a target, you will need to manage when the data that you insert and/or update is committed to the database. For mainframe jobs, the commit interval is on the job properties page. By default, this is set to zero, which means that the data is only committed at the end of the job. This is OK for small volumes, but not for typical initial load volumes, as you will likely exceed one or more DB2 resource limits (typically an ABEND with a -904 SQLCODE). A setting of something like 10,000 might be more appropriate for initial load volumes.

If you have an audit column such as "last update date", then you can use this as a constraint to filter out already processed data (either in the CFF stage or a Transformer stage). I'm not sure if you can use a job parameter on the constraint page of the CFF stage (you'll have to experiment with that).
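
The constraint itself is simple; in Python terms it is roughly the filter below, where the audit column and parameter names are placeholders rather than anything DataStage defines:

```python
# Rough equivalent of the incremental constraint: keep only rows whose audit
# column is newer than the LAST_RUN_TIMESTAMP job parameter.

from datetime import datetime

LAST_RUN_TIMESTAMP = datetime(2005, 1, 31, 23, 59, 59)   # would come from the parameter file

def changed_since_last_run(rows):
    """rows: iterable of dicts with a LAST_UPDATE_TS datetime column."""
    for row in rows:
        if row["LAST_UPDATE_TS"] > LAST_RUN_TIMESTAMP:
            yield row
```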

With version 7.x, there is a new business rule stage. You can issue database commits from within a business rule stage, so this is probably a more flexible way to control the commit interval than what you get on the job properties page. You can coordinate the database commit with saving the input record's key value or record number at the point of commit. This is the "status information" that I was thinking of previously. You need to know the record number or record key of the last record processed that was committed to the database.

Of course, when you restart, you will need to position the process at the next record, and you will need to guarantee that the source has not changed and that you read the source in exactly the same sequence. You need to write the status information to a file that will ultimately end up in your job parameter file. I would write the status information into a flat file (make sure the JCL DISP is set to keep the file in the case of an ABEND). You will need a simple job to copy the last line of the status file into your job parameter file prior to restarting the failed job.
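
To make that last step concrete, here is a hedged Python sketch of the pre-restart "copy the last checkpoint into the parameter file" step; the file names and the RESTART_AFTER_KEY parameter are invented for the illustration, not DataStage artifacts:

```python
# Copy the last line of the status file into the job parameter file before
# resubmitting the failed job. The status file is written at each commit
# point and kept on ABEND (JCL DISP); names here are placeholders.

STATUS_FILE = "load_job.status"
PARAM_FILE = "load_job.params"

def prepare_restart_parameters():
    with open(STATUS_FILE) as f:
        lines = [line.strip() for line in f if line.strip()]
    if not lines:
        raise RuntimeError("No checkpoint recorded; run the job from the start.")
    last_committed_key = lines[-1]
    with open(PARAM_FILE, "w") as p:
        p.write(f"RESTART_AFTER_KEY={last_committed_key}\n")
    # The restarted job must re-read the unchanged source in the same order
    # and skip forward to the record after RESTART_AFTER_KEY.
```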

As Peter hinted, you might want to consider processing your data in smaller chunks. I would suggest creating a job to partition your VSAM file into X number of sequential files. You could then run X jobs to process these sequential files in parallel (or run them one at a time if you don't need to run in parallel).
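
As an illustration of the chunking idea (not a mainframe utility; the round-robin split and file names are assumptions for the example), a Python sketch might look like:

```python
# Split one large sequential extract into X smaller files that X jobs can
# process in parallel or one at a time. A real mainframe split would more
# likely be done by key range or with a utility.

def partition_file(source_path, num_partitions=4):
    outputs = [open(f"partition_{i:02d}.dat", "w") for i in range(num_partitions)]
    try:
        with open(source_path) as src:
            for record_number, line in enumerate(src):
                outputs[record_number % num_partitions].write(line)
    finally:
        for f in outputs:
            f.close()
```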

Are you planning to read a source VSAM file and then do reference lookups to other VSAM files? There are more scalable and better performing alternatives to this. I can elaborate if you have this need.

Don't make the mistake of thinking that a mainframe job is anything like a server job just because they are both DataStage. There are far more differences than there are similarities.

Mike
srikie
Participant
Posts: 58
Joined: Thu Oct 14, 2004 4:19 pm

Post by srikie »

Hi Mike,
I would like to know the other alternatives for the lookup.
Thanks
Srikanth
Mike
Premium Member
Posts: 1021
Joined: Sun Mar 03, 2002 6:01 pm
Location: Tampa, FL

Post by Mike »

First option: Lookup stage. Make sure you choose the Hash lookup technique. This will create an in-memory hash lookup table. This will do your lookup at memory speed. Obviously, your lookup file will need to fit in memory. This will be dependent on the REGION size specified in your JCL. As a general rule of thumb: about 100,000 rows of relatively narrow records should be possible to preload into memory. If you can't preload your lookup file into memory, then don't use the lookup stage.
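
Functionally, the hash lookup technique amounts to something like this Python sketch (field names and data are invented; the point is simply that the whole reference set sits in memory and is probed once per driver row):

```python
# Preload the reference file into an in-memory dictionary keyed on the lookup
# key, then probe it for each driver row at memory speed.

def build_hash_table(reference_rows):
    """reference_rows: iterable of (key, payload); must fit in memory."""
    return {key: payload for key, payload in reference_rows}

def lookup(driver_rows, hash_table):
    for key, record in driver_rows:
        yield key, record, hash_table.get(key)   # None when there is no match

table = build_hash_table([("C1", "desc one"), ("C2", "desc two")])
print(list(lookup([("C1", "row"), ("C9", "row")], table)))
```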

Second option: Join stage. Use the two file match join technique. This will require that both files be sorted by the join keys. You will probably have to dump either one or both of the VSAM files to flat files so they can be sorted (use the presort tab of the flat file stage to specify sort criteria). If your driver file has unique join keys, then you can specify a left outer join with the driver file as the "outer link" file. If your driver file does not have unique join keys (most likely since the unique keys will be in the lookup file), then you can specify a full outer join with the driver file as the "inner link" file. DataStage requires the "outer link" file to have unique join keys, and DataStage will ABEND if either of the source files is not sorted as required.
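
For intuition, the two-file match can be sketched in Python as a single pass over both pre-sorted inputs; this simplified version assumes unique join keys on both links, which is stricter than the full outer join option described above:

```python
# Sorted two-file match join: both inputs sorted ascending on the join key,
# one pass over each. Simplified: unique keys assumed on both sides.

def sorted_left_outer_join(outer_rows, inner_rows):
    """outer_rows, inner_rows: lists of (key, record) sorted by key."""
    result, i = [], 0
    for key, outer_rec in outer_rows:
        while i < len(inner_rows) and inner_rows[i][0] < key:
            i += 1                       # advance the inner file to the key
        match = inner_rows[i][1] if i < len(inner_rows) and inner_rows[i][0] == key else None
        result.append((key, outer_rec, match))
    return result

print(sorted_left_outer_join([("A", 1), ("B", 2)], [("B", "x"), ("C", "y")]))
```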

Both of these options have some "overhead", so your processing volumes need to be large enough to justify it.

Mike
peternolan9
Participant
Posts: 214
Joined: Mon Feb 23, 2004 2:10 am
Location: Dublin, Ireland
Contact:

Re: checkpoint/restart safeguards

Post by peternolan9 »

srikie wrote:Hi peternolan9 ,
I thought so too. But there should still be a way to at least put in checkpoints, because it is the most important thing in whatever DataStage version you are working with. I am not very particular about MVS; I just want to know whether it has it or not, because I think if there is a facility in one version it should be there in the others too.
Thanks
Hi Srikie,
DS on Unix/Windows supports checkpoints in the output stage, and I assume it also does this on MVS. However, this is completely useless for 'restartability' because the easiest way to do it is to checkpoint on the individual tables being updated.....but what if there is more than one table being updated? The 'normal' checkpoints will not cover it.

ASCL did release a presentation on how to checkpoint across multiple stages, and this is much more necessary in RTI. I guess you can ask ASCL for the presentation on checkpointing...someone should know where it is...I never got a copy because I thought it was a waste of time...;-)

What I do is to write outputs to a file before doing the insert/update. That way, if the insert/update fails I can re-run just that portion of the job to load it.....it's a pain in the neck because you have a flat file stage between producing the result and doing the load, and you only need it to save time in the event of failures.....lots of people (even here) tell me I'm crazy doing it....so call me crazy....;-)

This way, the job runs and produces the file as one unit of work and the load is another unit of work.......you can re-start the job at the beginning if the transformation processing fails or just strip down the job to the load if the load fails (I use insert then update).....
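
Outside of DataStage, the two-units-of-work idea can be sketched like this in Python (the transform and load bodies are placeholders; only the restart split matters):

```python
# Unit of work 1 transforms the source and lands the result in a flat file;
# unit of work 2 loads that file into the database. If the load fails, rerun
# only unit 2. File names and the transform are placeholders.

def run_transform(source_path, landed_path):
    with open(source_path) as src, open(landed_path, "w") as out:
        for line in src:
            out.write(line.upper())        # stand-in for the real transformation

def run_load(landed_path):
    with open(landed_path) as f:
        for line in f:
            pass                           # stand-in for insert-then-update into the target

def run_job(source_path, landed_path="landed_output.dat", restart_at_load=False):
    if not restart_at_load:
        run_transform(source_path, landed_path)   # unit of work 1
    run_load(landed_path)                         # unit of work 2
```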

That's the most reliable and stable solution I have found so far....I don't like the solution of dropping the file because if I am processing 10M rows and some indexspace gives out at 9,999,999 rows I don't want to re-run the entire job replacing the 9,999,999 rows.....it's an exaggeration but you get what I mean. Updates are terribly expensive, especially so in Oracle....

This does mean that if you want to protect yourself from failures for large updates you are way better off breaking up the input file and processing many small jobs rather than one large job....

Just my 2 cents worth....;-)
Best Regards
Peter Nolan
www.peternolan.com
srikie
Participant
Posts: 58
Joined: Thu Oct 14, 2004 4:19 pm

Re: checkpoint/restart safeguards

Post by srikie »

Hi Mike and Peter,
Thanks, I think I now have an idea of how it goes. Your suggestions have been very valuable to me. Yeah, I am doing it your way, Peter, so I will be joining the crazy people too :wink: . We first load into flat files and then load into DB2. If there are any more questions regarding this, I hope you guys will help in the same way.
Thanks
srikie
peternolan9
Participant
Posts: 214
Joined: Mon Feb 23, 2004 2:10 am
Location: Dublin, Ireland
Contact:

Re: checkpoint/restart safeguards

Post by peternolan9 »

srikie wrote:If there are any more questions regarding this, I hope you guys will help in the same way.
Thanks
srikie
As much as we can Srikie, given a 168 hour week.....;-)
Best Regards
Peter Nolan
www.peternolan.com