
DataStage vs. COBOL performance benchmarks

Posted: Mon Dec 06, 2010 4:08 pm
by ptc3bluedevil
I'm doing a high-level plan for a historical data conversion from mainframe to DB2. I have used DataStage in the past, but my current client has often used custom COBOL code to perform conversions. I want to outline some pros and cons for my client before a decision is made.

Some of the common disadvantages to COBOL do not apply here. The client has a sufficient number of COBOL programmers to perform a conversion. It is a one-time, historical conversion, so maintainability over the years is not a huge concern. On the other hand, other divisions of the organization use DataStage, so it is an option.

I believe that performance at high volumes should be an important factor in the decision. Some of the datasets to be converted will contain > 100 million records. Does anyone have performance benchmarks for loading to DB2 using COBOL code vs. DataStage? Or does IBM have some information available?

Posted: Mon Dec 06, 2010 4:12 pm
by ray.wurlod
Don't have any benchmarks but my experience suggests that well-written COBOL would probably beat DataStage fairly easily. So the question boils down to just how good these COBOL programmers are, particularly with respect to accessing (remote?) DB2 databases.

Posted: Mon Dec 06, 2010 4:52 pm
by stuartjvnorton
IBM have a product called Gladstone (sp?) that IIRC is at heart a COBOL code generator.
So they might just have the sort of benchmarks you need.

Posted: Mon Dec 06, 2010 11:20 pm
by ray.wurlod
IBM have a product called DataStage (mainframe edition) that can generate good quality COBOL as well as the JCL to compile and run it.

Posted: Tue Dec 07, 2010 11:28 am
by FranklinE
The phrase "mainframe to DB2" leaves me a bit confused. Are we talking about sequential files going into IBM DB2 v9? Are they ISAM or VSAM files? And the biggest question of all, for me: what is your maintenance interface for DB2? Is it BMC, or something else?

Nothing beats the bulk unload/load utilities for performance with DB2 tables, so unless you can clarify and show where I'm wrong, you really need to consider this third option.

My not-so-humble opinion about the code base is that there is no advantage either way. I don't know anything about the run-time environment for DS on the mainframe, but if it's at all similar to Unix, I would choose COBOL and use it to prepare the data for a formatted bulk load. If you don't have the bulk option, then I must defer to those with DS-mainframe experience.
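
To show what I mean by preparing data for a formatted bulk load, here is a rough sketch in Python rather than COBOL (the field positions and file names are made up): each fixed-width extract record is reshaped into a delimited row that a DB2 bulk load can consume.

# Sketch: turn fixed-width extract records into a delimited file for a bulk load.
# Field layout and file names are hypothetical.
FIELDS = [              # (name, start, length) within the fixed-width record
    ("acct_id",   0, 10),
    ("txn_date", 10,  8),
    ("amount",   18, 11),
]

with open("extract.dat", "r") as src, open("load_ready.del", "w") as out:
    for record in src:
        values = [record[start:start + length].strip() for _, start, length in FIELDS]
        out.write("|".join(values) + "\n")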

Posted: Tue Dec 07, 2010 5:50 pm
by vmcburney
Because this is a migration, I would expect code written by the legacy team, who already know the data and the libraries needed to extract it, to be faster at getting the data out of the legacy database, so Cobol would be the way to go for extraction. In terms of loading to the target (where we have incomplete information), you may be able to accelerate things with DataStage, especially if you can use Cobol copybooks to read complex flat files and deliver them to relational DB2 tables. The extract team would identify filtering and archiving rules and deliver the data in a format you can use.
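
To make the copybook point concrete, here is a toy sketch of what reading a copybook-described flat file amounts to. The layout, field names and file name are invented, a real copybook (OCCURS, REDEFINES, packed-decimal fields) is far richer, and cp037 is just one common EBCDIC code page.

# Toy copybook-style layout: (field name, width in bytes)
LAYOUT = [("CUST-ID", 10), ("CUST-NAME", 30), ("BALANCE", 12)]
RECORD_LEN = sum(width for _, width in LAYOUT)

def parse(raw: bytes) -> dict:
    # Mainframe extracts are typically EBCDIC; decode before slicing.
    text, fields, pos = raw.decode("cp037"), {}, 0
    for name, width in LAYOUT:
        fields[name] = text[pos:pos + width].strip()
        pos += width
    return fields

with open("legacy_extract.bin", "rb") as f:
    while (rec := f.read(RECORD_LEN)):
        row = parse(rec)
        print(row["CUST-ID"], row["BALANCE"])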

Any method you use will probably load the data into the target DB2 in the same way: bulk loads. DataStage may handle the intermediate transformation and the parallelism of those loads more efficiently, and it can synchronise the partitioning of the data transformation to match the partitioning of the target DB2 tables. I would use Cobol to get the data out of the legacy system, then use scripts to load it into the target if there is minimal data quality cleansing and transformation, or DataStage if there is a lot.
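
As a rough sketch of what aligning with the target partitioning could look like (the partition count, key column and hashing rule here are assumptions, not DB2's actual distribution algorithm), the transformed rows are split into one file per partition so each can be bulk loaded in parallel:

# Split delimited rows into one load file per target partition.
# NUM_PARTITIONS, the key column and the hash are illustrative assumptions.
from zlib import crc32

NUM_PARTITIONS = 8

def partition_for(key: str) -> int:
    return crc32(key.encode()) % NUM_PARTITIONS

outputs = [open(f"part_{p:02d}.del", "w") for p in range(NUM_PARTITIONS)]
try:
    with open("transformed.del") as src:
        for line in src:
            key = line.split("|", 1)[0]   # first column assumed to be the partitioning key
            outputs[partition_for(key)].write(line)
finally:
    for f in outputs:
        f.close()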