Automated Regression Testing

bmsq · Post by **bmsq** » Mon Jan 22, 2007 7:43 pm

Hi Everyone,

My team is fast approaching the testing phase and I've been trying to come up with an approach which allows easy regression testing.

At the moment we give each ETL job a small amount of test data and then manually verify the output. Once we are happy with this, the volume is increase and random records are verified.

This approach is fine except for regression testing. By rights, if we give our ETL job the same input data we should get the same results. I was thinking about saving the input and output of a job the first time we manually verify the output. Once this is done, we could just re-run the job with the previous input and compare the output with what we saved previously. This could be done using RCP and a generic DataStage job.

However, I'm not sure how you could go about saving the initial output. Most of our jobs use dataSets but you can't just copy the .ds files. I've tried using generic DS->Seq File and Seq File -> DS to use seq files as a storage medium, but I've run into problems with how to store null dates. I also don't know how to make a copy of a file lookup set.

Any thoughts on how to make this approach work? Or what about alternatives?

Thanks in advance

kumar_s · Post by **kumar_s** » Mon Jan 22, 2007 7:59 pm

You can use command line functions like orchestrate with dump option to convert dataset to sequential file.
Fileset can be concatinated together based on the control file.
Not sure, how RCP helped you to save files.

DSguru2B · Post by **DSguru2B** » Mon Jan 22, 2007 8:07 pm

Do a unix level diff on the two files.

ray.wurlod · Post by **ray.wurlod** » Mon Jan 22, 2007 9:19 pm

You CAN copy the Data Sets, even though just copying the ".ds" file isn't how it's done. You can use the Data Set Management GUI (available from Designer, Manager or Director Tools menu) or the orchadmin command from the command line.

bmsq · Post by **bmsq** » Tue Jan 23, 2007 12:04 am

Thanks for all your responses.

What I was trying to create was some form of test data repository which may exist on a server other than where DataStage resides (which rules out the copy from the DataManager).

I initially created a generic job which was DataSet -> Seq file, this job took the file names as parameters and did not define any meta data (it used RCP). This was wrapped in a sequence which first used dsadim to extract the schema of the DataSet and then executed the generic DS->SF job. This allowed my to store the Seq File & Schema anywhere I wanted in a test data repository. When I wanted to restore it, I would use another generic job which was SF -> DataSet. Once again, the file names where parameters but this time the schema file was used to define the SF metadata.

This seemed to work great (maintaining partitioning didn't matter), until I was dealing with Nullable dates. In this situation, I set the DS->SF job to create an empty string for all nulls. Unfortunately, this doesn't seem to work correctly when going the other way (SF->DS). The SF seems to stumble on the null dates saying it wasn't the correct format. I've had this problem previously, but I got around it my specifying a default value of Null for each data field.

The plan was to capture both input & output of a job when we know that it was functioning correctly. When we want to regression test, the input could be restored before the job is executed. After job execution is complete, the output could be compared to what is in the repository. Due to the volumes, I was hoping to use a change capture stage within another generic DataStage job.

Is this not possible? Any suggestions for making this work or alternative solutions to regression testing would be greatly appreciated.

Thanks in advance

kumar_s · Post by **kumar_s** » Tue Jan 23, 2007 4:37 am

Are you sure that, you are not using any metadata for source stage either

Do you just use one job for any sort of file with RPC enabled?

bmsq · Post by **bmsq** » Tue Jan 23, 2007 4:38 pm

For both generic jobs, I don't explicitly specify any Column data. Instead, the DS -> SF job seems to be able to use RCP with the meta data in the DS and the SF results in all the data from the DS appearing correctly.

The SF -> DS stage however does not already contain any meta data, so I specify this using a schema file (it's location is parameterised). This schema file is generated using the dsadmin util when the SF is first generated.

This all seems to work fine except the SF->DS when the SF contains null dates. When this happens it complains that a null date, specified as "" is not a valid date formate which still happens even if I set the Null Field value to "".

If I could fix this, I would have a generic framework for performing automated regression testing.

Although I keep repeating my current approach, all I'm really after is an easy way to perform regression. Currently we have a large amount of DS jobs (50+) that we have been manually testing as we built them. Since we are approaching our formal testing stage, regression testing is going to be needed and I'd like to avoid re-performing manual testing every time we fix any defects.

Thanks for all your replies so far, I look forward to reading any further responses.

Barry