Automated Regression Testing

bmsq
Premium Member
Posts: 32
Joined: Mon Oct 30, 2006 9:19 pm

Automated Regression Testing

Post by bmsq »

Hi Everyone,

My team is fast approaching the testing phase and I've been trying to come up with an approach which allows easy regression testing.

At the moment we give each ETL job a small amount of test data and then manually verify the output. Once we are happy with this, the volume is increased and a random sample of records is verified.

This approach is fine except for regression testing. By rights, if we give our ETL job the same input data we should get the same results. I was thinking about saving the input and output of a job the first time we manually verify the output. Once this is done, we could just re-run the job with the previous input and compare the output with what we saved previously. This could be done using RCP and a generic DataStage job.

However, I'm not sure how you could go about saving the initial output. Most of our jobs use Data Sets, but you can't just copy the .ds files. I've tried generic DS -> Seq File and Seq File -> DS jobs to use sequential files as the storage medium, but I've run into problems with how to store null dates. I also don't know how to make a copy of a Lookup File Set.

Any thoughts on how to make this approach work? Or what about alternatives?

Thanks in advance
kumar_s
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

You can use command-line utilities such as orchadmin (the Orchestrate administration tool) with the dump option to convert a Data Set to a sequential file (example below).
A File Set can be concatenated based on its control file.
I'm not sure how RCP helped you to save the files, though.
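
For instance, something along these lines (the dataset path is assumed here, and the exact dump options vary by version, so check orchadmin help first):

# point APT_CONFIG_FILE at a configuration file that can see the dataset's data files (path assumed)
export APT_CONFIG_FILE=/opt/IBM/InformationServer/Server/Configurations/default.apt
# dump the dataset's records as text so they can be archived and compared later
orchadmin dump /data/work/customer.ds > /testdata/baseline/customer.txt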
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

Do a UNIX-level diff on the two files.
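
For example (file names assumed; sort both sides first, because a parallel job does not guarantee record order and a raw diff would report false differences):

# sort so that differences in record order don't show up as data changes
sort /testdata/baseline/customer.txt > /tmp/baseline.sorted
sort /testdata/current/customer.txt > /tmp/current.sorted
# the exit status of diff drives the pass/fail result
diff /tmp/baseline.sorted /tmp/current.sorted && echo PASS || echo FAIL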
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

You CAN copy the Data Sets, even though just copying the ".ds" file isn't how it's done. You can use the Data Set Management GUI (available from Designer, Manager or Director Tools menu) or the orchadmin command from the command line.
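
From the command line it is along these lines (paths are assumed, and the subcommand may be spelled copy or cp depending on your release, so check orchadmin help):

# point APT_CONFIG_FILE at a valid configuration file (path assumed)
export APT_CONFIG_FILE=/opt/IBM/InformationServer/Server/Configurations/default.apt
# copies the descriptor file and the data files it references as one unit
orchadmin copy /data/work/customer.ds /data/baseline/customer_baseline.ds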
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
bmsq
Premium Member
Posts: 32
Joined: Mon Oct 30, 2006 9:19 pm

Post by bmsq »

Thanks for all your responses.

What I was trying to create was some form of test data repository which may live on a server other than the one where DataStage resides (which rules out copying via the Data Set Management GUI).

I initially created a generic job which was DataSet -> Seq File; this job took the file names as parameters and did not define any metadata (it used RCP). This was wrapped in a sequence which first used dsadmin to extract the schema of the DataSet and then executed the generic DS -> SF job. This allowed me to store the sequential file and schema anywhere I wanted in a test data repository. When I wanted to restore it, I would use another generic job which was SF -> DataSet. Once again, the file names were parameters, but this time the schema file was used to define the SF metadata.
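
As a rough sketch, the capture half of the sequence boils down to something like the following (project, job and parameter names are made up, and extract_schema is just a placeholder for whatever dsadmin invocation actually pulls the schema):

DS=/data/work/customer.ds
REPO=/testdata/baseline/customer

# hypothetical placeholder for the schema-extraction step described above
extract_schema $DS > $REPO.schema

# run the generic RCP export job (DataSet -> Sequential File)
dsjob -run -wait \
    -param pSourceDataSet=$DS \
    -param pTargetSeqFile=$REPO.txt \
    MyProject GenericDsToSeqJob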

This seemed to work great (maintaining partitioning didn't matter), until I started dealing with nullable dates. In that situation, I set the DS -> SF job to write an empty string for all nulls. Unfortunately, this doesn't seem to work correctly when going the other way (SF -> DS): the SF stage stumbles on the null dates, saying they are not in the correct format. I've had this problem before, but I got around it by specifying a default value of Null for each data field.

The plan was to capture both input & output of a job when we know that it was functioning correctly. When we want to regression test, the input could be restored before the job is executed. After job execution is complete, the output could be compared to what is in the repository. Due to the volumes, I was hoping to use a change capture stage within another generic DataStage job.

Is this not possible? Any suggestions for making this work or alternative solutions to regression testing would be greatly appreciated.

Thanks in advance
kumar_s
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

Are you sure you are not using any metadata for the source stage either?
Do you just use one job for any sort of file, with RCP enabled?
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
bmsq
Premium Member
Posts: 32
Joined: Mon Oct 30, 2006 9:19 pm

Post by bmsq »

For both generic jobs, I don't explicitly specify any column metadata. The DS -> SF job uses RCP with the metadata stored in the Data Set, and the resulting sequential file contains all the data from the Data Set correctly.

The SF -> DS job, however, does not already have any metadata available, so I specify it using a schema file (its location is parameterised). This schema file is generated using the dsadmin utility when the SF is first created.

This all seems to work fine except for the SF -> DS job when the SF contains null dates. When that happens it complains that a null date, written as "", is not in a valid date format, which still happens even if I set the Null Field value to "".
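
To make it concrete, the relevant part of the schema file looks roughly like this (field names are made up, and the exact property syntax may differ on your version):

# the nullable date with null_field='' is the line the import complains about;
# a non-empty out-of-band token, e.g. null_field='1900-01-01', is sometimes
# used instead so that an empty string never has to be parsed as a date
cat > /testdata/baseline/customer.schema <<'EOF'
record
(
  CUST_ID: int32;
  OPEN_DT: nullable date { null_field='' };
)
EOF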

If I could fix this, I would have a generic framework for performing automated regression testing.

Although I keep repeating my current approach, all I'm really after is an easy way to perform regression testing. Currently we have a large number of DataStage jobs (50+) that we have been testing manually as we built them. Since we are approaching our formal testing stage, regression testing is going to be needed, and I'd like to avoid re-performing manual testing every time we fix a defect.

Thanks for all your replies so far, I look forward to reading any further responses.

Barry