File processing question

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

mydsworld
Participant
Posts: 321
Joined: Thu Sep 07, 2006 3:55 am

File processing question

Post by mydsworld »

I have a file with a number of records and a control record at the end containing the record count. I would like to check that the number of records actually matches the control record count; only then will I take the file for further processing. What would be the most efficient design for matching the counts?

Thanks.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Probably a UNIX script before you ever get into a job. Tail off the control record, cut out the number and compare it to a count of lines from the file itself, minus 1 or 2 (minus 2 if there's a header record with column names).
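A minimal sketch of that script approach. The function name and the file layout are assumptions for illustration (a pipe-delimited file whose last line is the control record, with the expected count in its first field) — adjust the `cut` delimiter and field to match your actual control record.

```shell
#!/bin/sh
# Hypothetical validate_counts helper (name and layout are assumptions):
# expects a pipe-delimited file whose LAST line is the control record,
# with the expected record count in that line's first field.
validate_counts() {
    file="$1"
    # pull the expected count out of the trailing control record
    expected=$(tail -1 "$file" | cut -d'|' -f1)
    # total lines minus the control record itself; subtract one more
    # if the file also carries a header row of column names
    actual=$(( $(wc -l < "$file") - 1 ))
    if [ "$actual" -eq "$expected" ]; then
        echo "OK: $actual records"
    else
        echo "MISMATCH: control=$expected actual=$actual" >&2
        return 1
    fi
}
```

The script's exit status can then gate the rest of the processing (e.g. only invoke `dsjob` when it returns 0).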
-craig

"You can never have too many knives" -- Logan Nine Fingers
mydsworld
Participant
Posts: 321
Joined: Thu Sep 07, 2006 3:55 am

Post by mydsworld »

Any suggestions if I had to do it in DataStage?
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Thought you only wanted to 'process' the file if the counts matched? :wink: Would two jobs work, one to validate counts only and one to process it if the counts are ok?
-craig

"You can never have too many knives" -- Logan Nine Fingers
mydsworld
Participant
Posts: 321
Joined: Thu Sep 07, 2006 3:55 am

Post by mydsworld »

Two jobs would work. Please let me know how you would validate the count in the first job.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Lots of possible ways. Among them: a server job using an Aggregator stage, a server job using a Transformer stage writing @INROWNUM into a hashed file with a constant key, or an Execute Command activity running a wc -l command.
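The wc -l idea could be sketched as a small command for an Execute Command activity. The function name is mine, and it assumes the control record is the file's last line; the activity's output (the data-record count) would then be compared against the control count in a sequence trigger expression.

```shell
#!/bin/sh
# Hypothetical helper for an Execute Command activity: emit only the
# data-record count (total lines minus the trailing control record),
# assuming the control record is the last line of the file.
data_count() {
    echo $(( $(wc -l < "$1") - 1 ))
}
```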
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
kandyshandy
Participant
Posts: 597
Joined: Fri Apr 29, 2005 6:19 am
Location: Singapore

Post by kandyshandy »

mydsworld,
In general, files like this have some kind of identification for header, detail and footer records. Do you have something like what's shown below?

0$filename
1$detailrecord
1$detailrecord
...
...
2$footerrecord

where a record starting with 0 is the header, 1 is a detail record and 2 is the footer. Basically, you keep a running count while reading the detail records and compare it with the footer record's count. Ray has given some ideas too.
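That running-count-versus-footer check could be sketched in one awk pass. The function name is mine, and I'm assuming the footer's second '$'-delimited field holds the expected detail count (your footer layout may differ):

```shell
#!/bin/sh
# Hypothetical check_file helper: single pass over a '$'-delimited file
# where field 1 is the record type (0=header, 1=detail, 2=footer) and
# the footer's second field is assumed to carry the expected count.
check_file() {
    awk -F'\$' '
        $1 == "1" { detail++ }        # running count of detail records
        $1 == "2" { expected = $2 }   # footer carries the expected count
        END {
            if (detail == expected) { print "OK" }
            else { print "MISMATCH: footer=" expected " counted=" detail; exit 1 }
        }
    ' "$1"
}
```

The same compare-in-one-pass logic is what a Transformer accumulating a count (or an Aggregator) would do inside a job.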

Just as an FYI: I used files like this in the past, where we checked each file and loaded a table with a flag indicating whether the file met the expectations for processing. We then read the table and processed only the files whose entries had flag Y. This also helped us troubleshoot.
Kandy
_________________
Try and try again… You will succeed at last!!