Header Record Validation in a fixed width file

chaithanya · Post by **chaithanya** » Mon May 27, 2013 11:36 pm

Hi

I have a fixed width file with header record and detail record. Header record layout is different from that of detail record.

For example, I have the data:

00Employee_Details2013-05-242 -------------->HeaderRecord
012012-01-01Ram29842013-05-24 23:23:45 --------------->Detail Records
012012-02-02Sita39842013-13-24 23:23:45

The header record has the layout as:
Headerrecord_type file_description Date_created record_count

The Detail record has the layout as:
Detailrecord_type join_date Name Emp_Id RecordCreate_date

I have to check if the header "Date_created" is equal to the detail "Record Create date" . Any idea on how to store the header record separately for processing?

Thanks

prasson_ibm · Post by **prasson_ibm** » Tue May 28, 2013 1:13 am

Hi,
Before processing a file,design file validation job.Use two external source stages,your job design will be like this:-

External source1---->Col Gen dummy-->Join On dummy----Transf---Copy

External Source2---->Col Gen Dummy--->

In External Source1 Stage Write below unix command:-

Code: Select all

head -1 File Name|cut -c19-27

In External Source2 Stage write below unix command:-

Code: Select all

head -2 File Name|tail -1|cut -c20-28

Inner join on dummy column.In transformer validate external src1 and external src2 output,if it is matching your file validation is successful otherwise failed.

ray.wurlod · Post by **ray.wurlod** » Tue May 28, 2013 2:52 am

Why do you always post UNIX solutions on a DataStage forum?

ray.wurlod · Post by **ray.wurlod** » Tue May 28, 2013 2:53 am

Capture the date from the header into a stage variable. Compare this value with the value in detail records.

prasson_ibm · Post by **prasson_ibm** » Tue May 28, 2013 3:20 am

Hi Ray,
I saw they are running datastage on unix platform,thats why i suggested.

spoilt · Post by **spoilt** » Tue May 28, 2013 3:48 am

Stage 1:
Read First line by using SEQ file stage : [filtre : head -1]

Stage 2:
Read all lines except first in SEQ file stage : [filtre : sed '1d']

Lookup Stage to compare / drop non-matching records.

chulett · Post by **chulett** » Tue May 28, 2013 6:44 am

prasson_ibm wrote:I saw they are running datastage on unix platform,thats why i suggested.

Why don't we start by suggesting DataStage solutions and only fall back to the O/S heavy stuff if there's no other choice or the "in job" solution is... unwieldy.

chulett · Post by **chulett** » Tue May 28, 2013 6:52 am

Several ways to approach this.

Read the file's record as one long string and check the first two bytes for either "00" or "01" to determine header v. detail... or if there's only one header record per file you could use record #1 v. all other records. Store the substring'd header value in a stage variable and compare it to the detail records. Pass out any detail records where it doesn't match. Target records > 0 = bad file.

prasson_ibm · Post by **prasson_ibm** » Tue May 28, 2013 3:48 pm

Hi Chullet,
Sure,you are saying is logical,thanks for your suggestion.

FranklinE · Post by **FranklinE** » Wed May 29, 2013 10:09 am

I suggest using Complex Flat File for the initial input. The first two bytes look to be the record type, and you can set up CFF with a separate output link for each type. You can then attempt a simple Join and have it abort on the date not matching. The advantage there is if you have some details with the correct date and you want the job to process them, you also have that option.

suja.somu · Post by **suja.somu** » Wed May 29, 2013 4:10 pm

For this scenario, below job design works fine. the properties are in verison 8.5.

seq file stage1---->Col Gen dummy-->Join On dummy(inner join) ---->Transf---->(seq file output stage1,seq file output stage2)
seq file stage2---->Col Gen Dummy--->

Seq file stage1 : to read the detail records ( specify detail record layout)
Read all detail records except the header by using SEQ file stage ( set property: first line in column names= True)

Seq file stage2 : to read the header record ( specify header record layout)
Read only the first line by using SEQ file stage ( set property: read first rows =1 and first line in column names= False)

In both the flows , in column generator ,create a new column called dummy and set the properties to Type= cycle, initial value = 1 increment =0

In transformer create a stage var with a derivation to compare the date fields.

stagevardate: IF ( Lnk_Match_date.Date_created = Lnk_Match_date.Record Create date ) THEN 'MATCH' ELSE 'NOMATCH'

set the constraint in the transformer to get the stagevardate = "MATCH"
and another link constraint to stagevardate = "NOMATCH"

to get matching records in 1 file and not matching records to the another sequential files ouput

chaithanya · Post by **chaithanya** » Sat Jun 01, 2013 12:39 am

Thanks suja.somu
It worked. The only problem is, I read the input fixed width data as Varchar ,so the job aborted. Then i replaced it with Char datatype and the problem resolved :D