Header Record Validation in a fixed width file

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
chaithanya
Participant
Posts: 12
Joined: Fri Apr 12, 2013 7:12 am

Header Record Validation in a fixed width file

Post by chaithanya »

Hi

I have a fixed width file with header record and detail record. Header record layout is different from that of detail record.

For example, I have the data:

00Employee_Details2013-05-242 -------------->HeaderRecord
012012-01-01Ram29842013-05-24 23:23:45 --------------->Detail Records
012012-02-02Sita39842013-13-24 23:23:45

The header record has the layout as:
Headerrecord_type file_description Date_created record_count

The Detail record has the layout as:
Detailrecord_type join_date Name Emp_Id RecordCreate_date


I have to check if the header "Date_created" is equal to the detail "Record Create date" . Any idea on how to store the header record separately for processing?

Thanks
prasson_ibm
Premium Member
Premium Member
Posts: 536
Joined: Thu Oct 11, 2007 1:48 am
Location: Bangalore

Post by prasson_ibm »

Hi,
Before processing a file,design file validation job.Use two external source stages,your job design will be like this:-

External source1---->Col Gen dummy-->Join On dummy----Transf---Copy

External Source2---->Col Gen Dummy--->

In External Source1 Stage Write below unix command:-

Code: Select all

head -1 File Name|cut -c19-27
In External Source2 Stage write below unix command:-

Code: Select all

head -2 File Name|tail -1|cut -c20-28


Inner join on dummy column.In transformer validate external src1 and external src2 output,if it is matching your file validation is successful otherwise failed.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Why do you always post UNIX solutions on a DataStage forum? :?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Capture the date from the header into a stage variable. Compare this value with the value in detail records.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
prasson_ibm
Premium Member
Premium Member
Posts: 536
Joined: Thu Oct 11, 2007 1:48 am
Location: Bangalore

Post by prasson_ibm »

Hi Ray,
I saw they are running datastage on unix platform,thats why i suggested.
spoilt
Participant
Posts: 7
Joined: Mon Mar 25, 2013 7:17 am

Post by spoilt »

Stage 1:
Read First line by using SEQ file stage : [filtre : head -1]

Stage 2:
Read all lines except first in SEQ file stage : [filtre : sed '1d']

Lookup Stage to compare / drop non-matching records.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

prasson_ibm wrote:I saw they are running datastage on unix platform,thats why i suggested.
:idea: Why don't we start by suggesting DataStage solutions and only fall back to the O/S heavy stuff if there's no other choice or the "in job" solution is... unwieldy.
Last edited by chulett on Tue May 28, 2013 6:52 am, edited 1 time in total.
-craig

"You can never have too many knives" -- Logan Nine Fingers
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Several ways to approach this.

Read the file's record as one long string and check the first two bytes for either "00" or "01" to determine header v. detail... or if there's only one header record per file you could use record #1 v. all other records. Store the substring'd header value in a stage variable and compare it to the detail records. Pass out any detail records where it doesn't match. Target records > 0 = bad file.
-craig

"You can never have too many knives" -- Logan Nine Fingers
prasson_ibm
Premium Member
Premium Member
Posts: 536
Joined: Thu Oct 11, 2007 1:48 am
Location: Bangalore

Post by prasson_ibm »

Hi Chullet,
Sure,you are saying is logical,thanks for your suggestion.
FranklinE
Premium Member
Premium Member
Posts: 739
Joined: Tue Nov 25, 2008 2:19 pm
Location: Malvern, PA

Post by FranklinE »

I suggest using Complex Flat File for the initial input. The first two bytes look to be the record type, and you can set up CFF with a separate output link for each type. You can then attempt a simple Join and have it abort on the date not matching. The advantage there is if you have some details with the correct date and you want the job to process them, you also have that option.
Franklin Evans
"Shared pain is lessened, shared joy increased. Thus do we refute entropy." -- Spider Robinson

Using mainframe data FAQ: viewtopic.php?t=143596 Using CFF FAQ: viewtopic.php?t=157872
suja.somu
Participant
Posts: 79
Joined: Thu Feb 07, 2013 10:51 pm

Post by suja.somu »

For this scenario, below job design works fine. the properties are in verison 8.5.

seq file stage1---->Col Gen dummy-->Join On dummy(inner join) ---->Transf---->(seq file output stage1,seq file output stage2)
seq file stage2---->Col Gen Dummy--->


Seq file stage1 : to read the detail records ( specify detail record layout)
Read all detail records except the header by using SEQ file stage ( set property: first line in column names= True)

Seq file stage2 : to read the header record ( specify header record layout)
Read only the first line by using SEQ file stage ( set property: read first rows =1 and first line in column names= False)

In both the flows , in column generator ,create a new column called dummy and set the properties to Type= cycle, initial value = 1 increment =0

In transformer create a stage var with a derivation to compare the date fields.

stagevardate: IF ( Lnk_Match_date.Date_created = Lnk_Match_date.Record Create date ) THEN 'MATCH' ELSE 'NOMATCH'

set the constraint in the transformer to get the stagevardate = "MATCH"
and another link constraint to stagevardate = "NOMATCH"

to get matching records in 1 file and not matching records to the another sequential files ouput
chaithanya
Participant
Posts: 12
Joined: Fri Apr 12, 2013 7:12 am

Post by chaithanya »

Thanks suja.somu
It worked. The only problem is, I read the input fixed width data as Varchar ,so the job aborted. Then i replaced it with Char datatype and the problem resolved :D
Post Reply