Complex file reading

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
chandra4u
Participant
Posts: 13
Joined: Wed May 17, 2006 5:45 am
Location: Bangalore

Complex file reading

Post by chandra4u »

I would like to read the following file format in a datastage job. Could you please let me know how can I do this? what are the stages I can use?


1 XXXXXXXXXXXXX
OPAL Detailed xxxxxxxxx xxxxx Listing Period: 20030228 To 20030331
0 Policy No. Table No. Name(s) R/I Claim
--------- -------- ---------------------------------------- ---------
60067322 35A MR BJ SWANSTON 26316.00
60349714 35A MRS CA JACOBS 30000.00
60381376 35A MR J MARTINSON 20000.00
60425069 35A MRS J CHAPMAN 20000.00
60450640 35A MRS BJ MCNAMARA 20000.00
60452691 35A MR KR NICHOLLS 49000.00
60474853 35A MR VG WILLIAMS 20000.00
60489440 35A MR R CHAPPELL 47000.00
60514587 35A MRS L SMITH 55556.00
60535076 35A MISS L LEBEN 10000.00
60637795 35A MR K LANG 20000.00
60697227 35A MRS H BLAND 50000.00
End of Report
------------
Page Total 367872.00
------------
0 Report Total 367872.00
------------
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

You want to read a report, eh? You'll need to read it as one long varchar field, skipping and parsing it appropriately based on what you find in the first few characters of each record.
-craig

"You can never have too many knives" -- Logan Nine Fingers
chandra4u
Participant
Posts: 13
Joined: Wed May 17, 2006 5:45 am
Location: Bangalore

Post by chandra4u »

Thanks. But I am not able to parse it. Could you please give me a condition on parsing the policy numbers
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Why not? What have you tried? Use stage variables to help, skip up to (inclusive) the first set of dashes and then process records until you read 'End of Report'.
-craig

"You can never have too many knives" -- Logan Nine Fingers
chandra4u
Participant
Posts: 13
Joined: Wed May 17, 2006 5:45 am
Location: Bangalore

Post by chandra4u »

Sorry I am new to to this...I tried reading as one string and using string functions in transformer trying to parse that. But not successful.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Ok, but not successful why? Are you able to identify the Policy rows?

Let's try this in a more "PX" fashion, shall we? Rather than the "one long string" approach, trying defining the sequential file metadata such that it matches only the Policy detail records. That should work because they look unique in the report - use the appropriate field delimiter, first field a six digit number, etc. When you read the file with that metadata, any rows that don't match that will automatically be discarded by the stage.

That or use a tool other than an ETL tool for this, one meant specifically to mine data from report files like you have: Monarch for example.
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Add a reject output link from the Sequential File stage; this will capture any row that does not match the record schema for a detail record, and downstream stages on this link can re-parse for example to retrieve the dates and totals.

Code: Select all

record schema
{
policy_no string[8];
filler1 string[1];
table_no string[3];
filler2 string[1];
remainder string[max=255];
}
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
chandra4u
Participant
Posts: 13
Joined: Wed May 17, 2006 5:45 am
Location: Bangalore

Post by chandra4u »

Thanks for the inputs. What I have done is read the rows using the space as seperater, using some function conditions in the transformer and in the end combining them back wherever required. Thanks again
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Not my preferred approach, as you have differing numbers of space characters in the names.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Post Reply