Page 1 of 1

Complex file reading

Posted: Mon Dec 15, 2008 5:54 am
by chandra4u
I would like to read the following file format in a datastage job. Could you please let me know how can I do this? what are the stages I can use?


1 XXXXXXXXXXXXX
OPAL Detailed xxxxxxxxx xxxxx Listing Period: 20030228 To 20030331
0 Policy No. Table No. Name(s) R/I Claim
--------- -------- ---------------------------------------- ---------
60067322 35A MR BJ SWANSTON 26316.00
60349714 35A MRS CA JACOBS 30000.00
60381376 35A MR J MARTINSON 20000.00
60425069 35A MRS J CHAPMAN 20000.00
60450640 35A MRS BJ MCNAMARA 20000.00
60452691 35A MR KR NICHOLLS 49000.00
60474853 35A MR VG WILLIAMS 20000.00
60489440 35A MR R CHAPPELL 47000.00
60514587 35A MRS L SMITH 55556.00
60535076 35A MISS L LEBEN 10000.00
60637795 35A MR K LANG 20000.00
60697227 35A MRS H BLAND 50000.00
End of Report
------------
Page Total 367872.00
------------
0 Report Total 367872.00
------------

Posted: Mon Dec 15, 2008 7:40 am
by chulett
You want to read a report, eh? You'll need to read it as one long varchar field, skipping and parsing it appropriately based on what you find in the first few characters of each record.

Posted: Mon Dec 15, 2008 8:51 am
by chandra4u
Thanks. But I am not able to parse it. Could you please give me a condition on parsing the policy numbers

Posted: Mon Dec 15, 2008 9:20 am
by chulett
Why not? What have you tried? Use stage variables to help, skip up to (inclusive) the first set of dashes and then process records until you read 'End of Report'.

Posted: Mon Dec 15, 2008 10:53 am
by chandra4u
Sorry I am new to to this...I tried reading as one string and using string functions in transformer trying to parse that. But not successful.

Posted: Mon Dec 15, 2008 11:07 am
by chulett
Ok, but not successful why? Are you able to identify the Policy rows?

Let's try this in a more "PX" fashion, shall we? Rather than the "one long string" approach, trying defining the sequential file metadata such that it matches only the Policy detail records. That should work because they look unique in the report - use the appropriate field delimiter, first field a six digit number, etc. When you read the file with that metadata, any rows that don't match that will automatically be discarded by the stage.

That or use a tool other than an ETL tool for this, one meant specifically to mine data from report files like you have: Monarch for example.

Posted: Mon Dec 15, 2008 3:37 pm
by ray.wurlod
Add a reject output link from the Sequential File stage; this will capture any row that does not match the record schema for a detail record, and downstream stages on this link can re-parse for example to retrieve the dates and totals.

Code: Select all

record schema
{
policy_no string[8];
filler1 string[1];
table_no string[3];
filler2 string[1];
remainder string[max=255];
}

Posted: Tue Dec 16, 2008 12:06 pm
by chandra4u
Thanks for the inputs. What I have done is read the rows using the space as seperater, using some function conditions in the transformer and in the end combining them back wherever required. Thanks again

Posted: Tue Dec 16, 2008 12:56 pm
by ray.wurlod
Not my preferred approach, as you have differing numbers of space characters in the names.