Complex file reading

chandra4u · Post by **chandra4u** » Mon Dec 15, 2008 5:54 am

I would like to read the following file format in a datastage job. Could you please let me know how can I do this? what are the stages I can use?

1 XXXXXXXXXXXXX
OPAL Detailed xxxxxxxxx xxxxx Listing Period: 20030228 To 20030331
0 Policy No. Table No. Name(s) R/I Claim
--------- -------- ---------------------------------------- ---------
60067322 35A MR BJ SWANSTON 26316.00
60349714 35A MRS CA JACOBS 30000.00
60381376 35A MR J MARTINSON 20000.00
60425069 35A MRS J CHAPMAN 20000.00
60450640 35A MRS BJ MCNAMARA 20000.00
60452691 35A MR KR NICHOLLS 49000.00
60474853 35A MR VG WILLIAMS 20000.00
60489440 35A MR R CHAPPELL 47000.00
60514587 35A MRS L SMITH 55556.00
60535076 35A MISS L LEBEN 10000.00
60637795 35A MR K LANG 20000.00
60697227 35A MRS H BLAND 50000.00
End of Report
------------
Page Total 367872.00
------------
0 Report Total 367872.00
------------

chulett · Post by **chulett** » Mon Dec 15, 2008 7:40 am

You want to read a report, eh? You'll need to read it as one long varchar field, skipping and parsing it appropriately based on what you find in the first few characters of each record.

chandra4u · Post by **chandra4u** » Mon Dec 15, 2008 8:51 am

Thanks. But I am not able to parse it. Could you please give me a condition on parsing the policy numbers

chulett · Post by **chulett** » Mon Dec 15, 2008 9:20 am

Why not? What have you tried? Use stage variables to help, skip up to (inclusive) the first set of dashes and then process records until you read 'End of Report'.

chandra4u · Post by **chandra4u** » Mon Dec 15, 2008 10:53 am

Sorry I am new to to this...I tried reading as one string and using string functions in transformer trying to parse that. But not successful.

chulett · Post by **chulett** » Mon Dec 15, 2008 11:07 am

Ok, but not successful why? Are you able to identify the Policy rows?

Let's try this in a more "PX" fashion, shall we? Rather than the "one long string" approach, trying defining the sequential file metadata such that it matches only the Policy detail records. That should work because they look unique in the report - use the appropriate field delimiter, first field a six digit number, etc. When you read the file with that metadata, any rows that don't match that will automatically be discarded by the stage.

That or use a tool other than an ETL tool for this, one meant specifically to mine data from report files like you have: Monarch for example.

ray.wurlod · Post by **ray.wurlod** » Mon Dec 15, 2008 3:37 pm

Add a reject output link from the Sequential File stage; this will capture any row that does not match the record schema for a detail record, and downstream stages on this link can re-parse for example to retrieve the dates and totals.

Code: Select all

record schema
{
policy_no string[8];
filler1 string[1];
table_no string[3];
filler2 string[1];
remainder string[max=255];
}

chandra4u · Post by **chandra4u** » Tue Dec 16, 2008 12:06 pm

Thanks for the inputs. What I have done is read the rows using the space as seperater, using some function conditions in the transformer and in the end combining them back wherever required. Thanks again

ray.wurlod · Post by **ray.wurlod** » Tue Dec 16, 2008 12:56 pm

Not my preferred approach, as you have differing numbers of space characters in the names.