reading file

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

kvsudheer
Premium Member
Posts: 20
Joined: Fri Aug 18, 2006 12:01 am
Location: India

reading file

Post by kvsudheer »

Hi All,

We have records in the following format. The records are of fixed width, with a record length of 160. Sample records are shown below (only part of each record is given, for example):

rec1---ABC000t764k6878m456
rec2---ABC001545kjh879j8543
rec3---ABC002675jh789ty8789
rec4---ABC000789kjh789kh324
rec5---ABC001123hj68kj08jh99

The first 6 characters are the record type.
If the record type ends with 0, the 160 characters should be loaded into 20 fields in table X.
If the record type ends with 1, the 160 characters should be loaded into 20 fields, completely different from the record-type-0 fields, in the same table X.
Similarly for record types ending with 2.

To be precise, if the record type ends with 0, characters 7 to 13 are the account number;
if the record type ends with 1, characters 7 to 10 are the identification number and characters 11 to 13 are the item number;
likewise, the layout is different for record type 2.
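In other words, sketching the layouts described above (only the fields named above are known; the remainder of each 160-byte record is not given here):

Code:

Type ending in 0:  chars 1-6 = record type, chars 7-13 = account number, chars 14-160 = (not given)
Type ending in 1:  chars 1-6 = record type, chars 7-10 = identification number, chars 11-13 = item number, chars 14-160 = (not given)
Type ending in 2:  chars 1-6 = record type, layout differs (not given)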

I request the members to help me in this regard.
ersunnys
Participant
Posts: 29
Joined: Wed Sep 13, 2006 1:39 pm
Location: Singapore

Post by ersunnys »

Hi Sudheer,

You have a number of options to achieve this...

Scenario 1:
1. Read the source file using a Sequential File stage, with the first 6 characters as Col1 and all the remaining data in Col2 (a sketch follows this list).

Option 1:
Job 1: Split the data into physical files on the basis of the value of Col1.
Next job: Read each file with a Sequential File stage, using the metadata specific to that record type.

Option 2:
Split the data into different links on the basis of the value of Col1 and use a Column Import stage on each link to apply the metadata specific to that record type.
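For step 1, the two-column metadata could be written in schema-file notation, something like this (a minimal sketch, assuming newline-terminated 160-byte records; the Col1/Col2 names and the 6/154 split follow step 1 above):

Code:

record
{final_delim=end, record_delim='\n', delim=none, quote=none}
(
    Col1: string[6];
    Col2: string[154];
)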

Scenario 2:
Read the same file in multiple jobs (or in one job with a schema file and Runtime Column Propagation), with the Filter option set to grep ^(RecordType), and provide the metadata specific to that record type...
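For example, the job for record type 0 could set the Sequential File stage's Filter to grep '^ABC000' (matching the sample data above) and supply a schema for that type's layout. A sketch only: the AcctNum width follows the "characters 7 to 13" layout in the original post, and the Filler field is invented here just to pad out the 160 bytes:

Code:

record
{final_delim=end, record_delim='\n', delim=none, quote=none}
(
    RecType: string[6];
    AcctNum: string[7];
    Filler:  string[147];
)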


Now you have to decide which approach to use, depending upon the volume of data and other factors.
Regards,
Sunny Sharma.
jwiles
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm

Post by jwiles »

Sudheer,

All of the record types load into the same table?

If all go to the same table, and assuming there is no duplication between record types (that is, they don't need to load to the same row in the table), there are two easy options:

1) A single transformer with a single output link carrying all table columns. The derivations for the columns would check the record type and populate accordingly (record data, else a null/default value).

2) A single transformer with an output link for each record type, with all columns on each link and a constraint on the record type. For each record type's link, the derivations for that type's columns would pull from the source record, and the other columns would be given an appropriate default value. Follow this transformer with a Funnel stage to put all records back into a single stream. A sketch of both options follows.
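As an illustration (assuming Sunny's Col1/Col2 read above, with the account number in record characters 7 to 13, i.e. Col2[1,7], per the original post; the column names are made up):

Code:

Derivation for the AcctNum column (option 1, single output link):
    If Col1[6,1] = '0' Then Col2[1,7] Else SetNull()

Constraint on the type-1 output link (option 2):
    Col1[6,1] = '1'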

Sunny's Option 2 is functionally similar to #2 above.

Regards,
- james wiles


All generalizations are false, including this one - Mark Twain.
kvsudheer
Premium Member
Posts: 20
Joined: Fri Aug 18, 2006 12:01 am
Location: India

Post by kvsudheer »

@Sunny and @jwiles -- thanks for the replies.

If I follow the method you suggested, I would end up with one row in the target table for each record type. I mean, if there are 10 records with 3 record types, I would end up creating 30 rows.

But my scenario is different: all the record types for a particular record should go into a single row in the table.

As in the example above, if there are 10 records with 3 record types, I should create 10 rows in the table, not 30.

Please correct me if I am wrong.
jwiles
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm

Post by jwiles »

Well, you simply said that the different record types will load their data into different fields in the table. No matter...

What you need to do here is a pivot (horizontal, I believe). Being on DS 7.x, the only pivot stage you have available is the one from DS Server (8.x has a Pivot Enterprise stage that is truly parallel).
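Roughly speaking, the pivot combines the sorted rows for one key into a single wide row, along these lines (layout illustrative only):

Code:

Input (one row per record type):      Output (one wide row per key):
key1 type0 <type-0 data>              key1 <type-0 data> <type-1 data> <type-2 data>
key1 type1 <type-1 data>
key1 type2 <type-2 data>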

I'm assuming there is a common identifier column (i.e. a key) on each record type. Are all record types always present for every key, or will you sometimes be missing a record type?

1) Read the records, parsing out your record type and record key, with the remainder of the record in a third column.
2) Sort on record key and record type to ensure they're grouped together correctly.

3) Two options:
a) If all record types are always present for a key, use the Pivot stage to combine the types into one record (they will then have a common layout), then follow this with a Column Import stage to break out the final columns for the table.

b) If some record types may be missing for a key, use a transformer to build up the final columns manually in stage variables as a key value is processed. In the sort above, create a keyChange column. The logic for a stage variable svTableColumnX:

Code:

If RecordType = desired_record_type Then get_record_data
Else If keyChange = 1 Then set_default_value
Else svTableColumnX
This should parse the data when the desired record type is encountered, default it when the key changes (you don't want values carried over from other keys), and otherwise propagate the parsed or default value for as long as the key remains the same. The final output record for a key should then contain the data from all the record types encountered. Follow the transformer with a Remove Duplicates stage, keeping the last record.
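To make that concrete, here is what the stage variable for the account number might look like (a sketch only: RecType and RecData are assumed to be the columns parsed in step 1, keyChange is the column created by the sort, and the type-0 layout follows the original post):

Code:

svAcctNum:
    If RecType[6,1] = '0' Then RecData[1,7]
    Else If keyChange = 1 Then ''
    Else svAcctNum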

Hopefully that makes some sense! :)

Regards,
- james wiles


All generalizations are false, including this one - Mark Twain.