Different format in a flat file???

Post questions here related to DataStage Server Edition, covering areas such as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Assuming you have some way to recognize the record types, the typical way to do this is to define your flat file as a single string, or maybe one "id" field and the rest as "data". Then, in your initial transformer, split out the header, detail and trailer records and parse the data portion accordingly.
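Outside DataStage, the same split can be sketched in a few lines of awk; a minimal sketch, assuming (purely for illustration) that the record type lives in the first two characters, with "00"/"01"/"99" as hypothetical type codes:

    # split one file into header/detail/trailer files by record type;
    # the two-character codes and file names are assumptions for illustration
    awk '{
        t = substr($0, 1, 2)
        if      (t == "00") print > "header.txt"
        else if (t == "01") print > "detail.txt"
        else if (t == "99") print > "trailer.txt"
    }' input.txt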

You can also play games with the aggregator, using FIRST and LAST to find the header and trailer records.

-craig
roy
Participant
Posts: 2598
Joined: Wed Jul 30, 2003 2:05 am
Location: Israel

Post by roy »

Hi,
I implemented something like what Craig suggested on DS v5.
It worked nicely except for one glitch: I got phantom warnings.
I eventually got rid of the phantoms by having two transformers after the initial row filtering, and only then output to a file.
If you want, I can elaborate on what I did.

Good Luck,


Roy R.
kduke
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX

Post by kduke »

srsr

If you are on UNIX, the fastest way is to strip off the first and last lines with the UNIX command sed. I would have to look up the syntax, but it is simple because "$" addresses the last line. If you need the exact command, let me know.
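For reference, the command in question looks like this (the file names are illustrative):

    # '1d' deletes the first line (header), '$d' deletes the last line (trailer)
    sed -e '1d' -e '$d' input.txt > detail_only.txt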

Kim.

Kim Duke
DwNav - ETL Navigator
www.Duke-Consulting.com
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL

Post by kcbland »

If your file is small (small to me is under 50 million rows), then simply do a head and tail combination on the file into another file. Write a standard ksh script so you can reuse it on all files of similar structure. If the header and trailer include row counts and byte counts, then expand the script to validate the data. This is classic data warehousing, and you need to develop your code library with scripts such as this. This is referred to as "induction processing".
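As a minimal ksh sketch of that idea (the file arguments and the trailer layout, a record count in columns 2-10 of the last line, are assumptions for illustration):

    #!/bin/ksh
    # strip_header_trailer.ksh -- drop header/trailer and validate the
    # row count the trailer carries (the trailer layout is an assumption)
    IN=$1
    OUT=$2

    total=$(wc -l < "$IN")                    # lines including header and trailer
    expected=$(tail -n 1 "$IN" | cut -c2-10)  # assumed: count in cols 2-10 of trailer
    actual=$((total - 2))                     # detail rows = total minus header and trailer

    if [ "$actual" -ne "$expected" ]; then
        echo "Row count mismatch: trailer says $expected, file has $actual" >&2
        exit 1
    fi

    # head drops the trailer, tail drops the header
    head -n $((total - 1)) "$IN" | tail -n $((total - 2)) > "$OUT"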

DataStage's Sequential stage works with fixed-format files; do not attempt to use a varying format. You could use the "one column trick": define one column of data in the file and then use field extractions or byte-location extraction to parse it into the various columns. However, this throws the metadata right into the bit bucket.

Kenneth Bland
kduke
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX

Post by kduke »

Ken

I thought about that, but head has a limit on the number of lines it will process on most UNIX boxes; it is a small number, like several thousand lines. I have not found a limit on tail.
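If head is the worry, one can skip the header with tail and drop the trailer with sed instead, which sidesteps any head limit; a one-line sketch with illustrative file names:

    # start output at line 2 (skip the header), then delete the last line (trailer)
    tail -n +2 input.txt | sed '$d' > detail_only.txt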

Kim.

Kim Duke
DwNav - ETL Navigator
www.Duke-Consulting.com
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL

Post by kcbland »

Also, this poster didn't tell us PX or Server, or the version. Release 6.0 has filtering, so you could issue head/tail there if you wanted. However, the trailer record gives you problems, so I still like my original recommendation: you should probably use sed. I just didn't know the abilities of the poster.

This is one of those things we all do at every client, which is why a bag of tricks is a consultant's best friend. I actually have my own version of a validation script, but people out there are better off developing their own toolset as part of their project. Should ETL tools be expanded to cover this functionality? I think so. Headers and trailers are so common; what differs is the format and the purpose.

Good luck all!

Kenneth Bland
degraciavg
Premium Member
Posts: 39
Joined: Tue May 20, 2003 3:36 am
Location: Singapore

Post by degraciavg »

Hi srsr,

Here's a nice trick you can do with sequential files having headers and footers... or with different record layouts, in general...

1) Normally, the header and footer records will not have as many columns as the detail records. In the Sequential File stage, while reading the file, on the Format tab set Default Padding to 'NULL'. On the Columns grid, the 'Incomplete Columns' action should be 'Replace'. This will allow you to read a sequential file with different record layouts.

2) In the transformer, set a constraint that tests for SQL nulls in columns that are NOT NULLABLE in the source. This will distinguish your Header, Data, and Footer records.
E.g., NOT(isNULL(input.field3)) could be the constraint if we know field3 is not nullable.

Note: the exact constraint will depend on your source data.

Hope this helps... :)

Vladimir
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

I implemented just what Vladimir suggested some time ago on a 5.2 system, where the sequential file was multi-GB (too big, in the client's opinion, to consider pre-processing). They were totally happy with this solution, as it required only a single pass through the file.

Ray Wurlod
Education and Consulting Services
ABN 57 092 448 518