Different format in a flat file???

Post questions here related to DataStage Server Edition, covering areas such as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Assuming you have some way to recognize the record types, the typical way to do this is to define your flat file as a single string, or maybe one "id" field and the rest as "data". Then, in your initial transformer, split out the header, detail and trailer records and parse the data portion accordingly.
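Outside DataStage, the same split can be sketched in a few lines of awk; a minimal sketch, assuming (purely for illustration) that the record type lives in the first two characters, with "00"/"01"/"99" as hypothetical type codes:

    # split one file into header/detail/trailer files by record type;
    # the two-character codes and file names are assumptions for illustration
    awk '{
        t = substr($0, 1, 2)
        if      (t == "00") print > "header.txt"
        else if (t == "01") print > "detail.txt"
        else if (t == "99") print > "trailer.txt"
    }' input.txt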

You can also play games with the aggregator, using FIRST and LAST to find the header and trailer records.

-craig
roy
Participant
Posts: 2598
Joined: Wed Jul 30, 2003 2:05 am
Location: Israel

Post by roy »

Hi,
I implemented something like what Craig suggested on DS v5.
It worked nicely except for one glitch: I got phantom warnings.
I eventually got rid of the phantoms by having two transformers after the initial row filtering, and only then output to a file.
If you want, I can elaborate on what I did.

Good Luck,


Roy R.
kduke
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX

Post by kduke »

srsr

If you are on UNIX, the fastest way is to strip off the first and last lines with the UNIX command sed. I would have to look up the syntax, but it is simple because "$" addresses the last line. If you need the exact command, let me know.
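For reference, the command in question looks like this (the file names are illustrative):

    # '1d' deletes the first line (header), '$d' deletes the last line (trailer)
    sed -e '1d' -e '$d' input.txt > detail_only.txt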

Kim.

Kim Duke
DwNav - ETL Navigator
www.Duke-Consulting.com
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL

Post by kcbland »

If your file is small (small to me is under 50 million rows), then simply do a head and tail combination on the file into another file. Write a standard ksh script so you can reuse it on all files of similar structure. If the header and trailer include row counts and byte counts, then expand the script to validate the data. This is classic data warehousing, and you need to develop your code library with scripts such as this. This is referred to as "induction processing".
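As a minimal ksh sketch of that idea (the file arguments and the trailer layout, a record count in columns 2-10 of the last line, are assumptions for illustration):

    #!/bin/ksh
    # strip_header_trailer.ksh -- drop header/trailer and validate the
    # row count the trailer carries (the trailer layout is an assumption)
    IN=$1
    OUT=$2

    total=$(wc -l < "$IN")                    # lines including header and trailer
    expected=$(tail -n 1 "$IN" | cut -c2-10)  # assumed: count in cols 2-10 of trailer
    actual=$((total - 2))                     # detail rows = total minus header and trailer

    if [ "$actual" -ne "$expected" ]; then
        echo "Row count mismatch: trailer says $expected, file has $actual" >&2
        exit 1
    fi

    # head drops the trailer, tail drops the header
    head -n $((total - 1)) "$IN" | tail -n $((total - 2)) > "$OUT"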

DataStage's Sequential stage works with fixed-format files; do not attempt to use a varying format. You could use the "one column trick": define one column of data in the file and then use field extractions or byte-location extraction to parse it into the various columns. However, this throws the metadata right into the bit bucket.

Kenneth Bland
kduke
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX

Post by kduke »

Ken

I thought about that, but head has a limit on the number of lines it will process on most UNIX boxes; it is a small number, like several thousand lines. I have not found a limit on tail.
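If head is the worry, one can skip the header with tail and drop the trailer with sed instead, which sidesteps any head limit; a one-line sketch with illustrative file names:

    # start output at line 2 (skip the header), then delete the last line (trailer)
    tail -n +2 input.txt | sed '$d' > detail_only.txt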

Kim.

Kim Duke
DwNav - ETL Navigator
www.Duke-Consulting.com
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL

Post by kcbland »

Also, this poster didn't tell us PX or Server, or the version. Release 6.0 has filtering, so you could issue head/tail there if you wanted. However, the trailer record gives you problems, so I still like my original recommendation: you should probably use sed. I just didn't know the abilities of the poster.

This is one of those things we all do at every client, which is why a bag of tricks is a consultant's best friend. I actually have my own version of a validation script, but people out there are better off developing their own toolset as part of their project. Should ETL tools be expanded to cover this functionality? I think so. Headers and trailers are so common; what differs is the format and the purpose.

Good luck all!

Kenneth Bland
degraciavg
Premium Member
Posts: 39
Joined: Tue May 20, 2003 3:36 am
Location: Singapore

Post by degraciavg »

Hi srsr,

Here's a nice trick you can do with sequential files having headers and footers... or with different record layouts, in general...

1) Normally, the header and footer records will not have as many columns as the detail records. In the Sequential File stage, while reading the file, on the Format tab set Default Padding to 'NULL'. On the Columns grid, the 'Incomplete Columns' action should be 'Replace'. This will allow you to read a sequential file with different record layouts.

2) In the transformer, set a constraint that tests for SQL nulls in columns that are NOT NULLABLE in the source. This will distinguish your Header, Data, and Footer records.
E.g., NOT(isNULL(input.field3)) could be the constraint if we know field3 is not nullable.

Note: the exact constraint will depend on your source data.

Hope this helps... :)

Vladimir
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

I implemented just what Vladimir suggested some time ago on a 5.2 system, where the sequential file was multi-GB (too big, in the client's opinion, to consider pre-processing). They were totally happy with this solution, as it required only a single pass through the file.

Ray Wurlod
Education and Consulting Services
ABN 57 092 448 518