Joining multiple sequential files

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

ds_user78
Participant
Posts: 23
Joined: Thu Nov 11, 2004 5:39 pm


Post by ds_user78 »

Hi,
I have around 10 to 15 sequential files to be joined into a single file. let us take it as mara.txt, marc.txt, mard.txt, mvke.txt etc.. The keys and the lay out of these files are different. How ever there is one common field among all these files (matnr) which is part of the key. For example mara.txt has matnr as the key marc.txt has matnr,plant as the key, mvke.txt has matnr,vkorg,vtweg as the key. Also it is possible that for a particular entry in mara.txt there may not be an entry in the mvke.txt file. So I would need to do an outer join effectively. If i want to join all these files and produce one output file in which the combination of all the keys in each of the file will be the key, what are my options?

1. Use the Merge stage to join two files at a time. The drawback is that, since each of these files has a huge number of records, every pairwise merge produces an intermediate file with even more records, taking up even more space.

2. Use a staging database: load these files into it and use database joins to produce the required output file.

3. Somehow (ab)use hash files to do left outer joins. I am running into problems with this approach because the file acting as the driver table has a key that is only part of the key of the reference table.

Which of these approaches should I take?
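For what it's worth, the one-to-many wrinkle in option 3 can be sketched outside DataStage. A minimal Python sketch, with made-up field values standing in for the real mara.txt and mvke.txt layouts, where the reference side is indexed as matnr to a list of rows so the left outer join still returns every relationship:

```python
from collections import defaultdict

# Illustrative rows only; the real mara.txt / mvke.txt layouts are wider.
# mara.txt: one row per matnr (the driver file).
mara = {
    "00450L00K01DT": "MFERTEA 01",
    "0003YA16D16BOB2": "MFERTEA 01",
}

# mvke.txt can hold many rows per matnr, so index it matnr -> list of rows.
mvke = defaultdict(list)
for matnr, rest in [("00450L00K01DT", "200000PW102S"),
                    ("00450L00K01DT", "100000PW102S")]:
    mvke[matnr].append(rest)

# Left outer join: every mara row appears at least once; matnr values
# missing from mvke get an empty string on the reference side.
joined = []
for matnr, mara_rest in mara.items():
    for mvke_rest in mvke.get(matnr) or [""]:
        joined.append((matnr, mara_rest, mvke_rest))
```

A plain one-row-per-key lookup cannot express this; it is the list-valued index that preserves the duplicates on the reference side.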

Or should I ask the SAP R/3 team to write ABAP code to generate the file inside R/3 itself? My problem arises from the fact that, because Ascential can connect to R/3 and generate ABAP code, they assume I can select all the fields and generate the 'outer' joins as well. Unfortunately, the SAP version I am working with (3.1) does not support outer joins, so I am forced to extract the tables individually.

Any ideas??? :(
Sainath.Srinivasan
Participant
Posts: 3337
Joined: Mon Jan 17, 2005 4:49 am
Location: United Kingdom

Post by Sainath.Srinivasan »

I would go for the option of storing all the extracted information in a 'staging' database and running user-defined SQL to obtain the result. This is cleaner than messing around with many hash files and running into maintenance issues.

Any checks on the data can also be done with straightforward SQL.

Another benefit is that we need not worry about defining a key for the tables, as we must with hash files, which also adds the extra step of keeping the keys unique.
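As a rough illustration of the staging-database idea (an in-memory SQLite database and invented column names stand in for the real staging tables and SAP layouts), the whole outer join collapses into one SQL statement:

```python
import sqlite3

# In-memory SQLite stands in for the staging database; the column names
# are invented, not the real SAP table layouts.
con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE mara (matnr TEXT PRIMARY KEY, attrs TEXT)")
cur.execute("CREATE TABLE mvke (matnr TEXT, vkorg TEXT, attrs TEXT)")
cur.executemany("INSERT INTO mara VALUES (?, ?)",
                [("00450L00K01DT", "MFERTEA"),
                 ("0003YA16D16BOB2", "MFERTEA")])
cur.executemany("INSERT INTO mvke VALUES (?, ?, ?)",
                [("00450L00K01DT", "2000", "PW102S"),
                 ("00450L00K01DT", "1000", "PW102S")])

# The user-defined SQL: a LEFT OUTER JOIN keeps mara rows with no mvke
# match, padding the mvke columns with NULL.
rows = cur.execute("""
    SELECT mara.matnr, mara.attrs, mvke.vkorg, mvke.attrs
    FROM mara LEFT OUTER JOIN mvke ON mara.matnr = mvke.matnr
    ORDER BY mara.matnr
""").fetchall()
```

The one-to-many case and the missing-match case both fall out of the join for free, which is the main attraction over hand-rolled lookups.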

Doing this in SAP would mean changing ABAP code via the SAP Pack. If there are people to maintain it and assure its accuracy and quality, that may be the better option to consider.
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne
Contact:

Post by vmcburney »

If you don't want to muck around with a lot of hash files, you can do the entire merge with a single hash file. Create a hash file with a key field and 15 long text fields. Then use a job that reads each sequential file as a single long text field, splits that field into a key field and a remainder field, and writes those two fields to the hash file.

Run this job 15 times, feeding it a different file name each time. You may be able to make it a multiple-instance job and run the 15 instances in parallel.

Because you are defining each sequential file as just two fields, key and non key, you do not need to worry about individual column metadata.

If you then read the hash file and write out the key field and all fifteen columns to a delimited sequential file you will have a fully merged sequential file. The delimiters within the hash file fields are merged with your sequential file delimiters to form a long delimited record.
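The mechanics of that single-hash-file merge can be sketched in Python. Here a plain dict stands in for the hash file, and the file contents, '|' delimiter and three-file count are illustrative only; note this sketch, like the technique itself, keeps at most one row per key from each file:

```python
# A plain dict stands in for the single hash file: matnr -> one "remainder"
# slot per source file. File contents and the '|' delimiter are invented.
NUM_FILES = 3
merged = {}

def load(slot, lines):
    """Split each line into key + remainder; store it in this file's slot."""
    for line in lines:
        key, _, remainder = line.partition("|")
        merged.setdefault(key, [""] * NUM_FILES)[slot] = remainder

load(0, ["00450L00K01DT|MFERTEA,01"])       # stands in for mara.txt
load(1, ["00450L00K01DT|0001,PW102S"])      # stands in for marc.txt
load(2, ["0003YA16D16BOB2|2000,10,ERLA"])   # stands in for mvke.txt

# Writing key + all slots comma-delimited yields the merged file: the
# delimiters inside each slot line up with the output delimiters, so each
# row becomes one long delimited record.
output = [",".join([key] + slots) for key, slots in sorted(merged.items())]
```

Keys missing from a file simply leave that slot empty, producing consecutive delimiters in the output record.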
battaliou
Participant
Posts: 155
Joined: Mon Feb 24, 2003 7:28 am
Location: London
Contact:

Post by battaliou »

I'm still keen on the SQL relational approach. One of the problems I have with hash files is that the source data has one-to-many relationships, e.g. column matnr may not be unique in file mvke.txt. An outer join in Oracle handles that by returning all of the relationships.

Maybe you could post some sample data?
ds_user78
Participant
Posts: 23
Joined: Thu Nov 11, 2004 5:39 pm

Post by ds_user78 »

Precisely. I would have multiple rows in mvke.txt for a single entry in mara.txt. Some sample entries in mara.txt:

    00450L00K01DT MFERTEA 01 0.000 LB 0.000 0.000
    0003YA16D16BOB2 MFERTEA 01 0.000 LB 0.000 0.000

and some entries in mvke.txt:

    00450L00K01DT 200000PW102S////N300// ERLA101001
    00450L00K01DT 100000PW102S////N300// ERLA101001

The first material has two records in mvke and the second has none.
Thanks for the suggestions.