Controlling the data flow between stages in the same job


hemachandra.m
Participant
Posts: 27
Joined: Wed Jan 03, 2007 1:29 am

Controlling the data flow between stages in the same job

Post by hemachandra.m »

Please see the job design below:



Code: Select all

Source (Seq File) ---> CopyStage ---> Trans1 ---> Target Seq File1 (XYZ.txt)
                           |
                           ------> Trans2 ---> Target Seq File2 (XYZ.txt)

In the above scenario I want to control the data flow so that Trans1 is triggered first and Trans2 second, within the same job. Both target Sequential File stages have the same file name, XYZ.txt: Target Seq File1 is set to overwrite mode and Target Seq File2 to append mode.

This would be possible if we created two separate job flows, but unfortunately our clients are not willing to have any more jobs. If we went that route they would need to create 350 additional jobs on top of the existing 350.

Can anybody help me with this?
Hemachandra
Sainath.Srinivasan
Participant
Posts: 3337
Joined: Mon Jan 17, 2005 4:49 am
Location: United Kingdom

Post by Sainath.Srinivasan »

You need to clarify what is happening and what is being passed in the jobs.

Going by my assumptions about your requirement...

First and foremost, why split into two streams when you can write everything in one stream?

Second, if the metadata is different, you can always create two separate files and merge them later.

Third, if the idea is to overwrite the file each time, then you can have a pre-job action that creates a zero-byte file, and append to it in the job.
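A minimal sketch of that pre-job step (the /data path is a placeholder):

Code: Select all

# Pre-job action: truncate XYZ.txt (creating it empty if absent) so that
# every write inside the job can simply append.
: > /data/XYZ.txt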
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

:!: You cannot have multiple writer processes targeting the same sequential file simultaneously. That's just "how it works" and has nothing to do with DataStage - it is a restriction inherent in the nature of "sequential" media. If that is what you are doing (and it seems that way) you will need to rethink/redesign this.

A typical solution for building a header/detail/trailer file would be to build all three separately and then concatenate them together after the job. However, if your detail file size precludes that, you'll need to be a little more clever, especially if for some odd reason the creation of more jobs (???) to do this properly is problematical. So... could you create the header record in a "before job" routine or script? If so, then your detail records could be appended to it as you do now, and you could write the one-record trailer to another (separate) file and lastly cat it onto the end (and even delete it, if you like) after the job.
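A rough sketch of that flow, assuming before/after-job ExecSH calls and placeholder /data paths and header text:

Code: Select all

# Before-job: write the one-line header, replacing any previous run's file.
echo 'HEADER_RECORD' > /data/XYZ.txt

# The job itself appends detail records to /data/XYZ.txt and writes its
# one-line trailer to a separate file, /data/trailer.txt.

# After-job: cat the trailer onto the end and remove the temporary file.
cat /data/trailer.txt >> /data/XYZ.txt && rm /data/trailer.txt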
-craig

"You can never have too many knives" -- Logan Nine Fingers
hemachandra.m
Participant
Posts: 27
Joined: Wed Jan 03, 2007 1:29 am

Post by hemachandra.m »

@Chulett
I am clear up to the point of loading the detail records into the file XYZ.txt in append mode within the same job (the header record will come from a before-job script, written in overwrite mode).

Do you mean that I need to create one more job to calculate the trailer record, and then cat the trailer record file and append it to the detail record file as the last record?

If yes, we would then have two jobs:

1. Header (before-job script) and detail record job:
This will create the XYZ.txt file with the header record (from the script, in overwrite mode), and the Trans stage will append the detail records to the same file, XYZ.txt.
2. Trailer record job:
This job will take the count of the detail records in XYZ.txt and load it into another file, ABC.txt. An after-job routine/script can then cat the ABC.txt file and append this trailer record to XYZ.txt as the last record.

Here I am a bit worried about my requirement: as I posted earlier, the record count should not go in the trailer record (the trailer carries some other defaulted record); it should be part of the header record.

How can I insert a record as the first line of a file (XYZ.txt)?

I tried:
1. sed with an in-place insert at line 1 (Unix) -- unfortunately the -i option is not available on my Unix box.
2. the ex (Unix) command -- this works fine with a small amount of data, but it does not work when the file is huge.

Is there any other way to insert a record as the first line of an existing file?
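For reference, here is roughly what I tried, plus the standard rewrite idiom (the header text and file names are placeholders):

Code: Select all

# 1. GNU sed in-place insert at line 1 -- the -i option is not on my box:
sed -i '1i HEADER_RECORD' XYZ.txt

# 2. ex in-place insert -- fine for small files, fails on huge ones:
printf '1i\nHEADER_RECORD\n.\nwq\n' | ex -s XYZ.txt

# The portable fallback rewrites the whole file into a copy first:
{ echo 'HEADER_RECORD'; cat XYZ.txt; } > XYZ.tmp && mv XYZ.tmp XYZ.txt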

Thanks in advance.
Hemachandra
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

No, I would think this could all be done in one job... except for the 'counts in the header' part, that complicates things. Unfortunately, off the top of my head I'm not aware of any trick to insert a record at the front of an existing file without concatenation and the creation of a new file. Not saying it can't be done, just that I'm drawing a blank right now. Perhaps others will be more smarter. :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers
FranklinE
Premium Member
Posts: 739
Joined: Tue Nov 25, 2008 2:19 pm
Location: Malvern, PA

Something similar

Post by FranklinE »

I've used the after-job routine to accomplish something like this: create separate (differently named) intermediate sequential files, then at the end concatenate them (in the order you require) into the final file with an after-job routine.

Another possibility, one I haven't thought through in any detail, is to branch the job into two (or more) simultaneous tracks and merge those tracks at the very end, without writing to a sequential file on each track.

Code: Select all

InFile ---> Copy ----> trans1 ----> Merge ----> OutFile
              |                        |
              ------> trans2 ---------->
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Please wrap your "ASCII art" in Code tags. It makes it easy to read.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

There you go. :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers
hemachandra.m
Participant
Posts: 27
Joined: Wed Jan 03, 2007 1:29 am

Post by hemachandra.m »

@ Franklin

Could you please elaborate, or post your subroutine, on how to insert a record at the front of an existing file?


My brief requirement:
There is a file called XYZ.txt, 40 to 60 GB in size, containing a detail part and a trailer part. I need to insert a header record into the existing file (XYZ.txt) carrying the count of records in XYZ.txt (detail records only).


Note: Since this file (XYZ.txt) varies from 40 to 60 GB in size (it might reach 80 GB), and nearly 40 to 50 different business files will be created on the disk, my client is not willing to have any intermediate files for processing. They requested us to do this within a single piece of code.

Please read all prior posts for this requirement.

Thanks for your participation.

Any other suggestions and techniques are welcome.
Hemachandra
satyanarayana
Participant
Posts: 13
Joined: Fri Jul 15, 2005 12:01 am

Post by satyanarayana »

1) Create the 'HEADER' record using the cat command in the job sequence or a before-job routine.

2) Append the 'DETAIL' records into the XYZ.txt file.

3) Create and append the 'TRAILER' record in the job sequence or in the same job itself.
If you want to create the TRAILER in the same job, get the row number from the source file as a column; if you have any $ amount in the file, sum those amounts in TRNS(1) and transfer the row number and $ amount into COUNT_$AMOUNT.txt. Then append this file to XYZ.txt in the job sequence or an after-job routine (see the sketch after the diagram below).


Code: Select all

source ---> trns(1) ---> XYZ.txt
               |
               ------> COUNT_$AMOUNT.txt
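A minimal sketch of that after-job append, using the file names from the diagram:

Code: Select all

# After-job: append the one-line trailer the job produced, then clean up.
# (Single quotes stop the shell from expanding $AMOUNT as a variable.)
cat 'COUNT_$AMOUNT.txt' >> XYZ.txt && rm 'COUNT_$AMOUNT.txt'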
hemachandra.m
Participant
Posts: 27
Joined: Wed Jan 03, 2007 1:29 am

Post by hemachandra.m »

@satyanarayana..

To create the header record, I first need to load the detail data into the XYZ.txt file, then take the count of the detail records from XYZ.txt, and then insert that count record as the first (header) record of XYZ.txt.

Could you please read my earlier posts for the requirement.
Hemachandra
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Can you change the job design entirely? Process all records, but write three files (the header, the details and the trailer). The header and trailer could well be in /tmp. In an after-job subroutine assemble the three into one file and delete the temporary files.

Code: Select all

# Redirecting cat's output into one of its own inputs (e.g. "> detail_file")
# would truncate that input before cat reads it, so assemble into a new file:
cat header_file detail_file trailer_file > final_file && rm header_file detail_file trailer_file
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

That was already considered but there is a concern about the size of the detail file so the OP was looking into... alternatives, techniques that don't require duplicating the data, even temporarily.
-craig

"You can never have too many knives" -- Logan Nine Fingers
FranklinE
Premium Member
Posts: 739
Joined: Tue Nov 25, 2008 2:19 pm
Location: Malvern, PA

Post by FranklinE »

Re html code: There's only so much data that my sieve-like memory can sift. Old dog new trick, and I appreciate getting help. :oops: My teen-aged daughter knows html better than I do.

Hemachandra, I did miss your modified requirement description. Sorry about that.

You should be able to handle this in one process without needing to write/read files. The trick is in finding ways to branch the job and re-join or merge back again.

If you need to update the record count in the header, you could try doing a copy, with one branch going to an Aggregate stage that does your record count -- your aggregation key could be the row identifier for detail rows -- and creates a single output record in the same format as your header. You can use Join (actually, since there will only ever be one row to update, Lookup might be better) to update the "old" header on the other side. Your join key could be as simple as the column that identifies whether a row is header or detail, setting it to the header value on output from Aggregate.

I don't have time to spend experimenting with this, so I don't know whether there are one or more flaws in this approach, but it seems simple. It also doesn't avoid temporary duplication of the data, as Craig suggests.

1) Create header as first row.
2) Merge/add detail rows.
3) Split (copy) into two links, one going to the Aggregate with one row output for the row count and the record identifier set to header.
4) Join back to other link from the split, using the link from Aggregate as your update.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

chulett wrote:That was already considered but there is a concern about the size of the detail file so the OP was looking into... alternatives, techniques that don't require duplicating the data, even temporarily.
Look carefully at the assembly: the details stream straight through cat into the final file, and only the header and trailer (one line each) exist as extra temporary files. The detail data does exist twice on disk for a moment, until the rm, but I think they can tolerate that.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.