
Adding a Flush record to the end of a file using DataStage

Posted: Fri Jun 19, 2009 4:39 am
by abhilashnair
I need to convert a Unix shell script into a DataStage PX job. The shell script takes a fixed-width file as input, sorts it, and then appends an extra flush record to the end of the sorted output file. The extra record is nothing but spaces in all fields: i.e. suppose the width of the input file is 100 bytes and it has 100 rows, the Unix shell script will create an output file which is sorted and will contain 101 rows, the last row being 100 spaces. How can this be done in a DS job?

Posted: Fri Jun 19, 2009 5:01 am
by ArndW
The simplest means of doing this is calling an after-job subroutine which shells out to UNIX and just issues the command to append those spaces to the file.
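A minimal sketch of that after-job command, assuming a 100-byte record width and a placeholder output path (the real file name and width come from the job):

```shell
#!/bin/sh
# Sketch of the after-job idea: append one all-spaces "flush" record
# to the already-sorted output file. WIDTH and OUTFILE are
# placeholder values; substitute the job's actual settings.
WIDTH=100
OUTFILE=/tmp/sorted_output.txt

# Sample sorted content standing in for the job's real output.
printf 'record one\nrecord two\n' > "$OUTFILE"

# Append a record of WIDTH spaces (printf pads the empty string
# to the requested field width).
printf "%${WIDTH}s\n" '' >> "$OUTFILE"
```

The same one-liner works as an after-job ExecSH command, since appending with `>>` does not disturb the rows the job already wrote.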

Posted: Fri Jun 19, 2009 5:42 am
by abhilashnair
This option was initially considered but then rejected. Is there any way to do it in the job itself?

Posted: Fri Jun 19, 2009 5:46 am
by balajisr
abhilashnair wrote:This option was initially considered but then rejected. Is there any way to do it in the job itself?
Why was it rejected? This may help us in giving a solution.

Posted: Fri Jun 19, 2009 6:17 am
by Sreenivasulu
The suggestion by ArndW is a simple solution to this problem. A DataStage solution could be a complex one, and most probably you can do it only in a server job (i.e. not in a parallel job).

Regards
Sreeni
ArndW wrote:The simplest means of doing this is calling an after-job subroutine which shells out to UNIX and just issues the command to append those spaces to the file. ...

Posted: Fri Jun 19, 2009 7:31 am
by ArndW
Another solution is to add a Merge stage to append a line to your output; just ensure that the order is set correctly. This line would be created using a Row Generator stage.

Adding a Flush record to end of a file for vertical pivot

Posted: Fri Jul 24, 2009 11:51 am
by rcanaran
ArndW wrote:Another solution is to add a Merge stage to append a line to your output; just ensure that the order is set correctly. This line would be created using a Row Generator stage.
I was looking for something similar to finish off a vertical pivot being coded in a transformer (7.5.1 Parallel).

I used a Row Generator stage to generate the single row, then a Transformer to overwrite the key value with high values (not COBOL/mainframe high values (hex 'FF'), but the upper end of the valid range for the datatype). The stream gets sorted later by key, and I needed to ensure the generated row is the last record after the sort.

The last Transformer performed the vertical pivot logic, writing out only the previously accumulated record on a key change. I needed this "last record" to flush out the real last accumulated data record.
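The "high values" trick above can be illustrated outside DataStage with a toy sort: give the flush row the maximum key for its datatype, and any later sort on that key leaves it last. The keys and values below are made up for the demonstration.

```shell
#!/bin/sh
# Toy illustration of the "high values" trick: the flush row gets
# the maximum key (here 9999 for a 4-digit key), so a later sort on
# the key is guaranteed to place it after every real record.
{
  printf '0420 data-a\n'
  printf '0017 data-b\n'
  printf '9999 FLUSH\n'   # generated flush row with the maximum key
} | sort -k1,1 > /tmp/sorted_with_flush.txt

cat /tmp/sorted_with_flush.txt
```

This mirrors what the Row Generator plus Transformer combination does: the row's position in the stream no longer matters, because the key value alone forces it to the end after the sort.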

Posted: Fri Jul 24, 2009 7:32 pm
by ray.wurlod
There may be an "end of data" token available in a future release (8.5?).

Posted: Sun Jul 26, 2009 8:40 am
by datisaq
You can use a Row Generator to generate the last row (having the same metadata) and club it with the original dataset using a Funnel stage. In the Funnel stage there is an option available where you can append the first dataset to the output and then the second (the row generator).

DS experts, please correct me if I'm wrong.

Posted: Sun Jul 26, 2009 8:26 pm
by rcanaran
I did indeed have a Funnel after the Row Generator and Transformer. The Funnel can physically insert the row as the last one, but a subsequent stage needs its input sorted by a key field, and a randomly generated value from the Row Generator would then have ended up in an unpredictable position. I needed to guarantee the row ended up being last before going to that stage. I'm sure I could have funneled in the generated row at a different point in the job to avoid the sort, but for other reasons it was better to have the Funnel at that particular point.

Also, I think elsewhere in this thread someone mentioned that they couldn't use an after-job routine to concatenate the last row via a call to the OS (Unix shell script). At my current site, I also cannot do this: site standard. I couldn't even do this when it was more efficient to use an External Filter (sed in this case) to cleanse some extra data. Cleansing in a parallel derivation or stage variable would have resulted in slow execution. And none of the sites I have been at were using parallel routines while I was there.
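For readers unfamiliar with the External Filter stage: it pipes each record through a Unix command, reading rows on stdin and writing cleansed rows on stdout. The sed expression below is a hypothetical stand-in, since the post doesn't say what the actual cleansing rules were.

```shell
#!/bin/sh
# Hypothetical External Filter command: strip trailing whitespace
# from every record. The real cleansing logic at the site is not
# described in the post; this only shows the stdin-to-stdout shape
# such a filter takes.
printf 'abc   \nxyz\t\n' | sed 's/[[:space:]]*$//' > /tmp/cleansed.txt

cat /tmp/cleansed.txt
```

Because the filter is an ordinary pipe, sed processes records as a stream rather than row-by-row in a Transformer derivation, which is where the speed advantage the post mentions comes from.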

Supportability often wins over efficiency, simplicity, or elegance.