Logically concatenate sequential files

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

sbass1
Premium Member
Posts: 211
Joined: Wed Jan 28, 2009 9:00 pm
Location: Sydney, Australia

Post by sbass1 »

Hi,

I searched on "concatenate", found this link:

viewtopic.php?t=125140&highlight=concatenate

This hit specifically said to physically concatenate the files external to DS before processing.

However, is it possible to logically concatenate sequential files within DS? The Link Collector looked promising, but I want neither round robin nor sort/merge. It would be nice if there were a third option, link order, plus an option to set the link order.

So, if my input files are:

File1:
B
C
D

File2:
A
B
C
D
E

I want:
B
C
D
A
B
C
D
E

in my output file, with all processing "in memory" within DS.

Thanks...
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Not with server jobs. Typically you'd use cat as a Filter command in the Sequential File stage that reads them.
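
A minimal sketch of what such a cat command would produce, using the two sample files from the question. The temporary directory and the file paths are assumptions for illustration; in a job, the command would be entered as the stage's Filter command rather than run as a script.

```shell
#!/bin/sh
# Hypothetical working directory; File1 and File2 hold the sample rows from the question.
workdir=$(mktemp -d)
printf 'B\nC\nD\n' > "$workdir/File1"
printf 'A\nB\nC\nD\nE\n' > "$workdir/File2"

# cat writes its operands in the order given, so every row of File1
# comes out before any row of File2 (no round robin, no sort/merge).
combined=$(cat "$workdir/File1" "$workdir/File2")
printf '%s\n' "$combined"
```

The order of the file names on the command line is what fixes the "link order" here.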

(The Funnel stage in parallel jobs does have an input-at-a-time method.)

Right now I'm constructing header/details/trailer files using a job sequence with three activities (among others). The first creates the file and writes the header to it (that's usually a Job activity). The second appends the detail lines (probably an Execute Command activity, since the detail lines were created in an earlier job). The third appends the trailer (this may be a Job activity that calculates and appends the trailer, or an Execute Command activity that appends an already-calculated trailer).
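
The three steps could look roughly like this at the shell level. This is only a sketch: the file names, the `HDR` and `TRL~<count>` record layouts, and the sample detail rows are all made up for illustration, and in the sequence each step would be its own activity rather than one script.

```shell
#!/bin/sh
# Hypothetical paths and record layouts, for illustration only.
workdir=$(mktemp -d)
target="$workdir/extract.txt"
details="$workdir/details.txt"

# Stand-in for the detail lines produced by an earlier job.
printf 'row1\nrow2\n' > "$details"

# Step 1: create the target file and write the header (">" truncates any old copy).
printf 'HDR\n' > "$target"

# Step 2: append the detail lines (">>" appends instead of overwriting).
cat "$details" >> "$target"

# Step 3: append a trailer; here it carries the detail row count.
detail_count=$(wc -l < "$details" | tr -d ' ')
printf 'TRL~%s\n' "$detail_count" >> "$target"

cat "$target"
```

Only the first step uses `>`; the later steps must use `>>` or they would clobber what was written before them.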
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
sbass1
Premium Member
Posts: 211
Joined: Wed Jan 28, 2009 9:00 pm
Location: Sydney, Australia

Post by sbass1 »

Thanks Ray, much appreciated...

My "actual" requirement is creating a dimension table with:

CountryCodeSK
CountryCodeNK
CountryDesc

CountryCodeNK and CountryDesc come from a tilde-delimited sequential file. If any source file CountryCodeNK does not map to a valid CountryCode, I will map it to the "invalid data" SK. So, in that respect, every data element will map.

There are several ways to approach this; the one I've decided on is:

SeqFile_InvalidCountryCode
Stage uses filter command, /dev/null file, echo "XX~INVALID COUNTRY CODE" as the filter.

SeqFile_Country_Codes
File containing data described above

Link Collector
Round Robin

This only works because SeqFile_InvalidCountryCode contains a single row. Well, "works" is a relative term; I want the invalid country as the first row in the dimension table. But, I can definitely see a future requirement for concatenating physical files; it would be nice if this could be done within DS rather than from the O/S.

I've designed it this way so the source file can be overwritten at will w/o worrying about clobbering the invalid country row. I think it also makes the processing clear in the DS job itself, rather than referring to some back end process on the server machine.

I suppose I could write a script like:

echo "XX~INVALID COUNTRY" | cat - path_to_country_codes.dat

and use that as input to the job, but I think it's visually less clear than my approach.
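
For reference, a runnable sketch of that one-liner. The temporary file and its two sample rows stand in for the real country codes file and are invented for illustration. The `-` operand tells cat to read standard input at that position, so the echoed row lands ahead of the file's rows.

```shell
#!/bin/sh
# Hypothetical stand-in for path_to_country_codes.dat, with made-up sample rows.
workdir=$(mktemp -d)
country_file="$workdir/country_codes.dat"
printf 'AU~AUSTRALIA\nNZ~NEW ZEALAND\n' > "$country_file"

# "-" means "read stdin here", so the invalid-country row is emitted first,
# followed by the contents of the country codes file.
result=$(echo "XX~INVALID COUNTRY" | cat - "$country_file")
printf '%s\n' "$result"
```

Because the invalid row is generated on the fly, the source file can still be overwritten without losing it.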

P.S.: Another ETL tool I've used allows the creation of sequential file views, where the physical file(s) can change underneath the view, and the data dynamically refreshes at run time. Normal SQL type processing can also be applied to the view. This would be a useful addition to DS.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

There's no such thing as "the first row in a table".

As noted, there are solutions available in parallel jobs, and your site does have Enterprise Edition. Perhaps you could investigate.
sbass1
Premium Member
Posts: 211
Joined: Wed Jan 28, 2009 9:00 pm
Location: Sydney, Australia

Post by sbass1 »

ray.wurlod wrote: There's no such thing as "the first row in a table".

Yeah, good point.

ray.wurlod wrote: As noted, there are solutions available in parallel jobs, and your site does have Enterprise Edition. Perhaps you could investigate.

If "my site" has Enterprise Edition, it's not in my department, nor accessible to me, unfortunately.

My test job was writing to a sequential file for testing.

My real job feeds the Link Collector output into a Transformer stage, where I get:

"Link Collector stage does not support in-process active-to-active inputs or outputs."

Back to the drawing board...
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Use IPC stages to separate the Link Collector from other active stages.