multiple sequential file creation?

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.


jonathanhale
Premium Member
Posts: 33
Joined: Tue Nov 06, 2007 1:09 pm

multiple sequential file creation?

Post by jonathanhale »

I have a table, blobs, consisting of blob_id (char 16) and blob_cont (char 4MB).

I need to create one sequential file for each row in blobs, where the filename = blob_id and the file content = blob_cont.

Is there a way of creating multiple differently named sequential files simultaneously in DataStage?
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Hmmm... not really, unless you know how many max you'll have ahead of time and build that many output stages into your job. Typical answer would be to create a single file and then script something after-job to split the file into multiple files with the names you require.
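For illustration, that after-job split could look roughly like this (a minimal sketch, not DataStage code: it assumes the job wrote one tab-delimited file with blob_id in column 1 and blob_cont in column 2 — the delimiter and the function name are assumptions):

```python
import os

def split_export(export_path, out_dir):
    """Split a single tab-delimited export into one file per row,
    named by the first column (blob_id), containing the second (blob_cont)."""
    os.makedirs(out_dir, exist_ok=True)
    with open(export_path, "r", encoding="utf-8") as f:
        for line in f:
            # First field is the filename, the rest of the line is the content.
            blob_id, _, blob_cont = line.rstrip("\n").partition("\t")
            with open(os.path.join(out_dir, blob_id), "w", encoding="utf-8") as out:
                out.write(blob_cont)
```

In practice you'd schedule something like this as the after-job subroutine, passing the export path and target directory in.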

Another answer might be to write the output to a type 19 (?) hashed file, which is basically a directory and every 'record' becomes a file inside that directory. Pretty sure it's a type 19 but again you may need to rename the files post-job if you have a particular naming scheme in mind as I don't believe you can control the filenames.

Yet another answer may be (oddly enough) an XML Output stage with a 'trigger' column, just letting your data pass thru the stage with no 'xmling' going on. It creates new files whenever the value in the trigger column changes and the trigger column doesn't need to be output.
-craig

"You can never have too many knives" -- Logan Nine Fingers
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Well... not sure I would have ponied up some of those thoughts if I'd known we were talking about two million files daily. Except perhaps academically. :wink:

I think your "Option 3" is a perfectly valid solution, kudos for coming up with that.
-craig

"You can never have too many knives" -- Logan Nine Fingers
throbinson
Charter Member
Posts: 299
Joined: Wed Nov 13, 2002 5:38 pm
Location: USA

Post by throbinson »

Option 4
A single job that contains an output link to a Folder stage. This will write a unique file per row of incoming data, almost exactly like Option 3. The first column in the derivation would be the file path/name; the second, third, etc. are the data.
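Conceptually, the Folder stage write amounts to something like this sketch (illustrative only, not the stage's actual implementation — the function name and the fixed folder path are assumptions; as noted below, the path itself comes from a job parameter rather than the column):

```python
import os

def folder_stage_write(rows, folder_path):
    """rows: iterable of (filename, content) pairs.
    Writes one file per incoming row into a fixed folder:
    column 1 names the file, column 2 supplies its content."""
    for filename, content in rows:
        with open(os.path.join(folder_path, filename), "w", encoding="utf-8") as f:
            f.write(content)
```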
jonathanhale
Premium Member
Posts: 33
Joined: Tue Nov 06, 2007 1:09 pm

Post by jonathanhale »

Option 4 does also work, though slightly slower than the routine version. In my testing I was not successful in passing the path as part of the filename column; i.e. the path must be set by a job parameter, and column 1 to the Folder stage becomes the filename, column 2 etc. the file content.

Even though parallel jobs are theoretically unlikely to help much with this requirement, we are still interested in comparing parallel performance against server.

However, there is no Folder stage available for parallel jobs. Is there an equivalent or alternative? Can File Sets be used like this for output?

A parallel routine cannot be written in BASIC. Has anybody ever come across a BASIC-to-C++ converter? :D

Otherwise, I guess I need to write a C++ routine that sits on the server file system and is called from the parallel job? Is that the right theory?

Any other comments or remarks?
throbinson
Charter Member
Posts: 299
Joined: Wed Nov 13, 2002 5:38 pm
Location: USA

Post by throbinson »

A parameter for the path will work in the Folder Path name of the Folder Stage Properties tab. This didn't work for you? I do not know a way to replicate the folder stage write in EE although I am sure that cat can be skinned.