Reading file pattern

Post by **keshav0307** » Tue Jun 26, 2007 5:57 am

If the Sequential File stage is defined to read a file pattern, will the job read multiple files in parallel, or one file after the next?

Post by **ray.wurlod** » Tue Jun 26, 2007 6:39 am

Yes.

Generate the score, inspect the score, and find out. Or just look at the link marker on the Sequential File stage; that alone will answer your question.

Post by **keshav0307** » Tue Jun 26, 2007 9:24 am

I checked the properties of the Sequential File stage.
The stage becomes parallel by default, but the node constraint is selected too.
So will this stage read from multiple compute nodes, or from all the nodes (maybe one file per node)?

Post by **mctny** » Tue Jun 26, 2007 1:13 pm

I think (not 100% sure) it will read the files one by one, not in parallel.

Post by **DSguru2B** » Tue Jun 26, 2007 1:26 pm

No. For multiple files, multiple nodes (if available) are used. All this is documented in the Parallel Developer's Guide.

Post by **bcarlson** » Tue Jun 26, 2007 3:51 pm

The documentation in the Parallel Developer's Guide is okay, but it gives a partial picture. If you search on 'file pattern' you will find plenty of info about what it is, but not necessarily how it works.

Now, if you look at the Parallel Job Advanced User Guide, you will find a reference to a very handy option called APT_IMPORT_PATTERN_USES_FILESET. It turns out the default action when using a file pattern is to read the files sequentially. That may not be a problem with small files, but start reading in several files with millions of records each and you'll watch your hair growing faster than your job is running. Okay, slight exaggeration...

The APT_IMPORT_PATTERN_USES_FILESET parameter tells DataStage to create a dynamic fileset based on the file pattern you give it. Filesets are great because DataStage reads all the files in parallel. The catch is that using a fileset normally means some hardcoding that you may want to avoid. This option creates one dynamically, so you get the benefit of a parallel read and can skip the hardcoding.
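To make the sequential-versus-parallel difference concrete outside DataStage, here is a rough Python analogy, not DataStage's actual mechanism. A pattern is expanded into a list of files (loosely what the dynamic fileset does), and the files are then processed either one after another or concurrently. The function names, the `part_*.txt` file names, and the use of a thread pool are assumptions for illustration only; DataStage spreads the read across processing nodes, not Python threads.

```python
import glob
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def make_demo_files(directory, record_counts):
    """Write hypothetical input files part_0.txt, part_1.txt, ... with the given line counts."""
    for i, n in enumerate(record_counts):
        path = os.path.join(directory, f"part_{i}.txt")
        with open(path, "w", encoding="utf-8") as f:
            f.writelines(f"record {j}\n" for j in range(n))


def expand_pattern(pattern):
    """Resolve a file pattern to a sorted list of paths (a loose analogue of a dynamic fileset)."""
    return sorted(glob.glob(pattern))


def count_records(path):
    """Stand-in for 'reading' a file: count its lines."""
    with open(path, "r", encoding="utf-8") as f:
        return sum(1 for _ in f)


def read_sequentially(paths):
    """Default file-pattern behaviour in this analogy: one file, then the next."""
    return [count_records(p) for p in paths]


def read_in_parallel(paths, workers=4):
    """Fileset-like behaviour in this analogy: all files read concurrently, results kept in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(count_records, paths))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        make_demo_files(d, [3, 5, 2])
        files = expand_pattern(os.path.join(d, "part_*.txt"))
        print(read_sequentially(files))  # [3, 5, 2]
        print(read_in_parallel(files))   # [3, 5, 2]
```

Both calls return the same counts; only the scheduling differs, which is why the parallel read starts to pay off once each file holds millions of records.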

I wish you didn't have to search through multiple manuals to get the whole picture. Don't get me wrong, you should always read the manuals. We just got bitten by this particular issue and couldn't find the info we needed right away, so I figured I would throw in my 2 cents before you pull your hair out like we did.

Hope this helps.

Brad.

Post by **DSguru2B** » Wed Jun 27, 2007 7:03 am

bcarlson wrote:... and you'll watch your hair growing faster than your job is running. Okay, slight exaggeration...