Reading file pattern

keshav0307
Premium Member
Posts: 783
Joined: Mon Jan 16, 2006 10:17 pm
Location: Sydney, Australia

Post by keshav0307 »

If the Sequential File stage is set to read a file pattern, will the job read the matching files in parallel, or one file after another?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Yes.

Generate the score, inspect the score, and find out. Or just look at the link marker on the Sequential File stage - that alone will answer your question.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
keshav0307
Premium Member
Posts: 783
Joined: Mon Jan 16, 2006 10:17 pm
Location: Sydney, Australia

Post by keshav0307 »

I checked the properties of the Sequential File stage.
The stage becomes parallel by default, but the node constraint is selected too.
So will this stage read from multiple compute nodes or from all the nodes (maybe one file per node)?
mctny
Charter Member
Posts: 166
Joined: Thu Feb 02, 2006 6:55 am

Post by mctny »

I think (not 100% sure) it will read the files one by one, not in parallel.
Thanks,
Chad
__________________________________________________________________
"There are three kinds of people in this world; Ones who know how to count and the others who don't know how to count !"
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

No. For multiple files, multiple nodes (if available) are used. All of this is documented in the Parallel Developer's Guide.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
bcarlson
Premium Member
Posts: 772
Joined: Fri Oct 01, 2004 3:06 pm
Location: Minnesota

Post by bcarlson »

The documentation in the Parallel Developer's Guide is okay, but it gives only a partial picture. If you search on 'file pattern' you will find plenty of info about what it is, but not necessarily how it works.

Now if you look at the Parallel Job Advanced User Guide you will find a reference to a very handy option called APT_IMPORT_PATTERN_USES_FILESET. Turns out the default action when using a file pattern is to read the files sequentially. That may not be a problem with small files, but start reading in several files with millions of records each and you'll watch your hair growing faster than your job is running. Okay, slight exaggeration...

The APT_IMPORT_PATTERN_USES_FILESET parameter tells DataStage to create a dynamic fileset based on the file pattern you give it. Filesets are great because DataStage reads all the files in parallel. The problem is that there is hardcoding in your fileset that you may want to avoid. This option will create one dynamically so you get the benefit of a parallel read and can skip the hardcoding.
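To make the difference concrete, here is a rough Python analogy of the two behaviours described above. It is not DataStage code and does not show how the engine implements either mode. The function names (`read_sequentially`, `read_in_parallel`, `make_sample_files`) and the `part_*.txt` naming are hypothetical, chosen only for illustration. The first function walks the matching files one after another, the way the default pattern read is described; the second expands the pattern into an explicit file list at run time (the "dynamic fileset" idea) and reads the files concurrently.

```python
import glob
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def count_records(path):
    """Count newline-delimited records in a single file."""
    with open(path, "r", encoding="utf-8") as handle:
        return sum(1 for _ in handle)


def read_sequentially(pattern):
    """Analogy for the default behaviour: files are read one after another."""
    total = 0
    for path in sorted(glob.glob(pattern)):
        total += count_records(path)
    return total


def read_in_parallel(pattern, workers=4):
    """Analogy for the fileset behaviour: expand the pattern into an explicit
    file list at run time, then read the files concurrently."""
    paths = sorted(glob.glob(pattern))  # the dynamically built file list
    if not paths:
        return 0
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(count_records, paths))


def make_sample_files(directory, file_count=3, records_per_file=5):
    """Write small sample files (hypothetical names) for the demonstration."""
    for index in range(file_count):
        path = os.path.join(directory, f"part_{index}.txt")
        with open(path, "w", encoding="utf-8") as handle:
            for record in range(records_per_file):
                handle.write(f"{index},{record}\n")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        make_sample_files(workdir)
        pattern = os.path.join(workdir, "part_*.txt")
        # Both approaches see the same records; only the read strategy differs.
        print(read_sequentially(pattern))
        print(read_in_parallel(pattern))
```

Both functions return the same record count; what changes is whether the files are visited in turn or handed to several workers at once. In an actual job the degree of parallelism would come from the nodes in the configuration file rather than from a thread pool.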

I wish you didn't have to search through multiple manuals to get the whole picture. Don't get me wrong, you should always read the manuals. We just got bitten by this particular issue and couldn't find the info we needed right away, so I figured I would throw in my 2 cents before you pull your hair out like we did.

Hope this helps.

Brad.
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

bcarlson wrote: ... and you'll watch your hair growing faster than your job is running. Okay, slight exaggeration...
:lol: