Reading file pattern
Moderators: chulett, rschirm, roy
-
- Premium Member
- Posts: 783
- Joined: Mon Jan 16, 2006 10:17 pm
- Location: Sydney, Australia
Reading file pattern
If the Sequential File stage is defined to read a file pattern, will the job read multiple files in parallel, or one file after the next?
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Yes.
Generate the score, inspect the score, and find out. Or just look at the link marker on the Sequential File stage - that alone will answer your question.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
-
- Premium Member
- Posts: 783
- Joined: Mon Jan 16, 2006 10:17 pm
- Location: Sydney, Australia
The documentation in the Parallel Developer's Guide is okay, but it gives a partial picture. If you search on 'file pattern' you will find plenty of info about what it is, but not necessarily how it works.
Now if you look at the Parallel Job Advanced Developer's Guide you will find a reference to a very handy option called APT_IMPORT_PATTERN_USES_FILESET. It turns out the default behaviour when using a file pattern is to read the files sequentially. That may not be a problem with small files, but start reading several files with millions of records each and you'll watch your hair grow faster than your job runs. Okay, slight exaggeration...
The APT_IMPORT_PATTERN_USES_FILESET parameter tells DataStage to create a dynamic fileset based on the file pattern you give it. Filesets are great because DataStage reads all the files in parallel. The catch with a regular fileset is that it hardcodes the file names, which you may want to avoid. This option builds one dynamically, so you get the benefit of a parallel read and can skip the hardcoding.
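To make the mechanics concrete, here is a minimal sketch. The variable name comes from the post above; the file path and pattern are made-up examples, and in a real project you would normally set the variable in DataStage Administrator (project-level environment) or expose it as a job parameter of the same name, rather than in a plain shell.

```shell
# Sketch only: the path and pattern below are illustrative, not from the post.
# Setting this to True tells the import operator to turn a Sequential File
# stage's "File Pattern" read method into a dynamic fileset, so the matched
# files are read in parallel instead of one after another.
export APT_IMPORT_PATTERN_USES_FILESET=True

# A stage reading a pattern like /data/landing/cust_*.dat would then be
# driven by a dynamically generated fileset covering every matched file.
echo "APT_IMPORT_PATTERN_USES_FILESET=$APT_IMPORT_PATTERN_USES_FILESET"
```

With the variable unset (the default), the same pattern is still resolved, but the files are imported sequentially on a single node.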
I wish you didn't have to search through multiple manuals to get the whole picture. Don't get me wrong, you should always read the manuals. We just got bitten by this particular issue and couldn't find the info we needed right away, so I figured I would throw out my 2 cents before you pull your hair out like we did.
Hope this helps.
Brad.