Sequential file name capture in a job
Hi,
I am using a Sequential File stage to read data from *XYZ*.txt files. When I select the file name column, the file name in the output comes out as *XYZ*.txt.
My question is: how do I get the actual file name while reading the data from the sequential file?
Thanks in advance.
trm
Hi,
On your output link, change the Read Method to File Pattern;
this will let you use wildcards in the file name.
IHTH,
Roy R.
Time is money but when you don't have money time is all you can afford.
Search before posting:)
Join the DataStagers team effort at:
http://www.worldcommunitygrid.org
ray.wurlod (Participant; Posts: 54607; Joined: Wed Oct 23, 2002 10:52 pm; Location: Sydney, Australia)
As far as I am aware this is not possible, but I would be happy to be proven wrong. Each row in the stream being processed may have come from any of the files. Reading the files that match a pattern is like using cat to combine them into a single stream feeding a filter, except that you can get some parallelism happening.
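To make the cat analogy concrete, here is a minimal Python sketch (outside DataStage; the function names are hypothetical) contrasting a cat-style concatenation of pattern-matched files, where a row's origin is lost, with a variant that tags each row with its source file at read time. It only illustrates why the origin has to be captured while reading; it is not how the Sequential File stage is implemented.

```python
import glob
import os


def read_concatenated(pattern):
    """Like `cat file*`: one stream of lines; the source file is not recorded."""
    rows = []
    for path in sorted(glob.glob(pattern)):
        with open(path) as f:
            rows.extend(line.rstrip("\n") for line in f)
    return rows


def read_tagged(pattern):
    """Same stream, but each row carries the base name of the file it came from."""
    rows = []
    for path in sorted(glob.glob(pattern)):
        with open(path) as f:
            rows.extend((os.path.basename(path), line.rstrip("\n")) for line in f)
    return rows


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as d:
        for name, body in [("file1.dat", "data1,data2\n"), ("file2.dat", "data3,data4\n")]:
            with open(os.path.join(d, name), "w") as f:
                f.write(body)
        pattern = os.path.join(d, "*.dat")
        print(read_concatenated(pattern))  # ['data1,data2', 'data3,data4']
        print(read_tagged(pattern))
        # [('file1.dat', 'data1,data2'), ('file2.dat', 'data3,data4')]
```

Once the rows are merged as in `read_concatenated`, nothing downstream can recover which file a row came from; the tag has to be attached inside the reader, per file.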
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Hi,
There is an environment variable, APT_IMPORT_PATTERN_USES_FILESET, which, when set to TRUE, returns the exact file name from which each record is being read.
There was a post about this in the Developer Net forum, which was answered by Danny Owen.
I have used this in one scenario, and it works fine with the File Pattern option. There was one issue, though: if no files match the specified pattern, the job aborts.
Hope this helps.
Regards,
The Bird.
Hi trm,
There are no other parameters or variables you have to set for this. If this variable is set and:
1. the File Pattern read method is set in your source Sequential File stage to read the multiple source files, and
2. the File Name Column option is chosen in the source Sequential File stage and the additional column (for the source file name) is defined in the Columns tab,
then you should see the corresponding source file name for each record when you do a View Data on the source stage. You should also be able to carry this column forward to the downstream stages.
Hope this solves your problem.
Regards,
The Bird.
trammohan wrote:
"Hi Bird,
I have 2 input files (trm1.txt and trm2.txt). It is picking up the trm2.txt file name and putting it in the file_name column even for the trm1.txt records.
trm"
This is precisely my experience as well: APT_IMPORT_PATTERN_USES_FILESET causes each node in the APT configuration file to pick up one file name and use it for all the files that node processes.
So in my case, with two nodes and 200 files, setting APT_IMPORT_PATTERN_USES_FILESET to true for the job gives me a file name column populated by the Sequential File stage, but with only two unique values in it instead of 200:
file1.dat,data1,data2
file2.dat,data3,data4
file1.dat,data5,data6 <-- this actually came from file3.dat
file2.dat,data7,data8 <-- this actually came from file4.dat
...
Alternatively, if I naively assume things work the "common sense" way (without setting any variables), specify the file name column in the Sequential File stage, choose File Pattern as the read method, and give it the wildcard matching my files, then every single row gets my wildcard instead of the actual expanded file name:
*.dat,data1,data2 <-- this actually came from file1.dat
*.dat,data3,data4 <-- this actually came from file2.dat
*.dat,data5,data6 <-- this actually came from file3.dat
*.dat,data7,data8 <-- this actually came from file4.dat
Therefore the file name column option in the Sequential File stage is pretty much useless and misleading here, and so is the APT_IMPORT_PATTERN_USES_FILESET variable.
So the question remains: is there a simple (configuration-time) option to preserve the file name of each file read through a pattern by the Sequential File stage?
Thank you.
ray.wurlod
The Sequential File stage can generate two additional columns, one containing the file name of the file currently being read, the other containing the line number within that file of the record currently being read.
But, as noted, it may be wise to set APT_IMPORT_PATTERN_USES_FILESET to False, or at the very least to experiment. That reported behaviour suggests a small bug.
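The two extra columns described above (the name of the file currently being read and the row number within that file) behave much like Python's standard `fileinput` module, whose `filename()` and `filelineno()` report exactly those two values for the line just read. The sketch below is only an analogy for the behaviour one would expect when reading a file pattern, not DataStage code; `rows_with_source` is a hypothetical name.

```python
import fileinput
import glob


def rows_with_source(pattern):
    """Return (file_name, row_in_file, line) for every line of every matching file."""
    paths = sorted(glob.glob(pattern))
    if not paths:
        # fileinput reads stdin when handed an empty file list,
        # so treat "no files match the pattern" explicitly.
        return []
    rows = []
    with fileinput.input(files=paths) as stream:
        for line in stream:
            # filename() changes at each file boundary and
            # filelineno() restarts at 1 for every new file.
            rows.append((stream.filename(), stream.filelineno(), line.rstrip("\n")))
    return rows
```

With trm1.txt holding two lines and trm2.txt holding one, this yields rows numbered 1 and 2 tagged with trm1.txt, then row 1 tagged with trm2.txt, which is the per-file labelling the thread expects from the stage.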
ray.wurlod wrote:
"The Sequential File stage can generate two additional columns, one containing the file name of the file currently being read, the other containing the line number within that file of the record currently being read.
But, as noted, it may be wise to set APT_IMPORT_PATTERN_USES_FILESET to False, or at the very least to experiment. That reported behaviour suggests a small bug."
Thank you for your response, but I am afraid you did not read my post (or the post I was replying to) correctly. Let me try again.
Given, in the Sequential File stage:
- "file name column" is set under "options"
- file pattern is set to /dir/file*
- read method is set to "file pattern"
If APT_IMPORT_PATTERN_USES_FILESET is not present or is explicitly set to false:
- I get /dir/file* as the value of the file name column for all records in every file.
If APT_IMPORT_PATTERN_USES_FILESET is set to true:
- I get just one unique file name as the value of the file name column for all records in every file. If I run under a 2-node configuration, I get two unique file names, and so on. So if I have 100 different files, only two file names will ever be used.
Once again, neither "file name column" nor APT_IMPORT_PATTERN_USES_FILESET works in this situation.
ray.wurlod
The "file name column" property must refer to a column (probably of type VarChar) that is defined on the output link. Is this the case with your design?
The same is true for the file row number property, if you use that.
ray.wurlod wrote:
"The 'file name column' property must refer to a column (probably of type VarChar) that is defined on the output link. Is this the case with your design?
The same is true for the file row number property, if you use that."
Yes, and as I mentioned, it gets populated, just with the wrong data.
ray.wurlod
If that's the case (I have not had a chance to check yet), you need to report the bug through your support provider. They will demand a reproducible case, so have one ready so they can't stall you.
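One way to prepare such a reproducible case (a sketch under assumptions, not an IBM-prescribed procedure) is to generate several small files whose data values encode their true origin, read them through the Sequential File stage with a file pattern, export the job output as (file name column, data) pairs, and compare. The file names file1.dat to fileN.dat, the one-value-per-line layout, and both helper names are hypothetical; adapt them to however you capture the job's output.

```python
import os


def make_test_files(directory, n_files, rows_per_file=2):
    """Write file1.dat .. fileN.dat; every data value names the file it lives in.

    Returns {data_value: true_file_name} for checking the job output later.
    """
    expected = {}
    for i in range(1, n_files + 1):
        name = f"file{i}.dat"
        with open(os.path.join(directory, name), "w") as f:
            for r in range(1, rows_per_file + 1):
                value = f"{name}:row{r}"
                f.write(value + "\n")
                expected[value] = name
    return expected


def check_file_name_column(rows, expected):
    """rows: (file_name_column, data) pairs taken from the job output.

    Returns the rows whose file name column disagrees with the data's true
    origin, plus the number of distinct file names seen in the column.
    """
    mismatches = [
        (file_name, data)
        for file_name, data in rows
        if os.path.basename(file_name) != expected[data]
    ]
    distinct_names = len({os.path.basename(file_name) for file_name, _ in rows})
    return mismatches, distinct_names
```

With 200 files on a 2-node configuration, the behaviour reported in this thread would show up as many mismatches and only two distinct names, instead of no mismatches and 200 distinct names; a wildcard value such as /dir/file* in the column would also be flagged as a mismatch.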