Reading files using file pattern - Is there a max limit?
-
- Participant
- Posts: 59
- Joined: Sat Jul 05, 2008 11:32 am
Reading files using file pattern - Is there a max limit?
I have a job with a Sequential File stage that reads files using a file pattern and loads them into a table.
The job design:
seq stage --> transformer --> Target Table
This job runs fine with a few files (I tested with 8 files) that qualify for the file pattern.
But when I have 500 files, the job log says: 'Couldn't find any files on host dssrv001 with pattern /inputfiledir/Datafile*'.
When I moved 492 files to a different directory, I was able to process the remaining 8 files successfully.
After I move the 492 files back into the directory I am trying to process from (total of 500 files), I get the same message that it couldn't find the files.
Just wondering, is there a limit on the number of files that can be processed using a file pattern in the Sequential File stage? If such a limit exists, shouldn't it report an appropriate message?
When I look in UNIX, I see that all the files exist.
Any help with this is highly appreciated.
Thank you!
-
- Participant
- Posts: 59
- Joined: Sat Jul 05, 2008 11:32 am
Re: Reading files using file pattern - Is there a max limit?
One observation though:
The file pattern is: /inputfiledir/Datafile*
The 8 files are in the format Datafile_1_1; the remaining 492 files are in the format Datafile_1234_1234.
If I keep only the 492 files, or only the 8 files, in /inputfiledir, then I am able to read them. But when I have all 500 files there, I get the 'file not found' message.
8 files: Datafile_1_1*
492 files: Datafile_1234_1234*
When I use the pattern Datafile* in DataStage, I expect it to process all 500 files. The file layout is the same for all 500 files.
Thanks.
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Try this UNIX command and see if you run into a problem.
There may be an operating system limit on the size of an argument list.
Code: Select all
for file in `ls -1 Datafile*`
do
echo $file
done
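For reference, the limit Ray is alluding to can be queried directly. A quick sketch; getconf ARG_MAX is standard POSIX, and the chdev line is shown only as a comment since it is AIX-specific and needs an administrator:

```shell
# Print the maximum combined size, in bytes, of arguments plus
# environment that a single exec'd command may receive
getconf ARG_MAX
# On AIX this is governed by the ncargs attribute of sys0 and can be
# raised, e.g.:  chdev -l sys0 -a ncargs=64   (units of 4 KB blocks)
```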
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
-
- Participant
- Posts: 59
- Joined: Sat Jul 05, 2008 11:32 am
ray.wurlod wrote:Try this UNIX command and see if you run into a problem. There may be an operating system limit on the size of a list. ...
Code: Select all
for file in `ls -1 Datafile*`
do
echo $file
done

I am able to see all the files in the OS (UNIX) using ls -l Datafile*. But when I do the same in DataStage, using a file pattern in the Sequential File stage, it does not find any files. The operating system does not seem to have a limitation, since I can see the files with OS commands. Is there a limit in DataStage on the number of files it can read using a file pattern? If so, it should at least report a meaningful message such as 'file limit exceeded'. Why does it say no files were found?
Any help with this is highly appreciated.
Thanks.
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Very few operating systems have no limit; most have an "arg max" configuration variable that determines when you get the dreaded "arg list too long" error from the globbing that goes on. There are workarounds if you were doing this manually. However, it's hard to say whether that's the issue here, since you get "no files found" rather than any kind of error, but then the stage could be masking that.
If this doesn't work out for you, I'd suggest pinging your official support vendor. BTW, what flavor of UNIX are you running? That's always good to mention as there are many and each have their own unique... characteristics.
-craig
"You can never have too many knives" -- Logan Nine Fingers
You can concatenate all your files matching /inputfiledir/Datafile* into a single file in your before-job subroutine, then use the concatenated file in your Sequential File stage.
Code: Select all
cat /inputfiledir/Datafile* > /inputfiledir/NewFile
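One caveat worth noting: `cat /inputfiledir/Datafile*` expands the glob onto a single command line, so it is itself subject to the same "arg list too long" limit. A sketch of a variant that sidesteps this by letting find batch the names itself (the demo uses a throwaway directory standing in for the poster's /inputfiledir):

```shell
dir=$(mktemp -d)                       # stand-in for /inputfiledir
printf 'row1\n' > "$dir/Datafile_1_1"
printf 'row2\n' > "$dir/Datafile_1234_1234"
# find passes the matched names to cat in batches of its own choosing,
# so no single command line can exceed ARG_MAX; NewFile does not match
# the Datafile* pattern, so it is never fed back into itself
find "$dir" -maxdepth 1 -name 'Datafile*' -exec cat {} + > "$dir/NewFile"
wc -l < "$dir/NewFile"                 # both rows end up in NewFile
```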
You are the creator of your destiny - Swami Vivekananda
-
- Participant
- Posts: 59
- Joined: Sat Jul 05, 2008 11:32 am
anbu wrote:You can concatenate all your files /inputfiledir/Datafile* to a single file in your before job subroutine. Use concatenated file in your sequential stage.
Code: Select all
cat /inputfiledir/Datafile* > /inputfiledir/NewFile

Anbu, does merging the files into one add noticeable processing time?

Craig, the UNIX OS is AIX 5.3. I get 'arg list too long' from OS commands only when I have very many files (maybe about 1,500 or so).
But this problem is with reading files through the DataStage Sequential File stage. Is there a separate limit in DataStage for reading a file pattern in the Sequential File stage? I can see the files correctly with OS commands, yet the Sequential File stage stops working once the number of matching files rises above some number x, and those x files are still visible directly in the OS with commands like ls <file pattern>.
Thank you all for your valuable inputs; they helped me a lot!
Well, "too many" is a bit of a red herring, as it is all about how much space their names take up rather than strictly the number of files.
We're not privy to the inner workings of the stage, hence the suggestion to take this to support.
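Craig's point can be checked directly: what counts against ARG_MAX is the byte length of the expanded name list, not the file count. A small sketch using a throwaway directory; run the printf line inside the real /inputfiledir to measure the actual list:

```shell
dir=$(mktemp -d) && cd "$dir"          # stand-in for /inputfiledir
touch Datafile_1_1 Datafile_1234_1234
# printf is a shell builtin, so the glob expands without exec'ing a new
# process and cannot itself trip 'arg list too long'; wc -c then reports
# how many bytes the space-separated name list occupies
printf '%s ' Datafile* | wc -c
```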
-craig
"You can never have too many knives" -- Logan Nine Fingers
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
-
- Participant
- Posts: 59
- Joined: Sat Jul 05, 2008 11:32 am
chulett wrote:Well, "too many" is a bit of a red herring as it is all about how much space their names take up rather than strictly the number of files. We're not privy to the inner workings of the stage, hence the suggestion to take this to support.

Just interested to know whether there is a limit on the total length of the file names for the 'import operator' of the Sequential File stage when we read files using a file pattern with a wildcard. We were able to figure out the UNIX system limitation when using commands like ls/mv, but I am trying to understand how the import operator reads data from the sequential files. Does it use the 'cat' command internally?
Thank you all!
Nobody but engineering would have a definitive answer on how the stage works, but you can get around the AIX "too many files" limitation by using a series of wildcard patterns, each of which returns a smaller number of files but which, as a group, cover all possible files.
You might also want to play around with a shell script that cats the files to standard output, read through an External Source stage, since you are on release 8.
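The External Source idea might look like the sketch below: the stage's source program is a single command that streams every match to standard output, so the stage never has to expand the pattern itself. The directory and pattern are the poster's; the demo uses a throwaway directory in their place:

```shell
dir=$(mktemp -d)                       # stand-in for /inputfiledir
printf 'a\n' > "$dir/Datafile_1_1"
printf 'b\n' > "$dir/Datafile_9_9"
# An External Source stage's Source Program could be this one command;
# find streams each matching file to stdout without ever building one
# huge argument list, so 500 files cannot hit the ARG_MAX wall
find "$dir" -maxdepth 1 -name 'Datafile*' -exec cat {} +
```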
-
- Premium Member
- Posts: 238
- Joined: Fri Jul 25, 2008 8:55 am