Reading files using file pattern -Is there a max limit?

Posted: Fri Feb 26, 2010 5:03 pm
by Chandrathdsx
I have a job with a Sequential File stage that reads files using a file pattern and loads them into a table.
The job design:
seq stage --> transformer --> Target Table

This job runs fine when only a few files (I tested with 8) match the file pattern.
But when I have 500 files, the job log says: 'Couldn't find any files on host dssrv001 with pattern /inputfiledir/Datafile*'.

When I moved 492 files to a different directory, I was able to process the remaining 8 files successfully.
After I moved the 492 files back into the directory I am processing from (500 files in total), I got the same message that no files could be found.
Just wondering: is there a limit on the number of files that can be processed using a file pattern in the Sequential File stage? If such a limit exists, shouldn't the job report an appropriate message?

When I check in UNIX, I can see that all the files exist.

Any help with this is highly appreciated.

Thank you!

Re: Reading files using file pattern -Is there a max limit?

Posted: Fri Feb 26, 2010 5:18 pm
by Chandrathdsx
One observation, though. The file pattern is /inputfiledir/Datafile*, and the files come in two name formats:
8 files are in Datafile_1_1 format
the remaining 492 files are in Datafile_1234_1234 format
If I keep only the 492 files, or only the 8 files, in /inputfiledir, then I am able to read them. But when all 500 files are present, I get the 'file not found' message.

8 files Datafile_1_1*
492 files Datafile_1234_1234*
When I use the pattern Datafile* in DataStage, I expect it to process all 500 files. The file layout is the same for all 500 files.

Thanks.

Posted: Fri Feb 26, 2010 5:29 pm
by ray.wurlod
Try this UNIX command and see if you run into a problem.

for file in `ls -1 Datafile*`
do
   echo $file
done
There may be an operating system limit on the size of a list.

Posted: Sun Feb 28, 2010 9:18 pm
by Chandrathdsx
I am able to see all the files in the OS (UNIX) using ls -l Datafile*. But when I do the same in DataStage, using a file pattern in the Sequential File stage, it does not find any files. The operating system does not seem to have a limitation, as I can see the files with OS commands. Is there a limit in DataStage on the number of files that can be read using a file pattern? If so, it should at least give a meaningful message such as 'file limit exceeded'. Why does it say no files were found?

Any help with this is highly appreciated.

Thanks.

Posted: Sun Feb 28, 2010 11:51 pm
by ray.wurlod
I do not believe there is a limit. Please try the small amount of script I suggested, not just an ls command.

Posted: Mon Mar 01, 2010 7:58 am
by chulett
Very few operating systems have no limit; most have an ARG_MAX ("arg max") configuration value that determines when the globbing produces the dreaded "arg list too long" error. There are workarounds if you were doing this manually. However, it's hard to say whether that's the issue here, since you get "no files found" rather than any kind of error, but then the stage could be masking that.

If this doesn't work out for you, I'd suggest pinging your official support vendor. BTW, what flavor of UNIX are you running? That's always good to mention as there are many and each have their own unique... characteristics. :wink:
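
As a rough way to check whether the argument-list limit could be involved, the sketch below compares the host's ARG_MAX with the number of bytes the expanded pattern would occupy. The function name glob_bytes and the scratch directory are hypothetical, used only for illustration. The environment also counts against ARG_MAX, so treat the figure as an estimate, and note that it says nothing about any limit inside the stage itself.

```shell
#!/bin/sh
# Hypothetical helper: estimate the bytes a pattern's expansion would occupy.
# Usage: glob_bytes DIR PREFIX
# find does the matching itself, so no large argument list is ever built.
# Assumption: DIR has no subdirectories (find would descend into them).
glob_bytes() {
    dir=$1
    prefix=$2
    find "$dir" -type f -name "${prefix}*" |
        awk '{ n += length($0) + 1 } END { print n + 0 }'   # +1 per name for the terminating NUL
}

# Demo in a scratch directory with the two name formats from this thread.
demo_dir=$(mktemp -d)
touch "$demo_dir/Datafile_1_1" "$demo_dir/Datafile_1234_1234"

echo "ARG_MAX on this host:       $(getconf ARG_MAX)"
echo "bytes needed for Datafile*: $(glob_bytes "$demo_dir" Datafile)"
```

If the second number approaches the first for the real directory, the "arg list too long" condition is a plausible suspect.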

Posted: Mon Mar 01, 2010 8:58 am
by anbu
You can concatenate all your files matching /inputfiledir/Datafile* into a single file in your before-job subroutine, then use the concatenated file in your Sequential File stage.

cat /inputfiledir/Datafile* > /inputfiledir/NewFile
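
If the plain cat with a wildcard itself runs into "arg list too long", one possible variant is to let find produce the names and have xargs feed them to cat in batches. This is only a sketch: the function name concat_matching is hypothetical, and it assumes file names without spaces or newlines (true for the Datafile_N_N names in this thread) and no subdirectories under the input directory.

```shell
#!/bin/sh
# Hypothetical helper: concatenate every file whose name starts with PREFIX.
# Usage: concat_matching SRC_DIR PREFIX OUT_FILE
# xargs splits the name list into batches that fit the argument-list limit.
# Assumptions: names contain no whitespace; OUT_FILE does not match PREFIX*.
concat_matching() {
    src_dir=$1
    prefix=$2
    out_file=$3
    find "$src_dir" -type f -name "${prefix}*" | sort | xargs cat > "$out_file"
}

# Demo in a scratch directory.
demo_dir=$(mktemp -d)
printf 'a\n' > "$demo_dir/Datafile_1_1"
printf 'b\n' > "$demo_dir/Datafile_2_2"
concat_matching "$demo_dir" Datafile "$demo_dir/NewFile"
cat "$demo_dir/NewFile"
```

The sort step only makes the concatenation order repeatable; drop it if order does not matter.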

Posted: Mon Mar 01, 2010 2:41 pm
by Chandrathdsx
Anbu,
Does merging the files into one add processing time?

Craig,
The UNIX OS is AIX 5.3. I get 'arg list too long' from OS commands only when I have too many files (maybe about 1500 or so).
But this is reading files from the DataStage Sequential File stage. Is there a separate limit for DataStage when reading a file pattern from the Sequential File stage? I am able to see the files correctly with OS commands, but the Sequential File stage does not work when the number of matching files is above a certain number x. And I can still see those x files directly in the OS with commands like ls <file pattern>.

Thank you all for your valuable inputs, it helped me a lot!

Posted: Mon Mar 01, 2010 2:47 pm
by chulett
Well, "too many" is a bit of a red herring as it is all about how much space their names take up rather than strictly the number of files.

We're not privy to the inner workings of the stage, hence the suggestion to take this to support.

Posted: Mon Mar 01, 2010 2:47 pm
by anbu
It will take some processing time to merge files into one and processing time depends on the size of the files.

Posted: Mon Mar 01, 2010 3:49 pm
by ray.wurlod
Then don't bother. Use a Filter command and a single File property.

Posted: Tue Mar 02, 2010 1:35 pm
by anbu
Can you please explain how we can use a Filter command to pick up all the files?

Posted: Mon Mar 22, 2010 12:22 pm
by Chandrathdsx
Just interested to know whether there is a limit on the total length of the file names for the 'import operator' of the Sequential File stage when we read files using a file pattern with a wildcard. We are able to figure out the UNIX system limitation when using commands such as ls and mv, but I am trying to understand how the import operator reads data from sequential files. Does it use the 'cat' command internally?

Thank you all!

Posted: Mon Mar 22, 2010 1:32 pm
by asorrell
Nobody but engineering would have a definitive response on how the stage works, but you can get around the AIX "too many files" limitation by using a series of wildcard patterns, each of which returns a smaller number of files but which together cover all possible files.

You might also want to play around with using a shell script to cat the files to standard out, and using an External Source stage, since you are on release 8.
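
The two suggestions above can be combined in one small script. The sketch below is not a tested DataStage recipe: the function name emit_datafiles is hypothetical, and it assumes every file is named Datafile_ followed by a digit, as in the two formats described earlier in the thread. Each pass expands a narrower wildcard, so no single cat call receives the whole list.

```shell
#!/bin/sh
# Hypothetical helper: write all Datafile_<digit>* files in DIR to stdout,
# one narrow wildcard at a time, so each cat gets a shorter argument list.
# Assumption: every file name starts with Datafile_ followed by a digit.
emit_datafiles() {
    dir=$1
    for lead in 0 1 2 3 4 5 6 7 8 9; do
        set -- "$dir"/Datafile_"$lead"*
        [ -e "$1" ] || continue     # pattern matched nothing; skip this digit
        cat "$@"
    done
}

# Demo in a scratch directory.
demo_dir=$(mktemp -d)
printf 'x\n' > "$demo_dir/Datafile_1_1"
printf 'y\n' > "$demo_dir/Datafile_2345_2345"
emit_datafiles "$demo_dir"
```

A script like this would be given to the External Source stage as its source program, with the table definition describing the common file layout. If one leading digit still matches too many names, the pattern can be narrowed further, for example by the first two digits.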

Posted: Tue Apr 13, 2010 8:00 am
by myukassign
I had a similar issue and I used the following method to fix it.

I used an external file stage and passed the program as

cat /mydir/datafile*

I defined my table definition to match what I expect from the files, and it worked.