Reading files using file pattern - Is there a max limit?

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
Chandrathdsx
Participant
Posts: 59
Joined: Sat Jul 05, 2008 11:32 am

Reading files using file pattern - Is there a max limit?

Post by Chandrathdsx »

I have a job with a Sequential File stage that reads files using a file pattern and loads the data into a table.
The job design:
seq stage --> transformer --> Target Table

This job runs fine with a few files (I tested with 8 files that qualify for the file pattern).
But when I have 500 files, the job log says: 'Couldn't find any files on host dssrv001 with pattern /inputfiledir/Datafile*'.

When I moved 492 of the files to a different directory, I was able to process the remaining 8 files successfully.
After I move the 492 files back into the directory I am processing from (a total of 500 files), I get the same message that no files could be found.
Just wondering, is there a limit on the number of files that can be processed using a file pattern in the Sequential File stage? If such a limit exists, shouldn't it give an appropriate message?

When I look in UNIX, I can see that all the files exist.

Any help with this is highly appreciated.

Thank you!
Chandrathdsx
Participant
Posts: 59
Joined: Sat Jul 05, 2008 11:32 am

Re: Reading files using file pattern - Is there a max limit?

Post by Chandrathdsx »

One observation though:
The file pattern is /inputfiledir/Datafile*
The 8 files are named in the format Datafile_1_1
The remaining 492 files are named in the format Datafile_1234_1234
If I keep only the 492 files, or only the 8 files, in /inputfiledir, then I am able to read them. But when all 500 files are there, I get the 'file not found' message.

8 files: Datafile_1_1*
492 files: Datafile_1234_1234*
When I use the pattern Datafile* in DataStage, I expect it to process all 500 files. The file layout is the same for all 500 files.

Thanks.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Try this UNIX command and see if you run into a problem.

Code: Select all

for file in `ls -1 Datafile*`
do
   echo $file
done
There may be an operating system limit on the size of a list.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Chandrathdsx
Participant
Posts: 59
Joined: Sat Jul 05, 2008 11:32 am

Post by Chandrathdsx »

ray.wurlod wrote:Try this UNIX command and see if you run into a problem.

Code: Select all

for file in `ls -1 Datafile*`
do
   echo $file
done
There may be an operating system limit on the size of a list. ...
I am able to see all the files in the OS (UNIX) using ls -l Datafile*. But when I use that pattern in the Sequential File stage in DataStage, it does not find any files. The operating system does not seem to have a limitation, since I can see the files with OS commands. Is there a limitation in DataStage on the number of files it can read using a file pattern? If so, it should at least give a meaningful message such as 'file limit exceeded'. Why does it say no files were found?

Any help with this highly appreciated.

Thanks.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

I do not believe there is a limit. Please try the small script I suggested, not just an ls command.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Very few operating systems have no limit; most have an "arg max" config variable that sets the point at which you get the dreaded "arg list too long" error from the globbing that goes on. There are workarounds if you were doing this manually. It's hard to say whether that's the issue here, though, since you get "no files found" rather than any kind of error, but the stage could be masking that.
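If you want to check the numbers on your box, something along these lines would do it (just a sketch; echo is a shell built-in, so the expanded pattern itself never has to survive an exec):

Code: Select all

# kernel limit (in bytes) on the combined size of arguments + environment
getconf ARG_MAX

# rough size, in bytes, of the expanded file list
cd /inputfiledir
echo Datafile* | wc -c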

If this doesn't work out for you, I'd suggest pinging your official support vendor. BTW, what flavor of UNIX are you running? That's always good to mention, as there are many and each has its own unique... characteristics. :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers
anbu
Premium Member
Premium Member
Posts: 596
Joined: Sat Feb 18, 2006 2:25 am
Location: india

Post by anbu »

You can concatenate all your files /inputfiledir/Datafile* into a single file in a before-job subroutine, then use the concatenated file in your Sequential File stage.

Code: Select all

cat /inputfiledir/Datafile* > /inputfiledir/NewFile
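If the wildcard expansion itself ever trips the same arg-list limit (with far more files than 500), an untested variant that feeds cat one file at a time would be:

Code: Select all

# build the combined file one input at a time, so the pattern is never
# expanded onto a single command line
cd /inputfiledir
> NewFile
ls | grep '^Datafile' | while read f
do
   cat "$f" >> NewFile
done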
You are the creator of your destiny - Swami Vivekananda
Chandrathdsx
Participant
Posts: 59
Joined: Sat Jul 05, 2008 11:32 am

Post by Chandrathdsx »

anbu wrote:You can concatenate all your files /inputfiledir/Datafile* into a single file in a before-job subroutine, then use the concatenated file in your Sequential File stage.

Code: Select all

cat /inputfiledir/Datafile* > /inputfiledir/NewFile
Anbu,
Does merging the files into one add processing time?

Craig,
The UNIX OS is AIX 5.3. I get 'arg list too long' from OS commands only when I have too many files (maybe about 1500 or so).
But this is about reading files in the DataStage Sequential File stage. Is there a separate limit for DataStage when reading a file pattern in the Sequential File stage? I can see the files correctly with OS commands, yet the Sequential File stage stops working once the number of matching files goes above some number x, even though those same files are still visible directly in the OS with commands like ls <file pattern>.

Thank you all for your valuable inputs; they have helped me a lot!
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Well, "too many" is a bit of a red herring as it is all about how much space their names take up rather than strictly the number of files.

We're not privy to the inner workings of the stage, hence the suggestion to take this to support.
-craig

"You can never have too many knives" -- Logan Nine Fingers
anbu
Premium Member
Premium Member
Posts: 596
Joined: Sat Feb 18, 2006 2:25 am
Location: india

Post by anbu »

It will take some processing time to merge the files into one, and that time depends on the size of the files.
You are the creator of your destiny - Swami Vivekananda
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Then don't bother. Use a Filter command and a single File property.
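Roughly like this (a sketch of the idea only; /dev/null stands in as the single File, and the Filter command does the actual reading):

Code: Select all

File   = /dev/null
Filter = cat /inputfiledir/Datafile*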
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
anbu
Premium Member
Premium Member
Posts: 596
Joined: Sat Feb 18, 2006 2:25 am
Location: india

Post by anbu »

ray.wurlod wrote:Then don't bother. Use a Filter command and a single File property.
Can you please explain how we can use the Filter command to pick up all the files?
You are the creator of your destiny - Swami Vivekananda
Chandrathdsx
Participant
Posts: 59
Joined: Sat Jul 05, 2008 11:32 am

Post by Chandrathdsx »

chulett wrote:Well, "too many" is a bit of a red herring as it is all about how much space their names take up rather than strictly the number of files.

We're not privy to the inner workings of the stage, hence the suggestion to take this to support.
Just interested to know whether there is a limitation on the total length of the file names for the 'import operator' of the Sequential File stage when we read files using a file pattern with a wildcard. We were able to figure out the UNIX system limitation when using commands like ls/mv etc., but I am trying to understand how the import operator reads data from the sequential files. Does it use the 'cat' command internally?

Thank you all!
asorrell
Posts: 1707
Joined: Fri Apr 04, 2003 2:00 pm
Location: Colleyville, Texas

Post by asorrell »

Nobody but engineering would have a definitive response on how the stage works, but you can get around the AIX "too many files" limitation by using a series of wildcard commands, each of which returns a smaller number of files but which, taken as a group, cover all possible files.
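Applied to the concatenation idea above, a rough sketch (using the naming from this thread; adjust the leading digits to your actual file names) would be:

Code: Select all

# each narrower pattern expands to a much shorter argument list
cd /inputfiledir
> NewFile
for p in 'Datafile_1*' 'Datafile_2*' 'Datafile_3*' 'Datafile_4*' 'Datafile_5*' \
         'Datafile_6*' 'Datafile_7*' 'Datafile_8*' 'Datafile_9*'
do
   cat $p >> NewFile 2>/dev/null
done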

You might also want to play around with using a shell script to cat the files to standard out and reading it with an "External Source" stage, since you are on release 8.
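A minimal version of that script might be as simple as this (a sketch; the stage would point at the script as its source program, property names per your release):

Code: Select all

#!/bin/sh
# write every matching file to standard out for the External Source stage to read
cat /inputfiledir/Datafile*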
Andy Sorrell
Certified DataStage Consultant
IBM Analytics Champion 2009 - 2020
myukassign
Premium Member
Premium Member
Posts: 238
Joined: Fri Jul 25, 2008 8:55 am

Post by myukassign »

I had a similar issue and I used the following method to fix it.

I used an External Source stage and passed the program as

cat /mydir/datafile*

I defined my table definition as I expected it from the file, and it worked.
Post Reply