Problem with folder stage?
Moderators: chulett, rschirm, roy
Not quite sure how to explain this but here goes:
I use a folder stage to identify all the files in a specific directory folder that are to be processed in the current run. I have used the same basic job for a number of other processes and it works fine. For this process the wildcard is *.DAT.
The problem I am having is that, in this instance, it doesn't pick up one large file (approx 340MB). If I have a much smaller file in the folder as well, the job will pick up that file to be processed, but not the large one.
I put two files in the directory: OD22003.DAT (340MB) and JOLTestFile.DAT (15KB).
If I run this job as a one-off and if I use the button to the right of the pathname parameter to build up the path I get the following pathname: D:\Data\Download\ETLPreProcLandG\JOL\Ready\ and the large file is selected.
If I use the parameter file that we use for all the other jobs in this project, I get a pathname of \\Apps17\Data\Download\ETLPreProcLandG\JOL\Ready\ which does not select the large file (but does select the small file), but this is what we use everywhere else and it works.
The job doesn't fail or anything - it just doesn't seem to like the large file.
I have no idea what the problem may be so any suggestions greatly appreciated.
Cheers
Johnno
In the folder stage, the outputs/columns tab contains:
Column name: FileName
Derivation: (none)
Key: Yes
SQL Type: VarChar
Length: 255
Nullable: No
In the sequential file stage that the data is being written to, the inputs/columns tab is:
Column name: FileName
Key: Yes
SQL Type: Char
Length: 64
Nullable: No
Thanks for the reply and I hope this helps
Johnno
Being on a Windows platform we could, as you say, use DOS commands to produce a file of filenames. My major concern, however, is that we have used this approach many times in the past and this is the first problem: have we made a serious misjudgement in the design? And if not, why won't this approach work for this one file?
There was a discussion on ADN about the folder stage when someone had a very similar issue. Here is a direct link to the discussion for any members. The gist of the problem is this:
Apparently, the file size limit is significantly lower than 2GB. It looks like you are going to need to rewrite this using either a Batch File approach or something written in Job Control for files of this size.
Ernie Ostic wrote: It may be worth noting that the Folder Stage was officially designed for XML and had its original debut when DataStage first started supporting XML five years ago. Its intent is to read a set of (typically XML) files in a subdirectory, sending each one as a complete "chunk" (a single column for the ENTIRE contents) to the XMLInput Stage for parsing into individual columns for elements and attributes. I've used it for some reasonably large XML documents (50 to 60 MB), but as Ray noted, it certainly is going to blow all link/column memory availability if a file as large as 2GB tries to pass through...
-craig
"You can never have too many knives" -- Logan Nine Fingers
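[Editor's note] The trade-off Ernie describes - the Folder stage handing the ENTIRE file contents downstream as a single column value - can be illustrated outside DataStage. This is a hedged Python sketch, not DataStage code; the function names are illustrative only:

```python
from pathlib import Path

def read_whole(path):
    # What the Folder stage effectively does: the entire file becomes
    # one in-memory value, so a 340 MB file needs 340 MB of column memory.
    return Path(path).read_bytes()

def read_chunked(path, chunk_size=64 * 1024):
    # The memory-safe alternative: stream the file piece by piece,
    # so memory use stays bounded by chunk_size regardless of file size.
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            yield chunk
```

A streaming approach like `read_chunked` never holds more than one chunk in memory, which is why batch-file or Job Control alternatives avoid the size limit the stage hits.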
Thanks very much. As much as I wish you could have told me otherwise, I'm glad I found out the reason behind it, as we will need to change a few of our jobs (a whole 6 days before implementation!!!)
I will study the link you provided and look at it in some more detail tomorrow (train strike in London today so must nip off home!).
Thanks again.
Hi,
I faced this a while back; there are even posts here regarding this issue.
The problem is configuration, which may differ between Windows servers.
I had this problem with files over 200MB on one machine and over 300MB on another.
Since this stage has a size limit (I don't know, nor care, what the limit is or whether it is configurable), and we may someday face a file bigger than that limit, I recommend simply using an alternative such as DataStage BASIC, which is more manageable for a DS developer than chasing the sysadmins whenever it needs to be reconfigured.
IHTH,
Roy R.
Time is money but when you don't have money time is all you can afford.
Search before posting:)
Join the DataStagers team effort at:
http://www.worldcommunitygrid.org
Thanks for all your replies. I have finally got access to the Ascential developer net to check out the posts relating to this issue (thanks again chulett).
I will try and have a play with this and see how it works and what it can/can't really do (if I ever get any time!). For now I just use the DOS DIR command and read the file it produces.
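[Editor's note] The "file of filenames" workaround described above (a DOS `dir /b *.DAT > filelist.txt` redirected to a file) can be sketched in Python for comparison. The folder path and function name here are illustrative assumptions, not part of the original jobs:

```python
import glob
import os

def list_dat_files(folder):
    """Return just the names of *.DAT files in `folder`, without ever
    reading their contents - so file size cannot be an issue."""
    matches = glob.glob(os.path.join(folder, "*.DAT"))
    return sorted(os.path.basename(p) for p in matches)
```

Because only directory metadata is touched, a 340 MB file and a 15 KB file are listed identically, which is the behaviour the poster expected from the Folder stage.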
Just a quick question, though, as I am still digesting all this info: from what I've read on the Ascential developer net above (I think it was a comment from Ray), I get the impression that it is possible to pick up the file names only, so file size should not be an issue. Is this correct? If so, I would have thought the way I was doing it would be OK, as when I check the output from this job (it just writes each row to a sequential file) all I see is the filename itself - no additional data!
Anyway, possibly not the most important thing now as we have the DOS command, but any suggestions/explanation would be welcome.
Cheers and thanks again.
Johnno