Hello:
We receive daily file on our FTP server and I load it in the table fine. Currently, its a small (pipe delimited) file (30 rows) with the following naming convention: EXTRACT_CTCC_032216.txt (EXTRACT_CTCC_<TodaysDate>.txt)
In Datastage, I created a batch and job.
Batch: I manipulate the file name by using system date and passing it as a parameter.
Job: I use FTP Plug in stage to read the above file (passed in as parameter) to load it in the table.
This works fine.
But things are changing and source system will be uploading the file to our FTP server hourly or more (multiple times in an hour).
So, the new file names will have date and (dynamic) time: EXTRACT_CTCC_032216_0915.txt
EXTRACT_CTCC_032216_0952.txt
EXTRACT_CTCC_032216.1029.txt
Question: How do I handle loading of these multiple files. The FTP Plug-in does not accept wild (*) character. I plan to schedule the batch to run hourly.
Any help is appreciated.
Thanks much,
-Sam
Reading multiple files (& load it in oracle table)
Moderators: chulett, rschirm, roy
-
- Premium Member
- Posts: 32
- Joined: Mon Apr 12, 2010 3:42 pm
- Location: Minneapolis
- Contact:
Many different ways to handle this... Here's one.
1) Instead of scheduling hourly, I'd recommend using a File Watcher in your scheduler software to check for the presence of a file. Most corporate schedulers can check using wildcards. When it finds one or more matching jobs, it kicks off a DataStage Job Sequence. (However - this entire step is optional).
2) The Job Sequence uses an execute command stage with a wildcard to return the list of files that are out there waiting.
3) Use a Sequence Loop (StartLoop / EndLoop) set to process a "list" as the counter, and use the list of files for your list.
4) The loop will process for each file, storing the individual file name in the counter, which can then be passed to the FTP job as a parameter.
5) After processing, archive the file to a different location to prevent re-processing it.
1) Instead of scheduling hourly, I'd recommend using a File Watcher in your scheduler software to check for the presence of a file. Most corporate schedulers can check using wildcards. When it finds one or more matching jobs, it kicks off a DataStage Job Sequence. (However - this entire step is optional).
2) The Job Sequence uses an execute command stage with a wildcard to return the list of files that are out there waiting.
3) Use a Sequence Loop (StartLoop / EndLoop) set to process a "list" as the counter, and use the list of files for your list.
4) The loop will process for each file, storing the individual file name in the counter, which can then be passed to the FTP job as a parameter.
5) After processing, archive the file to a different location to prevent re-processing it.
Another option is to ditch the FTP stage and switch over to a script that does the "mget" for you. Once the files are on the ETL server, your job can then process them using a wildcard pattern in the Sequential File stage, loading them all at once rather than one by one.
Possible downside? That would mean you'd be storing the files on the ETL server, at least temporarily until they are loaded, or for some measure of forever if you archived them there as well. Something you may not want (or be allowed) to do.
If you want to leave your main processing job untouched and continue to not bring the files themselves over, you could leverage the looping construct / wrapper that Andy mentioned.
Possible downside? That would mean you'd be storing the files on the ETL server, at least temporarily until they are loaded, or for some measure of forever if you archived them there as well. Something you may not want (or be allowed) to do.
If you want to leave your main processing job untouched and continue to not bring the files themselves over, you could leverage the looping construct / wrapper that Andy mentioned.
-craig
"You can never have too many knives" -- Logan Nine Fingers
"You can never have too many knives" -- Logan Nine Fingers
-
- Premium Member
- Posts: 32
- Joined: Mon Apr 12, 2010 3:42 pm
- Location: Minneapolis
- Contact:
Friendly question: why are you still using FTP plug-in? FTP Enterprise handles multiple files in the stage properties.
Franklin Evans
"Shared pain is lessened, shared joy increased. Thus do we refute entropy." -- Spider Robinson
Using mainframe data FAQ: viewtopic.php?t=143596 Using CFF FAQ: viewtopic.php?t=157872
"Shared pain is lessened, shared joy increased. Thus do we refute entropy." -- Spider Robinson
Using mainframe data FAQ: viewtopic.php?t=143596 Using CFF FAQ: viewtopic.php?t=157872