Getting part of a filename
Moderators: chulett, rschirm, roy
-
- Premium Member
- Posts: 729
- Joined: Tue Apr 28, 2009 10:49 pm
Getting part of a filename
Hi All,
I have 6 files.
say for example
FileName: 562009_cons_file.txt
Content:
Name Age Salary
aa 35 80000
bb 42 95000
Now i want to get the name 'cons' from the file name and need to store it in the table.
Table Structure:
create table emp
(
Name varchar(25),
Age int,
Salary int,
Platform varchar(25)
)
If you look at the file content, we don't have the value for Platform. We need to get this value from the filename.
if the file name looks like
FileName: 562009_uma_file.txt
Content:
Name Age Salary
aa 35 90000
bb 27 55000
then i have to get 'uma' from the filename and then need to insert it into the table.
Final Output:
Select * from emp will give the below output.
aa 35 80000 cons
bb 42 95000 cons
aa 35 90000 uma
bb 27 55000 uma
Inputs are welcome!
I have 6 files.
say for example
FileName: 562009_cons_file.txt
Content:
Name Age Salary
aa 35 80000
bb 42 95000
Now i want to get the name 'cons' from the file name and need to store it in the table.
Table Structure:
create table emp
(
Name varchar(25),
Age int,
Salary int,
Platform varchar(25)
)
If you look at the file content, we don't have the value for Platform. We need to get this value from the filename.
if the file name looks like
FileName: 562009_uma_file.txt
Content:
Name Age Salary
aa 35 90000
bb 27 55000
then i have to get 'uma' from the filename and then need to insert it into the table.
Final Output:
Select * from emp will give the below output.
aa 35 80000 cons
bb 42 95000 cons
aa 35 90000 uma
bb 27 55000 uma
Inputs are welcome!
Karthik
-
- Participant
- Posts: 3337
- Joined: Mon Jan 17, 2005 4:49 am
- Location: United Kingdom
The way I might approach this is a script which gets a filename as an argument, then echos each line of that file to the console, appending the filename to the beginning or end of the line. You set that script as a filter for a sequential file stage, then use parsing functions in DS (eg. the Field function) to parse the filename to get the portion you want to load in the database.
This is much easier if you use Perl, vbscript, or PowerShell, but here is a rough .BAT file that might get you started:
If I name this file ListLines.bat, then call it as ListLines.bat ListLines.bat, my output is:
Oops on the last line...make sure you pick a delimiter that is NOT in your data. Again, this is easier in say Perl (probably faster too). If I were using Perl I might just parse the filename token in the script instead of DS.
Read "help for" in a cmd window for more details.
HTH,
Scott
This is much easier if you use Perl, vbscript, or PowerShell, but here is a rough .BAT file that might get you started:
Code: Select all
@echo off
set f=%1%
for /F "delims=~" %%l in (%f%) do echo %f% %%l
Code: Select all
C:\My Batch Scripts>ListLines.bat ListLines.bat
ListLines.bat @echo off
ListLines.bat set f=%1%
ListLines.bat for /F "delims=
Read "help for" in a cmd window for more details.
HTH,
Scott
Some people are the Michael Jordan's of Datastage. I'm more like Muggsy Bogues
-
- Premium Member
- Posts: 729
- Joined: Tue Apr 28, 2009 10:49 pm
I have six files in a folder. But it may increase in the future.
ap_06052009_cons.txt
ap_05052009_cons.txt
ap_04052009_cons.txt
ap_06052009_uma.txt
ap_05052009_uma.txt
ap_04052009_uma.txt
so i used FOLDER stage to concatenate all the files into a single file.
Because all files has the same meta data. I need to insert all thos file content into a single table.
Folder Stage --> Transform --> Sequential File (It will contain all the values)
I created a job like above.
FOLDER Stage:
Output tab --> Properties --> Wildcard = ap*.txt
Column tab --> I have created two columns. 1) FileName 2) Content
Sequential Stage:
Input tab --> Genaral --> #loadfiledirectory#ap_all_files.txt ( merged result will be stored in this file)
As i said earlier the table doesn't have a column called 'Platform'.
so i need to get the Platform name form the filename itself.
Where i have to use this Field()?
How to implement .BAT file logic here?
ap_06052009_cons.txt
ap_05052009_cons.txt
ap_04052009_cons.txt
ap_06052009_uma.txt
ap_05052009_uma.txt
ap_04052009_uma.txt
so i used FOLDER stage to concatenate all the files into a single file.
Because all files has the same meta data. I need to insert all thos file content into a single table.
Folder Stage --> Transform --> Sequential File (It will contain all the values)
I created a job like above.
FOLDER Stage:
Output tab --> Properties --> Wildcard = ap*.txt
Column tab --> I have created two columns. 1) FileName 2) Content
Sequential Stage:
Input tab --> Genaral --> #loadfiledirectory#ap_all_files.txt ( merged result will be stored in this file)
As i said earlier the table doesn't have a column called 'Platform'.
so i need to get the Platform name form the filename itself.
Where i have to use this Field()?
How to implement .BAT file logic here?
Karthik
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Use the Field() function downstream of the Folder stage in a transformer. Think of your filename field as an "underscore delimited string" and then ask Field() to get you the 3rd field from the string. Then either substring off everything in the result before the dot or just the first three characters if you know it is always three.
-craig
"You can never have too many knives" -- Logan Nine Fingers
"You can never have too many knives" -- Logan Nine Fingers
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Folder stage gives the option of fully qualified pathname or not. Choose not. Then, for example, apply an "MCN" Oconv() to the file name to get the numeric portion.
Last edited by ray.wurlod on Wed May 06, 2009 11:18 pm, edited 1 time in total.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.