Page 1 of 2

Getting part of a filename

Posted: Wed May 06, 2009 5:55 am
by karthi_gana
Hi All,

I have 6 files.

say for example

FileName: 562009_cons_file.txt
Content:
Name Age Salary
aa 35 80000
bb 42 95000

Now i want to get the name 'cons' from the file name and need to store it in the table.

Table Structure:
create table emp
(
Name varchar(25),
Age int,
Salary int,
Platform varchar(25)
)

If you look at the file content, we don't have the value for Platform. We need to get this value from the filename.


if the file name looks like

FileName: 562009_uma_file.txt
Content:
Name Age Salary
aa 35 90000
bb 27 55000

then i have to get 'uma' from the filename and then need to insert it into the table.

Final Output:

Select * from emp will give the below output.

aa 35 80000 cons
bb 42 95000 cons
aa 35 90000 uma
bb 27 55000 uma

Inputs are welcome!

Posted: Wed May 06, 2009 6:01 am
by Sainath.Srinivasan
What have you tried ?

Posted: Wed May 06, 2009 6:14 am
by nagarjuna
Hi ,

I hope you have MKSTOOL kit installled on your machine .So , you can use some unix commands to get that part from filename and pass it as a parameter to the job which is loading data into DB table .

Posted: Wed May 06, 2009 6:27 am
by nirdesh2
You can use 'File Name Column' option to get the filename with the file data.

Posted: Wed May 06, 2009 6:41 am
by chulett
Not in a Server job. You can easily use Field() to get what you want from the filename, the trick will be getting it. How do you process multiple files, one at a time?

Posted: Wed May 06, 2009 6:43 am
by sbass1
The way I might approach this is a script which gets a filename as an argument, then echos each line of that file to the console, appending the filename to the beginning or end of the line. You set that script as a filter for a sequential file stage, then use parsing functions in DS (eg. the Field function) to parse the filename to get the portion you want to load in the database.

This is much easier if you use Perl, vbscript, or PowerShell, but here is a rough .BAT file that might get you started:

Code: Select all

@echo off
set f=%1%
for /F "delims=~" %%l in (%f%) do echo %f% %%l
If I name this file ListLines.bat, then call it as ListLines.bat ListLines.bat, my output is:

Code: Select all

C:\My Batch Scripts>ListLines.bat ListLines.bat
ListLines.bat @echo off
ListLines.bat set f=%1%
ListLines.bat for /F "delims=
Oops on the last line...make sure you pick a delimiter that is NOT in your data. Again, this is easier in say Perl (probably faster too). If I were using Perl I might just parse the filename token in the script instead of DS.

Read "help for" in a cmd window for more details.

HTH,
Scott

Posted: Wed May 06, 2009 7:03 am
by chulett
That was going to be the next suggestion. :wink:

Done that before on UNIX but had no idea how to do it in a batch file and was poking around online trying to stimulate my old brain cells. Thanks for saving them from an early death.

Posted: Wed May 06, 2009 10:37 am
by karthi_gana
I have six files in a folder. But it may increase in the future.

ap_06052009_cons.txt
ap_05052009_cons.txt
ap_04052009_cons.txt
ap_06052009_uma.txt
ap_05052009_uma.txt
ap_04052009_uma.txt


so i used FOLDER stage to concatenate all the files into a single file.

Because all files has the same meta data. I need to insert all thos file content into a single table.

Folder Stage --> Transform --> Sequential File (It will contain all the values)

I created a job like above.

FOLDER Stage:

Output tab --> Properties --> Wildcard = ap*.txt
Column tab --> I have created two columns. 1) FileName 2) Content

Sequential Stage:

Input tab --> Genaral --> #loadfiledirectory#ap_all_files.txt ( merged result will be stored in this file)

As i said earlier the table doesn't have a column called 'Platform'.

so i need to get the Platform name form the filename itself.

Where i have to use this Field()?
How to implement .BAT file logic here?

Posted: Wed May 06, 2009 4:32 pm
by ray.wurlod
Use a Folder stage to read the file. This way you can get the file name. Then you can use a Row Splitter stage to get the separate records.

Posted: Wed May 06, 2009 4:47 pm
by chulett
ps. The Folder stage does not 'concatenate all files to a single file'. It does, however, get you the entire file contents in a single row. One row per file, in essence, hence the need for the Row Splitter.

Posted: Wed May 06, 2009 10:12 pm
by chulett
Use the Field() function downstream of the Folder stage in a transformer. Think of your filename field as an "underscore delimited string" and then ask Field() to get you the 3rd field from the string. Then either substring off everything in the result before the dot or just the first three characters if you know it is always three.

Posted: Wed May 06, 2009 10:21 pm
by Kryt0n
Unless the filename includes the full path (as Parallel does), in which case delimit on '/' first to get the actual filename then delimit on '_'

Posted: Wed May 06, 2009 10:31 pm
by chulett
True... from what I recall there is an option in the Folder stage to return either the full pathname or just the filename. Can't check right now.

Posted: Wed May 06, 2009 11:06 pm
by ray.wurlod
Folder stage gives the option of fully qualified pathname or not. Choose not. Then, for example, apply an "MCN" Oconv() to the file name to get the numeric portion.

Posted: Wed May 06, 2009 11:13 pm
by chulett
8) Except the numbers aren't what they need from the filename.