Reading Multiple Files Using File Pattern

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

dspradeep
Participant
Posts: 59
Joined: Fri Aug 21, 2009 12:58 am

Reading Multiple Files Using File Pattern

Post by dspradeep »

I have created a simple file called outputA.txt which contains this data:

Code:

name,desc
x,manager
y,clerk
z,security
Then I made a copy of this file, changed the name values to p, q, r, and renamed it outputB.txt:

Code:

name,desc
p,manager
q,clerk
r,security
In the Sequential File stage, when I try to use File Pattern to view the data, I can't. I searched this forum and found a suggested solution: set this environment variable to True.

Code:

$APT_IMPORT_PATTERN_USES_FILESET = True 
I did this, but I still couldn't view the data. Below is the error I get when I try to view it:

Code:

##W TOIX 000000 23:51:01(000) <main_program> createFilesetFromPattern(): Couldn't find any files on host  with pattern D:/Datastage/DX444/ISFiles/Temp/*.txt. [new-impexp\file_import.C:1647]
##E TOIX 000138 23:51:01(002) <Sequential_File_0> At least one filename or data source must be set in APT_FileImportOperator before use. [new-impexp\file_import.C:2029]
##E TFSR 000019 23:51:01(006) <main_program> Could not check all operators because of previous error(s) [api\step_rep.C:1128]
 ##I TCOS 000022 23:51:01(007) <main_program> Explanation:
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

You don't need File Pattern to read one file. (The file name could be a job parameter.) However, it will work to use File Pattern, provided that the pattern matches one or more file names. It would appear that your pattern does not (or lacks a directory component to the path). Please advise the precise file pattern value that you are using.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

That $APT variable helps when you use the 'give me the filename that was matched' option. Right now, it would seem that your wildcard pattern doesn't match any filenames.
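For what it's worth, a quick way to sanity-check the pattern outside DataStage is to run a glob against it on the engine host. A minimal sketch, assuming Python is available on that box and using the path from the error in your first post as an example:

Code:

# Minimal sketch: check whether the wildcard pattern matches any files on the
# machine where the job actually runs (the DataStage server, not the client).
# The pattern below is taken from the log in this thread; adjust as needed.
import glob

pattern = "D:/Datastage/DX444/ISFiles/Temp/*.txt"  # assumed example pattern
matches = glob.glob(pattern)

if matches:
    print("Pattern matches:")
    for path in matches:
        print(" ", path)
else:
    print("No files match this pattern - check the directory and spelling.")

If that prints nothing when run on the server, the stage won't find anything either.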
-craig

"You can never have too many knives" -- Logan Nine Fingers
dspradeep
Participant
Posts: 59
Joined: Fri Aug 21, 2009 12:58 am

Post by dspradeep »

I agree that one file or two shouldn't matter for File Pattern, so it seems the File Pattern option is not working. Why is it not working? Does any $APT environment variable need to be changed, or is it something else?

Please help me
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Be more specific about what "not working" means. I have no problem with it. Tell us what your property settings are, what output you expect to get, what the file names actually are, and what output you are actually getting.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
dspradeep
Participant
Posts: 59
Joined: Fri Aug 21, 2009 12:58 am

Post by dspradeep »

There are two files available in the location below:
D:\Datastage\DX444\ISFiles\Temp
File names: outputA.txt, outputB.txt
The file structure is the same for both files, and the file contents are given in my first post.
I want to load the data from both files into another sequential file using the File Pattern option, so I set the properties below in the Sequential File stage and tried to view the data, but I couldn't.

File Pattern: D:\Datastage\DX444\ISFiles\Temp\output*.txt
Read Method: File Pattern

I have imported the metadata in the Columns tab.

When I try to view the data using the View Data button, I get:

Code:

##I TFCN 000001 15:23:27(000) <main_program> 
 Ascential DataStage(tm) Enterprise Edition 7.5
 Copyright (c) 2004, 1997-2004 Ascential Software Corporation.
 All Rights Reserved
 
 
 ##I TOSH 000002 15:23:27(001) <main_program> orchgeneral: loaded
 ##I TOSH 000002 15:23:27(002) <main_program> orchsort: loaded
 ##I TOSH 000002 15:23:27(003) <main_program> orchstats: loaded
 ##W TCOS 000049 15:23:27(004) <main_program> Parameter specified but not used in flow: DSProjectMapName [osl\osl.C:651]
 ##I TCOS 000021 15:23:27(005) <main_program> Echo:
 import
 -schema record
   {final_delim=end, delim=','}
 (
   name:string[max=255];
   desc:string[max=255];
 )
 -rejects continue
 -reportProgress yes
 -filepattern 'D:\\Datastage\\DX444\\ISFiles\\Temp\\output*.txt'
 
 [ident('Sequential_File_0'); jobmon_ident('Sequential_File_0')]
 0> [] 'Sequential_File_0:DSLink2.v'
 ;
 
 head -nrecs 10
 [ident('_Head'); jobmon_ident('_Head'); seq]
 0< 'Sequential_File_0:DSLink2.v'
 0> [] 'Sequential_File_0:DSLink2_Head.v'
 ;
 
 peek -all -delim '%%PEEK%%DELIM%%' -field name -field desc
 [ident('_PEEK_IDENT_'); jobmon_ident('_PEEK_IDENT_'); seq]
 0< 'Sequential_File_0:DSLink2_Head.v'
 0> [] 'Sequential_File_0:DSLink2_Head_PEEK_IDENT_.v'
 ;
 
 abort -nrecs 10
 [ident('_ABORT_IDENT_'); jobmon_ident('_ABORT_IDENT_'); seq]
 0< 'Sequential_File_0:DSLink2_Head_PEEK_IDENT_.v'
 ;
 
 
 
 ##I TFSC 000001 15:23:28(004) <main_program> APT configuration file: C:/Ascential/DataStage/Configurations/default.apt
 ##I TFSC 000000 15:23:28(005) <main_program> 
 This step has no datasets.
 
 It has 1 operator:
 op0[1p] {(sequential APT_CombinedOperatorController:
       (Sequential_File_0)
       (_Head)
       (_PEEK_IDENT_)
       (_ABORT_IDENT_)
     ) on nodes (
       node1[op0,p0]
     )}
 It runs 1 process on 1 node.
 ##I TCOS 000022 15:23:28(006) <main_program> Explanation:
 Step has 4 operators.
 ???, invoked with args: -schema record {final_delim=end, delim=","} ( name: string[max=255];  desc: string[max=255]; ) -rejects continue -reportProgress yes -filepattern D:\Datastage\DX444\ISFiles\Temp\output*.txt 
     output port 0 bound to data entity "Sequential_File_0:DSLink2.v"
 
 ???, invoked with args: -nrecs 10 
     input port 0 bound to data entity "Sequential_File_0:DSLink2.v"
     output port 0 bound to data entity "Sequential_File_0:DSLink2_Head.v"
 
 ???, invoked with args: -all -delim %%PEEK%%DELIM%% -field name -field desc 
     input port 0 bound to data entity "Sequential_File_0:DSLink2_Head.v"
     output port 0 bound to data entity "Sequential_File_0:DSLink2_Head_PEEK_IDENT_.v"
 
 ???, invoked with args: -nrecs 10 
     input port 0 bound to data entity "Sequential_File_0:DSLink2_Head_PEEK_IDENT_.v"
 
 Step has 3 data entities.
   Data "Sequential_File_0:DSLink2.v"(an Orchestrate data set)
     written by operator "Sequential_File_0"
     read by operator "_Head"
 
   Data "Sequential_File_0:DSLink2_Head.v"(an Orchestrate data set)
     written by operator "_Head"
     read by operator "_PEEK_IDENT_"
 
   Data "Sequential_File_0:DSLink2_Head_PEEK_IDENT_.v"(an Orchestrate data set)
     written by operator "_PEEK_IDENT_"
     read by operator "_ABORT_IDENT_"
 
 
 ##I TCOS 000023 15:23:28(007) <main_program> Dump:
 { 
   text="import\r\n-schema record {final_delim=end, delim=\",\"} ( name: string[max=255];  desc: string[max=255]; )\r\n-rejects continue\r\n-reportProgress yes\r\n-filepattern 'D:\\\\Datastage\\\\DX444\\\\ISFiles\\\\Temp\\\\output*.txt'\r\n\r\n[ident('Sequential_File_0'); jobmon_ident('Sequential_File_0')]\r\n0> [] 'Sequential_File_0:DSLink2.v'\r\n;\r\n\r\nhead -nrecs 10\r\n[ident('_Head'); jobmon_ident('_Head'); seq]\r\n0< 'Sequential_File_0:DSLink2.v'\r\n0> [] 'Sequential_File_0:DSLink2_Head.v'\r\n;\r\n\r\npeek -all -delim '%%PEEK%%DELIM%%' -field name -field desc\r\n[ident('_PEEK_IDENT_'); jobmon_ident('_PEEK_IDENT_'); seq]\r\n0< 'Sequential_File_0:DSLink2_Head.v'\r\n0> [] 'Sequential_File_0:DSLink2_Head_PEEK_IDENT_.v'\r\n;\r\n\r\nabort -nrecs 10\r\n[ident('_ABORT_IDENT_'); jobmon_ident('_ABORT_IDENT_'); seq]\r\n0< 'Sequential_File_0:DSLink2_Head_PEEK_IDENT_.v'\r\n;", 
   line=1, column=1, name="", qualname="", 
   op={ 
        text="import\r\n-schema record {final_delim=end, delim=\",\"} ( name: string[max=255];  desc: string[max=255]; )\r\n-rejects continue\r\n-reportProgress yes\r\n-filepattern 'D:\\\\Datastage\\\\DX444\\\\ISFiles\\\\Temp\\\\output*.txt'\r\n\r\n[ident('Sequential_File_0'); jobmon_ident('Sequential_File_0')]\r\n0> [] 'Sequential_File_0:DSLink2.v'", 
        line=1, column=1, name=import, qualname=Sequential_File_0, 
        wrapout={},
        wrapperfile=import, kind=non_wrapper_cdi_op, exec_mode=none, 
        args="'record {final_delim=end, delim=\",\"} ( name: string[max=255];  desc: string[max=255]; )'-rejects'continue'-reportProgress'yes'-filepattern'D:\\\\Datastage\\\\DX444\\\\ISFiles\\\\Temp\\\\output*.txt'", 
        output={ text="\r\n0> [] 'Sequential_File_0:DSLink2.v'", line=13, 
                 column=1, name="", qualname="Sequential_File_0[o0]", 
                 data="Sequential_File_0:DSLink2.v"
               }
      },
   op={ 
        text="\r\n\r\nhead -nrecs 10\r\n[ident('_Head'); jobmon_ident('_Head'); seq]\r\n0< 'Sequential_File_0:DSLink2.v'\r\n0> [] 'Sequential_File_0:DSLink2_Head.v'", 
        line=16, column=1, name=head, qualname=_Head, 
        wrapout={},
        wrapperfile=head, kind=non_wrapper_cdi_op, exec_mode=seq, 
        args="'10'", 
        input={ text="\r\n0< 'Sequential_File_0:DSLink2.v'", line=18, 
                column=1, name="", qualname="_Head[i0]", 
                data="Sequential_File_0:DSLink2.v"
              },
        output={ text="\r\n0> [] 'Sequential_File_0:DSLink2_Head.v'", 
                 line=19, column=1, name="", qualname="_Head[o0]", 
                 data="Sequential_File_0:DSLink2_Head.v"
               }
      },
   op={ 
        text="\r\n\r\npeek -all -delim '%%PEEK%%DELIM%%' -field name -field desc\r\n[ident('_PEEK_IDENT_'); jobmon_ident('_PEEK_IDENT_'); seq]\r\n0< 'Sequential_File_0:DSLink2_Head.v'\r\n0> [] 'Sequential_File_0:DSLink2_Head_PEEK_IDENT_.v'", 
        line=22, column=1, name=peek, qualname=_PEEK_IDENT_, 
        wrapout={},
        wrapperfile=peek, kind=non_wrapper_cdi_op, exec_mode=seq, 
        args="'-delim'%%PEEK%%DELIM%%'-field'name'-field'desc'", 
        input={ text="\r\n0< 'Sequential_File_0:DSLink2_Head.v'", line=24, 
                column=1, name="", qualname="_PEEK_IDENT_[i0]", 
                data="Sequential_File_0:DSLink2_Head.v"
              },
        output={ 
                 text="\r\n0> [] 'Sequential_File_0:DSLink2_Head_PEEK_IDENT_.v'", 
                 line=25, column=1, name="", qualname="_PEEK_IDENT_[o0]", 
                 data="Sequential_File_0:DSLink2_Head_PEEK_IDENT_.v"
               }
      },
   op={ 
        text="\r\n\r\nabort -nrecs 10\r\n[ident('_ABORT_IDENT_'); jobmon_ident('_ABORT_IDENT_'); seq]\r\n0< 'Sequential_File_0:DSLink2_Head_PEEK_IDENT_.v'", 
        line=28, column=1, name=abort, qualname=_ABORT_IDENT_, 
        wrapout={},
        wrapperfile=abort, kind=non_wrapper_cdi_op, exec_mode=seq, 
        args="'10'", 
        input={ text="\r\n0< 'Sequential_File_0:DSLink2_Head_PEEK_IDENT_.v'", 
                line=30, column=1, name="", qualname="_ABORT_IDENT_[i0]", 
                data="Sequential_File_0:DSLink2_Head_PEEK_IDENT_.v"
              }
      },
   data={ text="\r\n0> [] 'Sequential_File_0:DSLink2.v'", line=13, column=1, 
          name="Sequential_File_0:DSLink2.v", 
          qualname="Sequential_File_0:DSLink2.v", 
          partwrapout={},
          collwrapout={},
          dir=flow, kind=ds, writer=Sequential_File_0, reader=_Head, pp=none, 
          trunc=default, ident="Sequential_File_0:DSLink2.v"
        },
   data={ text="\r\n0> [] 'Sequential_File_0:DSLink2_Head.v'", line=19, 
          column=1, name="Sequential_File_0:DSLink2_Head.v", 
          qualname="Sequential_File_0:DSLink2_Head.v", 
          partwrapout={},
          collwrapout={},
          dir=flow, kind=ds, writer=_Head, reader=_PEEK_IDENT_, pp=none, 
          trunc=default, ident="Sequential_File_0:DSLink2_Head.v"
        },
   data={ text="\r\n0> [] 'Sequential_File_0:DSLink2_Head_PEEK_IDENT_.v'", 
          line=25, column=1, 
          name="Sequential_File_0:DSLink2_Head_PEEK_IDENT_.v", 
          qualname="Sequential_File_0:DSLink2_Head_PEEK_IDENT_.v", 
          partwrapout={},
          collwrapout={},
          dir=flow, kind=ds, writer=_PEEK_IDENT_, reader=_ABORT_IDENT_, 
          pp=none, trunc=default, 
          ident="Sequential_File_0:DSLink2_Head_PEEK_IDENT_.v"
        }
 }
 
 ##W TOIX 000000 15:23:29(000) <Sequential_File_0,0> Couldn't find any files on host PRADEEP-0320510 with pattern D:/Datastage/DX444/ISFiles/Temp/output*.txt. [new-impexp\file_import.C:2624]
 ##I TOIX 000000 15:23:29(001) <Sequential_File_0,0> Output 0 produced 0 records.
 ##I USER 000000 15:23:29(002) <_Head,0> Input 0 consumed 0 records.
 ##I USER 000000 15:23:29(003) <_Head,0> Output 0 produced 0 records.
 ##I TOPK 000000 15:23:29(004) <_PEEK_IDENT_,0> Input 0 consumed 0 records.
 ##I TOPK 000000 15:23:29(005) <_PEEK_IDENT_,0> Output 0 produced 0 records.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Well... out of all that, this is the only bit that really matters:

Code:

 ##W TOIX 000000 15:23:29(000) <Sequential_File_0,0> Couldn't find any files on host PRADEEP-0320510 with pattern D:/Datastage/DX444/ISFiles/Temp/output*.txt. [new-impexp\file_import.C:2624] 
Silly question perhaps, but are these files on your DataStage server rather than your client PC? I can't tell for certain from your post. Can you successfully read one or both (add two 'File=' properties) using the Specific File(s) option? As a test, does it work if you change the "output*.txt" portion to simply "*.txt"?
-craig

"You can never have too many knives" -- Logan Nine Fingers
dspradeep
Participant
Posts: 59
Joined: Fri Aug 21, 2009 12:58 am

Post by dspradeep »

Can you successfully read one or both (add two 'File=' properties) using the Specific File(s) option?

Yes, I can view both files if I use the Read Method Specific File(s). But if I use the Read Method File Pattern and give the file name as
D:\Datastage\DX444\ISFiles\Temp\*.txt, it also does not work.

does it work if you change the "output*.txt" portion to simply "*.txt"?
No
Sreenivasulu
Premium Member
Posts: 892
Joined: Thu Oct 16, 2003 5:18 am

Post by Sreenivasulu »

Do you want to see the data of OutputA.txt or OutputB.txt? Can you tell me?
Going by your question, I think you want to see the data of both files using a single command. It does not work that way.
The concept of 'file pattern' is to load multiple files with the same pattern.
I don't think you can do a 'view data' using a 'pattern'. DataStage would need to apply 'fuzzy logic' to understand the pattern and apply it to one of the files.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Try these two techniques (a rough sketch of both variants follows below).
  • Use forward slashes in the pathname rather than backslashes.
  • Use a UNIX-style pathname that does not include a drive letter.
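Purely as an illustration (the actual value goes straight into the File Pattern property), here is what those two variants would look like, assuming the path used earlier in this thread:

Code:

# Rough sketch of the two suggested variants of the pattern, built from the
# Windows-style path used earlier in this thread (assumed example only).
windows_pattern = r"D:\Datastage\DX444\ISFiles\Temp\output*.txt"

# 1. Same path with forward slashes instead of backslashes.
forward_slash_pattern = windows_pattern.replace("\\", "/")
print(forward_slash_pattern)  # D:/Datastage/DX444/ISFiles/Temp/output*.txt

# 2. UNIX-style pathname with no drive letter.
unix_style_pattern = forward_slash_pattern.split(":", 1)[1]
print(unix_style_pattern)     # /Datastage/DX444/ISFiles/Temp/output*.txt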
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Sreenivasulu wrote:I don't think you can do a 'view data' using a 'pattern'. DataStage would need to apply 'fuzzy logic' to understand the pattern and apply it to one of the files.
Nothing fuzzy about it and of course you can.
-craig

"You can never have too many knives" -- Logan Nine Fingers
dspradeep
Participant
Posts: 59
Joined: Fri Aug 21, 2009 12:58 am

Post by dspradeep »

I think you are right, Srini. I tried to load both files into another sequential file using File Pattern, and the data loaded perfectly. So we can't view the data using File Pattern in the Sequential File stage.

I am facing one more problem now. Both files have a field-name (header) record, but I need to load it only once into the target. Is there any option to drop the header record from every file apart from the first?

Code:


name,desc
x,manager
y,clerk
z,security
name,desc
p,manager
q,clerk
r,security


ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

No. You have to design that piece, perhaps by looking for the explicit column header values (since your columns are all defined as sufficiently large VarChar to handle the column headings, no row will be rejected).
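For example, here is a minimal sketch of that idea outside DataStage, assuming the two files and the name,desc layout from earlier in this thread: keep the header from the first file only, and drop any later row that is just the column headings repeated (which is roughly what a Transformer constraint such as name <> 'name' would do inside the job).

Code:

# Minimal sketch, outside DataStage: concatenate every file the pattern
# matches, keep the first file's header line, and skip repeated header rows.
# Pattern, header text and output file name are assumed from this thread.
import glob

pattern = "D:/Datastage/DX444/ISFiles/Temp/output*.txt"  # assumed example
header = "name,desc"

with open("combined.txt", "w") as out:
    for i, path in enumerate(sorted(glob.glob(pattern))):
        with open(path) as f:
            for line in f:
                # Drop the header line from every file after the first one.
                if i > 0 and line.strip() == header:
                    continue
                out.write(line)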
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

dspradeep wrote:I think you are right, Srini. I tried to load both files into another sequential file using File Pattern, and the data loaded perfectly. So we can't view the data using File Pattern in the Sequential File stage.
As already noted, this is completely incorrect. You can. :?
-craig

"You can never have too many knives" -- Logan Nine Fingers
Sreenivasulu
Premium Member
Posts: 892
Joined: Thu Oct 16, 2003 5:18 am

Post by Sreenivasulu »

It was my logical assumption and it might not be correct. Thanks, Craig, for the clarification.