APT_IMPORT_PATTERN_USES_FILESET

bcarlson
Premium Member
Posts: 772
Joined: Fri Oct 01, 2004 3:06 pm
Location: Minnesota

Post by bcarlson »

Problem:

When using APT_IMPORT_PATTERN_USES_FILESET and the sequential file stage with a file pattern, we are getting this error:

Sequential_File_21: In file set "/u001/local/dstage_tmp/import_tmp_129925608b5f99d0.fs": Parsing a dataset, expecting a fileset.. [new-impexp/file_import.C:695]

Has anyone done this before or seen this error message?

Here is what we are doing:

We have a number of large imports where the data arrives as a set of files with identical layouts. The sets range from 8 files to 16 or more, and overall import sizes range from 10-15 million to 30+ million. We have been using filesets to handle the imports and allow parallel reads. The problem with filesets is that they are hardcoded with server names.

Due to some upcoming hardware and infrastructure changes, our production group has requested that we no longer use hardcoded filesets. Instead we are to use the Sequential file stage with a file pattern and then set APT_IMPORT_PATTERN_USES_FILESET to true (default value is false).

This option will automatically generate a fileset based on the file pattern we specify. Without this option, DataStage will simply concatenate the files together and read the whole thing - it won't take advantage of parallelism on the read itself. However, if a fileset is generated, then DataStage will do a parallel read of the files (one process per file).
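
As a rough analogy only (this is plain Python, not DataStage code, and the function names are made up for illustration), the difference between the two read strategies described above looks something like this: one reader working through every file in turn, versus one worker per file.

```python
from concurrent.futures import ThreadPoolExecutor


def count_records(path):
    """Count the lines in a single file."""
    with open(path, "rb") as handle:
        return sum(1 for _ in handle)


def read_concatenated(paths):
    """One reader walks every file in turn (analogous to reading without the option)."""
    return sum(count_records(path) for path in paths)


def read_one_reader_per_file(paths):
    """One worker per file (analogous to the parallel read of a generated fileset)."""
    if not paths:
        return 0
    with ThreadPoolExecutor(max_workers=len(paths)) as pool:
        return sum(pool.map(count_records, paths))
```

Both functions return the same total; only the degree of concurrency differs, which is the point of generating a fileset from the pattern.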

So... has anyone seen this error before? Any idea what it is complaining about?

Any help would be greatly appreciated! Thanks!

Brad.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

I've seen this error in the "101" class, where someone put the name of a Data Set control file (blah.ds rather than blah.fs, where there really was a Data Set controlled by blah.ds) in a File Set stage.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
bcarlson
Premium Member
Posts: 772
Joined: Fri Oct 01, 2004 3:06 pm
Location: Minnesota

Post by bcarlson »

The problem is that I am not specifying the fileset that is mentioned in the error message.

In the Sequential stage, I reference the variables #ds_ddir#/#ds_filepat#. I verified the output in Director, and it is getting correctly evaluated to '/u001/data/sdda/dda_src?.dat'.
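
For reference, a small sketch of what the `?` wildcard in that pattern matches, assuming the stage expands the pattern with ordinary shell-style globbing (an assumption, not something confirmed in this thread). The candidate file names are hypothetical.

```python
from fnmatch import fnmatchcase

PATTERN = "dda_src?.dat"  # file-name part of the pattern quoted above

# Hypothetical file names, used only to show what "?" does and does not match.
candidates = ["dda_src1.dat", "dda_srcA.dat", "dda_src10.dat", "dda_src.dat"]

# "?" stands for exactly one character, so only the first two names qualify.
matches = [name for name in candidates if fnmatchcase(name, PATTERN)]
print(matches)  # ['dda_src1.dat', 'dda_srcA.dat']
```

Under that assumption, a tenth file named dda_src10.dat would be skipped by this pattern, which is worth keeping in mind when the file count grows.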

The fileset referenced in the error message is the one that DataStage dynamically created based on the file pattern. There is nothing in my code that references that directory (/u001/local/etc....), nor do we allow any process to write there outside of DataStage itself. In other words, this is not a situation where a dataset exists there that should be a fileset, unless DataStage itself did it.

The confusion continues....

Brad.
thebird
Participant
Posts: 254
Joined: Thu Jan 06, 2005 12:11 am
Location: India

Post by thebird »

Hi Brad,

I have seen this error before while using APT_IMPORT_PATTERN_USES_FILESET. In my case, the error was thrown whenever no files matched the file pattern value provided.

The job was a simple one:

Seq File -------->Copy-------------->Dataset

But whenever no files matched the pattern (e.g. pp*.dat), the job would abort with a fatal error almost identical to the one you are getting:

Sequential_File_22: In file set "/tmp/import_tmp_10388a0ae98c6.fs": Parsing a dataset, expecting a fileset..

Hope this information helps.
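
If an empty match is the cause, a pre-flight check before the job runs can turn the fileset error into a clearer message. A minimal sketch in Python, assuming the pattern follows shell-style glob rules and the script runs on the same host and file system as the job; `check_pattern` is a hypothetical helper name.

```python
import glob


def check_pattern(pattern):
    """Return the sorted list of files matching pattern, or fail loudly if none do."""
    files = sorted(glob.glob(pattern))
    if not files:
        # Fail here rather than letting the job abort on an empty match.
        raise FileNotFoundError(f"no files match pattern {pattern!r}")
    return files
```

Calling `check_pattern("/tmp/pp*.dat")` (an illustrative path) before starting the job would either return the matching files or raise with the offending pattern in the message.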

Regards

The Bird.
bcarlson
Premium Member
Posts: 772
Joined: Fri Oct 01, 2004 3:06 pm
Location: Minnesota

Post by bcarlson »

Well, it turns out the problem was how I was setting the file pattern parameter. When we use files here, we have parameters for both the directory (ds_dir) and the filename (ds_file). With the old fileset format (explicit, not dynamic), that worked just fine: #ds_dir#/#ds_fileset#

With this dynamic fileset setup, it looks like I either have to hard code it (not a good idea) or just use one variable for the whole field: #ds_filepattern#.
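
One way to keep the directory and the file pattern as separate settings upstream, while still handing the job a single value, is to join them before the job is started. A minimal sketch; the function name is hypothetical, and how the result is passed in as #ds_filepattern# depends on how the job is launched.

```python
import posixpath


def build_file_pattern(directory, file_pattern):
    """Join a directory and a file pattern into one value for a single job parameter."""
    return posixpath.join(directory, file_pattern)


# Directory and pattern taken from the values quoted earlier in this thread.
pattern = build_file_pattern("/u001/data/sdda", "dda_src?.dat")
print(pattern)  # /u001/data/sdda/dda_src?.dat
```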

So it looks like I now have a working solution. I appreciate the help! The only question left is more of a curiosity question of why can't I use 2 variables?

Brad.
thebird
Participant
Posts: 254
Joined: Thu Jan 06, 2005 12:11 am
Location: India

Post by thebird »

Hi Brad,
bcarlson wrote: The only question left is more of a curiosity question of why can't I use 2 variables?

I was wondering if this could be an intermittent issue.

I am using 2 parameters, like #TempFilePath#/#SourceFilePattern#, in my job and it seems to be working fine, except when no files correspond to the file pattern, in which case the job aborts as I mentioned in my previous post.


Regards,

The Bird.
bcarlson
Premium Member
Posts: 772
Joined: Fri Oct 01, 2004 3:06 pm
Location: Minnesota

Post by bcarlson »

thebird wrote: I am using 2 parameters, like #TempFilePath#/#SourceFilePattern#, in my job and it seems to be working fine, except when no files correspond to the file pattern, in which case the job aborts as I mentioned in my previous post.

Are you also using APT_IMPORT_PATTERN_USES_FILESET = true? I tried it a number of times and could not get it to work. And I double-checked that the file pattern was valid and would return 1 or more files.

Maybe it works for file patterns unless you set that APT parameter to true...?

Brad.