
Unable to generate a node map

Posted: Mon May 05, 2008 4:39 am
by VCInDSX
Hi,
I have a simple job that reads wildcard-pattern-based files from a folder and loads them into a target database. I have the File Name Column property set to store the source file name in the target database. This works fine in a Windows DataStage Server environment, of course with help from the gurus here: viewtopic.php?t=117069&highlight=APT_FileImportOperator
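
For reference, the Sequential File stage is set up roughly like this (the path and column name here are placeholders, not our real ones):

Read Method = File Pattern
File = /pathname/*.txt
File Name Column = SRC_FILE_NAME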

I migrated this to a Linux GRID environment and am running into the following error on the Sequential File stage.
main_program: For createFilesetFromPattern(), could not find any available nodes in node pool "".
SF_Input_File: At least one filename or data source must be set in APT_FileImportOperator before use.

This happens when I set $APT_IMPORT_PATTERN_USES_FILESET to TRUE.

If I set $APT_IMPORT_PATTERN_USES_FILESET to FALSE, the job runs fine, but the file names are not fully expanded; they are stored as /pathname/*.txt.

I tried giving the pattern a prefix, like "Feed*.txt", and that made no difference either; it just loads the value as /pathname/Feed*.txt.
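
To illustrate (file names invented for this post), every row currently gets the pattern itself in the file-name column, where I would expect the expanded name:

Stored now:  /pathname/Feed*.txt
Expected:    /pathname/Feed_001.txt (and /pathname/Feed_002.txt for rows from the second file)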

If I do an "ls" using the folder name and pattern, it lists the two files I copied into the source location for testing.
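
Something like this (names changed for posting):

$ ls /pathname/Feed*.txt
/pathname/Feed_001.txt
/pathname/Feed_002.txt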

I searched the forum on these messages and found that it might help to supply the folder name as a job parameter and then specify the pattern separately. I tried that and it did not help either.

At this time, the only way to get this to work is to set $APT_IMPORT_PATTERN_USES_FILESET to FALSE.

Please let me know if any input from me would help you help me further.
Your time and help is greatly appreciated.

Thanks,

Posted: Mon May 05, 2008 7:10 am
by ray.wurlod
Does your configuration file include a default node pool called ""?

Posted: Mon May 05, 2008 11:39 pm
by VCInDSX
Hi Ray,
Thanks for the followup.
I checked the job log and found that the config file being used has the following entries.

{
    node "node1"
    {
        fastname "ctpcqabdsh01p"
        pools ""
        resource disk "/nfsgrid/nfsbin/IBM/InformationServer/Server/Datasets" {pools ""}
        resource scratchdisk "/nfsgrid/nfsbin/IBM/InformationServer/Server/Scratch" {pools ""}
    }
    node "node2"
    {
        fastname "ctpcqabdsh01p"
        pools ""
        resource disk "/nfsgrid/nfsbin/IBM/InformationServer/Server/Datasets" {pools ""}
        resource scratchdisk "/nfsgrid/nfsbin/IBM/InformationServer/Server/Scratch" {pools ""}
    }
}

Both nodes have pools "".

Is this what you wanted verified, Ray?

Let me know if there is any other entry that I should be looking at.

Thanks,

Posted: Tue May 06, 2008 11:24 am
by lstsaur
That tells me your job is NOT grid-enabled. Did you bring in all the required grid parameters?

Posted: Tue May 06, 2008 11:26 pm
by VCInDSX
Hi lstsaur,
Thanks for reviewing my query. The following four grid parameters are the ones we were told to add to all our PX jobs on the grid. I have created a parameter set named APT_GRID_PARAMS in my project for this purpose.
Here are the values for these entries in the log file.

APT_GRID_PARAMS.$APT_GRID_ENABLE = YES (Compiled-in default)
APT_GRID_PARAMS.$APT_GRID_COMPUTENODES = 1 (Compiled-in default)
APT_GRID_PARAMS.$APT_GRID_PARTITIONS = 1 (Compiled-in default)
APT_GRID_PARAMS.$APT_GRID_SEQFILE_HOST = (Compiled-in default)


Another thing I noticed is that the default config file, as printed in the Director log in the initial environment variable settings entry (APT_CONFIG_FILE=/nfsgrid/nfsbin/IBM/InformationServer/Server/Configurations/default.apt), points to "default.apt", which is what I posted earlier.

A few lines below that, I see the following entry in the Director log for this job.
<Dynamic_gird.sh> SEQFILE Host(s): ctpcqabdsc02p: ctpcqabdsc02p:
{
    node "Conductor"
    {
        fastname "ctpcqabdsh01p"
        pools "conductor"
        resource disk "/nfsdata/data1/datasets" {pools ""}
        resource scratchdisk "/scratch" {pools ""}
    }
    node "node1_1"
    {
        fastname "ctpcqabdsc02p"
        pools ""
        resource disk "/nfsdata/data1/datasets" {pools ""}
        resource scratchdisk "/scratch" {pools ""}
    }
}

I am not fully conversant with grid internals and would appreciate your input on how to decipher this entry.

I have several other PX jobs that work fine with the grid parameters I provided earlier.

Let me know if you need any additional details in this regard.

Thanks for your time,

Posted: Wed May 07, 2008 11:34 am
by lstsaur
Check your Sequential File stage's Properties --> Source --> File; make sure you populated it as
File = $APT_GRID_SEQFILE_HOST/pathname/*.txt
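
At run time the toolkit substitutes the resolved host in front of the path (your log shows it as "ctpcqabdsc02p:"), so the stage effectively reads something like:

File = ctpcqabdsc02p:/pathname/*.txt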

Posted: Fri May 09, 2008 12:26 am
by VCInDSX
Apologies for the delayed response. Got pulled into a few other unnecessary distractions...

I added the GRID variable for host files and it still didn't give the desired results. However, when I added that and also enabled $APT_IMPORT_PATTERN_USES_FILESET = True, I got another error:

SF_Input: Unable to generate a node map from fileset /tmp/import_tmp_20671db190272.fs.
main_program: Could not check all operators because of previous error(s)


On a separate note, we were asked to use $APT_GRID_SEQFILE_HOST only for output files and not when reading sequential files. Is that not the case?

Thanks,

Posted: Fri May 09, 2008 10:59 am
by lstsaur
No, you can use $APT_GRID_SEQFILE_HOST for input. It returns the first host name identified either by the grid engine or from the IONODE names.
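
For example, in the log you posted earlier the host list starts with ctpcqabdsc02p, so the variable resolves to that first entry:

SEQFILE Host(s): ctpcqabdsc02p: ctpcqabdsc02p:
$APT_GRID_SEQFILE_HOST  ->  ctpcqabdsc02p: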

Posted: Mon May 12, 2008 1:36 am
by VCInDSX
Thanks lstsaur.
One additional question: if the source file is on the head node, will the use of $APT_GRID_SEQFILE_HOST interfere with the source file location if the variable does not resolve to the head node at execution time?

Does the error message I received point to such a symptom?

Thanks,

Posted: Wed Oct 01, 2008 10:17 pm
by Nripendra Chand
I'm also getting the same problem. If I run my job without setting 'APT_IMPORT_PATTERN_USES_FILESET' to 'True', the file names come through as 'TestFilePattern????????.dat'.
But if I enable this environment variable, the job aborts with the following error message:
SQ_SrcFile: Unable to generate a node map from fileset /var/tmp/import_tmp_838635bc0dfb.fs.

Our DataStage server is in a grid environment and I've included all the required grid environment variables in the job, i.e.:
$APT_GRID_ENABLE
$APT_GRID_QUEUE
$APT_GRID_SEQFILE_HOST
$APT_GRID_FROM_PARTITIONS
$APT_GRID_FROM_NODES
$APT_GRID_COMPUTENODES
$APT_GRID_PARTITIONS

Posted: Tue Oct 07, 2008 10:55 pm
by keshav0307
1. Add another parameter, $APT_GRID_HEAD_PARTITIONS = 2 (it must be more than 1).
2. Do not use the host qualifier for the sequential file; that is, remove the $APT_GRID_SEQFILE_HOST prefix from the file path,
so it will be File=/pathname/*.txt rather than File=$APT_GRID_SEQFILE_HOST/pathname/*.txt (both settings are shown together below).
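
Putting the two changes together, the job would carry (path illustrative):

$APT_GRID_HEAD_PARTITIONS = 2
File = /pathname/*.txt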

Posted: Wed Oct 08, 2008 11:28 am
by lstsaur
What version of the Grid Toolkit are you using? There is no such $APT_GRID_HEAD_PARTITIONS Toolkit variable. Besides, why would you want to partition the head node, since all the PX engines are on the compute nodes?
You don't run any jobs on the head node.

Posted: Wed Oct 08, 2008 7:13 pm
by keshav0307
Because it works.
$APT_GRID_HEAD_PARTITIONS=2 will generate two conductor-node partitions, not any compute nodes on the head node; the other partition on the head node will be used to generate the node map.

Posted: Wed Oct 08, 2008 9:04 pm
by lstsaur
Well, as I said before, in my grid environment there is no such $APT_GRID_HEAD_PARTITIONS variable. The dynamically generated configuration file is produced by a Java program. I don't understand what you mean by "the other partition on the head node will be used to generate the node map".

Posted: Wed Oct 08, 2008 9:42 pm
by keshav0307
This parameter, $APT_GRID_HEAD_PARTITIONS, is available in version 3.3.2 of the Grid Toolkit.