Problem with reading multiple files

ScottDun
Participant
Posts: 61
Joined: Thu Dec 10, 2015 9:51 am


Post by ScottDun »

Hi,

I created a job that uses a * wildcard to read multiple input files. I created 6 text files for the input, and each file has 2 providers. For some reason, the output dataset is split right down the middle. Each file has 44 rows, so 6 files gives 264 rows total. I generate a sequence number (1-22) for each provider and a tracking number starting at 5000 that increments once per provider instance. With 6 files of 2 providers each, the tracking numbers should run 5000-5011. Instead, the output contains 5000-5005 followed by 5000-5005 again, split right down the middle. Is there a way to prevent this, so the tracking numbers run straight through without restarting?

As a side note, whenever I add a seventh file, the job fails. Can the Sequential File stage only read 6 files at once?
SCOTTDun
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Are you running the job on multiple (as in two in this case) nodes?
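
If so, that would explain the split: a counter derived in a Transformer restarts in each partition, so each of your two nodes generates its own 5000-5005 range. As a sketch (assuming your tracking number is a Transformer derivation), a partition-safe version would look like:

    5000 + @PARTITIONNUM + ((@INROWNUM - 1) * @NUMPARTITIONS)

which produces numbers that are unique across partitions, though not necessarily in file order.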

You'd need to explain exactly what "fails" means for help with the second question.
-craig

"You can never have too many knives" -- Logan Nine Fingers
ScottDun
Participant
Posts: 61
Joined: Thu Dec 10, 2015 9:51 am

Post by ScottDun »

I changed the job to run on 1 node because the amount of data per incoming file is small. Then I used an environment variable to increase the number of characters read while searching for the record delimiter to 100000000. This works for now, but I need a better approach for when there are millions of clients per file.

The job was aborting whenever more than 6 input files were loaded, so I used that parameter to increase the number of characters read before the record delimiter is found.
SCOTTDun
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

What APT variable, exactly? And not sure how that would affect loading multiple files, can you post the actual error message(s) from the failed job?
-craig

"You can never have too many knives" -- Logan Nine Fingers
ScottDun
Participant
Posts: 61
Joined: Thu Dec 10, 2015 9:51 am

Post by ScottDun »

$APT_MAX_DELIMITER_READ_SIZE (changed default value to 1000000000)
$APT_CONFIG_FILE (changed to point to a 1-node configuration file)
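
For reference, a minimal one-node configuration file looks something like this, with the hostname and resource paths as placeholders for your environment:

    {
        node "node1"
        {
            fastname "yourhost"
            pools ""
            resource disk "/path/to/datasets" {pools ""}
            resource scratchdisk "/path/to/scratch" {pools ""}
        }
    }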

The error messages are no longer in the log; we ran the job multiple times after we "fixed" it, so they have rolled off.
SCOTTDun
ScottDun
Participant
Posts: 61
Joined: Thu Dec 10, 2015 9:51 am

Post by ScottDun »

Another question I have: how can I read multiple files via the Execute Command stage in a sequence job? I am using "/disk1/data/Projects/EP1978/DEV/input/copy1*.txt" in the parallel jobs. How can I mimic this in a sequence?
SCOTTDun
spuli
Participant
Posts: 40
Joined: Thu Apr 09, 2015 12:13 pm

Post by spuli »

I am trying to understand your problem.
What is your expected output? Are you trying to read all six files and load them into a single file, or append the records from the six files to form one single record?
Thanks,
Sai
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

ScottDun wrote:how can I run multiple files in the command execute stage in a sequence job
I don't understand the question. What do you mean by "run"? You can certainly use the Execute Command stage to return a list of all filenames that match that wildcard pattern; is that what you are after?
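
For example, something like this in the activity's Command field (a sketch; the -1 just forces one filename per line):

    ls -1 /disk1/data/Projects/EP1978/DEV/input/copy1*.txt

The activity's $CommandOutput variable then holds the matching filenames for downstream activities, e.g. to drive a StartLoop or feed a job parameter.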
-craig

"You can never have too many knives" -- Logan Nine Fingers
ScottDun
Participant
Posts: 61
Joined: Thu Dec 10, 2015 9:51 am

Post by ScottDun »

I have found the answer to my problem. I put the file name in the parameter of the Execute Command stage.
SCOTTDun