Problem with reading multiple files

ScottDun
Participant
Posts: 61
Joined: Thu Dec 10, 2015 9:51 am


Post by ScottDun »

Hi,

I created a job that uses a * wildcard to read multiple input files. I created 6 text files for the input, and each file has 2 providers. For some reason, the output dataset is split right down the middle. Each file has 44 rows, so 6 files gives 264 rows total. I generate a sequence number (1-22) for each provider and a tracking number starting at 5000 that increments once per provider instance. With 6 files of 2 providers each, the tracking numbers should run 5000-5011. Instead, the output contains 5000-5005 followed by 5000-5005 again, split right down the middle. Is there a way to prevent this, so the tracking numbers run straight through without restarting?

As a side note, whenever I add a seventh file, the job fails. Can the Sequential File stage only read 6 files at once?
SCOTTDun
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Are you running the job on multiple (as in two in this case) nodes?
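
If so, that would explain the split: a counter derived in a Transformer restarts in each partition, so each of your two nodes generates its own 5000-5005 range. As a sketch (assuming your tracking number is a Transformer derivation), a partition-safe version would look like:

    5000 + @PARTITIONNUM + ((@INROWNUM - 1) * @NUMPARTITIONS)

which produces numbers that are unique across partitions, though not necessarily in file order.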

You'd need to explain exactly what "fails" means for help with the second question.
-craig

"You can never have too many knives" -- Logan Nine Fingers
ScottDun
Participant
Posts: 61
Joined: Thu Dec 10, 2015 9:51 am

Post by ScottDun »

I changed the job to run on 1 node because the amount of data per incoming file is small. Then I used an environment variable to increase the number of characters read while searching for the record delimiter to 100000000. This works for now, but I need a better approach for when there are millions of clients per file.

The job was aborting whenever more than 6 input files were loaded, so I used that parameter to increase the number of characters read before the record delimiter is found.
SCOTTDun
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

What APT variable, exactly? And not sure how that would affect loading multiple files, can you post the actual error message(s) from the failed job?
-craig

"You can never have too many knives" -- Logan Nine Fingers
ScottDun
Participant
Posts: 61
Joined: Thu Dec 10, 2015 9:51 am

Post by ScottDun »

$APT_MAX_DELIMITER_READ_SIZE (changed default value to 1000000000)
$APT_CONFIG_FILE (changed to point to a 1-node configuration file)
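
For reference, a minimal one-node configuration file looks something like this, with the hostname and resource paths as placeholders for your environment:

    {
        node "node1"
        {
            fastname "yourhost"
            pools ""
            resource disk "/path/to/datasets" {pools ""}
            resource scratchdisk "/path/to/scratch" {pools ""}
        }
    }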

The error messages are no longer in the log; we ran the job multiple times after we "fixed" it, so they have rolled off.
SCOTTDun
ScottDun
Participant
Posts: 61
Joined: Thu Dec 10, 2015 9:51 am

Post by ScottDun »

Another question I have: how can I read multiple files via the Execute Command stage in a sequence job? I am using "/disk1/data/Projects/EP1978/DEV/input/copy1*.txt" in the parallel jobs. How can I mimic this in a sequence?
SCOTTDun
spuli
Participant
Posts: 40
Joined: Thu Apr 09, 2015 12:13 pm

Post by spuli »

I am trying to understand your problem.
What is your expected output? Are you trying to read all six files and load them into a single file, or append the records from the six files to form one single record?
Thanks,
Sai
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

ScottDun wrote:how can I run multiple files in the command execute stage in a sequence job
I don't understand the question. What do you mean by "run"? You can certainly use the Execute Command stage to return a list of all filenames that match that wildcard pattern; is that what you are after?
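
For example, something like this in the activity's Command field (a sketch; the -1 just forces one filename per line):

    ls -1 /disk1/data/Projects/EP1978/DEV/input/copy1*.txt

The activity's $CommandOutput variable then holds the matching filenames for downstream activities, e.g. to drive a StartLoop or feed a job parameter.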
-craig

"You can never have too many knives" -- Logan Nine Fingers
ScottDun
Participant
Posts: 61
Joined: Thu Dec 10, 2015 9:51 am

Post by ScottDun »

I have found the answer to my problem. I put the file name in the parameter of the Execute Command stage.
SCOTTDun