Load first 100 records in first execution

chaithanya · Post by **chaithanya** » Fri May 24, 2013 8:06 am

I have 1000 records in a file, but I need to load only 100 records per execution. After the first execution, I need to process the second 100 records, and so on, up to 1000.

chulett · Post by **chulett** » Fri May 24, 2013 9:17 am

Seems like an odd requirement. So you need to run the same job over and over, only processing 100 of the rows each time? Sounds like you may need to look into a Sequence job with the Start/End loop stages and job parameters to control the starting and ending record numbers that you leverage in a constraint. Or an MI job where the InvocationID is the starting record. Or split the file up first.
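The starting and ending record numbers mentioned above follow from simple arithmetic. The sketch below is plain Python rather than DataStage, and the names `batch_bounds` and `select_rows` are hypothetical. It only illustrates how a run number could map to the two values a loop would pass as job parameters, and how a row-number constraint would use them.

```python
def batch_bounds(total_records, batch_size, run_number):
    """Return 1-based inclusive (start, end) record numbers for a run, or None when done."""
    start = (run_number - 1) * batch_size + 1
    if start > total_records:
        return None  # nothing left to process
    end = min(run_number * batch_size, total_records)
    return start, end


def select_rows(rows, start, end):
    """Mimic a constraint of the form: start <= row number <= end."""
    return [row for number, row in enumerate(rows, start=1) if start <= number <= end]


# Assumed sample data: 1000 records processed 100 at a time.
rows = [f"record{n}" for n in range(1, 1001)]
bounds = [batch_bounds(len(rows), 100, run) for run in range(1, 11)]
first = select_rows(rows, *bounds[0])
second = select_rows(rows, *bounds[1])
```

Here the loop counter plays the role of `run_number`, so each iteration receives a different pair of bounds.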

Out of curiosity, why do you need to do this?

chaithanya · Post by **chaithanya** » Fri May 24, 2013 8:39 pm

Thanks for the reply, chulett. Yes, it seems like an odd requirement. I am a DataStage trainee in my company. My lead gave me this scenario to test my design capability.

chulett · Post by **chulett** » Fri May 24, 2013 9:30 pm

Well, then... have at it!

We anxiously await your results.

chaithanya · Post by **chaithanya** » Sat May 25, 2013 10:31 pm

I tried splitting the file with a sequence job. If my input file is named john and I split it into 4, the resulting files are named "johnaa", "johnab", "johnac" and "johnad". After the split, a parallel job loads one split file per execution. Each time, I need to pass the file suffix ("aa", "ab", "ac" or "ad") as a parameter to the parallel job and run it.

ray.wurlod · Post by **ray.wurlod** » Sun May 26, 2013 12:11 am

How do you know that the file has to be split into four? The original requirement, it seems to me, is for files containing 100 records each.

If you're going to go with splitting the file in that way, how will you pick up the appropriate file to run on each iteration? Are you familiar with the StartLoop activity in sequences?

chulett · Post by **chulett** » Sun May 26, 2013 7:25 am

When I said "split the file" I meant using the UNIX split command rather than a job. Hard to be sure what you did, but your mention of the "aa", "ab" (etc.) suffixes makes it seem you may have used it. Can you clarify?

And we seem to be having the same conversation in more than one post (the one we're in here plus this one and this one), something generally frowned upon. In the future, please just stay put in a single conversation rather than starting new "sub-topics" as they come up along the way. That would be much appreciated.

priyadarshikunal · Post by **priyadarshikunal** » Sun May 26, 2013 8:20 pm

The Unix split command uses the suffixes aa, ab, and so on by default. You may need to direct the split command to use numeric suffixes.

After the split, you can do an ls -m (I think) to get a comma-separated list of the files in that folder, which you can put in a list loop.
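As a rough illustration of the split-then-list idea, here is a Python sketch rather than the Unix commands themselves. The function names, the two-digit suffix format, and the temporary sample file are all assumptions. It writes 100-line pieces with numeric suffixes and builds the comma-separated list that a list loop could iterate over.

```python
import os
import tempfile


def _write_piece(path, index, lines):
    """Write one chunk to <path><two-digit index> and return the new file name."""
    piece_name = f"{path}{index:02d}"
    with open(piece_name, "w") as target:
        target.writelines(lines)
    return piece_name


def split_file(path, lines_per_file=100):
    """Split a text file into consecutive pieces of at most lines_per_file lines."""
    pieces = []
    chunk = []
    with open(path) as source:
        for line in source:
            chunk.append(line)
            if len(chunk) == lines_per_file:
                pieces.append(_write_piece(path, len(pieces), chunk))
                chunk = []
    if chunk:  # trailing partial piece, if any
        pieces.append(_write_piece(path, len(pieces), chunk))
    return pieces


# Assumed sample input: a 1000-line file named "john" in a temporary folder.
workdir = tempfile.mkdtemp()
source_path = os.path.join(workdir, "john")
with open(source_path, "w") as f:
    f.writelines(f"record{n}\n" for n in range(1, 1001))

pieces = split_file(source_path)
file_list = ",".join(os.path.basename(p) for p in pieces)
```

With the sample file, `file_list` begins with `john00,john01`, and each name in it could be handed to the parallel job as the file parameter for one iteration.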

chaithanya · Post by **chaithanya** » Sat Jun 01, 2013 12:50 am

I used a sequence job to count the lines in the output file, then added 100 to the count each time. This value is passed to the HEAD_STAGE in the parallel job as a parameter.

The sequence job as:

EXEC_COMMAND --> JOB_ACTIVITY

The parallel job as:

SEQ_FILE --> HEAD_STAGE --> TAIL_STAGE --> SEQ_FILE

HEAD_STAGE takes the first n records. TAIL_STAGE takes the last 100 records from HEAD_STAGE.

Initially the output file is empty, so the first execution loads the first 100 records.
On the second execution, HEAD_STAGE takes the first 200 records, whereas TAIL_STAGE takes the last 100 of those (i.e. records 101 to 200) and loads them to SEQ_FILE.
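The head-then-tail logic described above can be sketched in a few lines of Python. This is only a model of the approach, not the DataStage job itself. The function name `next_batch` is hypothetical, and the in-memory lists stand in for the input and output files.

```python
def next_batch(input_rows, rows_already_loaded, batch_size=100):
    """Return the next batch by taking the first n rows, then the last batch_size of those."""
    head_count = rows_already_loaded + batch_size  # the value passed as a parameter
    head = input_rows[:head_count]                 # Head step: first n records
    return head[-batch_size:]                      # Tail step: last 100 of the head output


# Assumed sample data: 1000 input records and an initially empty output.
input_rows = [f"record{n}" for n in range(1, 1001)]
output_rows = []
for _ in range(3):  # three executions
    output_rows.extend(next_batch(input_rows, len(output_rows)))
```

After three runs the output holds records 1 to 300 in order. One thing worth checking in a design like this: in this sketch, if the total is not a multiple of 100, the final run would reload some records, because the Tail step still takes the last 100 rows of a shorter Head output.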