Load first 100 records in first execution

Post questions here related to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.


chaithanya
Participant
Posts: 12
Joined: Fri Apr 12, 2013 7:12 am

Post by chaithanya »

I have 1000 records in a file, but I need to load 100 records per execution. After the first execution, I need to process the second 100 records, and so on, up to 1000.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Seems like an odd requirement. So you need to run the same job over and over, processing only 100 of the rows each time? Sounds like you may need to look into a Sequence job with StartLoop/EndLoop activities and job parameters that control the starting and ending record numbers, which you then leverage in a constraint. Or a multi-instance (MI) job where the InvocationID is the starting record. Or split the file up first.
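A minimal Python sketch of the loop-plus-constraint idea, outside DataStage: the hypothetical `batch_bounds` plays the role of a StartLoop/EndLoop sequence handing start and end record numbers to each run, and the hypothetical `run_job` applies the constraint. It assumes rows are numbered 1, 2, 3, ... in file order; in a real parallel job that generally means reading the file sequentially, since row counters there may be counted per partition.

```python
from typing import Iterator, List, Tuple

BATCH_SIZE = 100  # records per execution, from the requirement

def batch_bounds(total_records: int, batch_size: int = BATCH_SIZE) -> Iterator[Tuple[int, int]]:
    """Yield 1-based (start, end) record numbers, one pair per loop iteration.

    Hypothetical stand-in for a sequence loop passing START_REC / END_REC
    job parameters (assumed names) to each run.
    """
    for start in range(1, total_records + 1, batch_size):
        end = min(start + batch_size - 1, total_records)
        yield start, end

def run_job(records: List[str], start: int, end: int) -> List[str]:
    """One job run: keep rows whose 1-based position lies in [start, end].

    Equivalent to a constraint such as
    rownum >= START_REC AND rownum <= END_REC, where rownum is a hypothetical row number.
    """
    return [rec for rownum, rec in enumerate(records, start=1) if start <= rownum <= end]

if __name__ == "__main__":
    records = [f"rec{i}" for i in range(1, 1001)]
    for start, end in batch_bounds(len(records)):
        loaded = run_job(records, start, end)
        print(f"run {start}-{end}: {len(loaded)} rows ({loaded[0]} .. {loaded[-1]})")
```

The last batch is clipped with `min`, so a file that is not a multiple of 100 records still ends cleanly.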

Out of curiosity, why do you need to do this? :?
-craig

"You can never have too many knives" -- Logan Nine Fingers
chaithanya
Participant
Posts: 12
Joined: Fri Apr 12, 2013 7:12 am

Post by chaithanya »

Thanks for the reply, chulett. Yes, it seems like an odd requirement. I am a DataStage trainee at my company. My lead gave me this scenario to test my design capability. :wink:
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Well, then... have at it! :wink:

We anxiously await your results.
-craig

"You can never have too many knives" -- Logan Nine Fingers
chaithanya
Participant
Posts: 12
Joined: Fri Apr 12, 2013 7:12 am

Post by chaithanya »

I tried splitting the file with a sequence job. If my input file is named john and I split it into 4 files, the resulting files are named "johnaa", "johnab", "johnac" and "johnad". After splitting, a parallel job loads one split file per execution. Each time, I need to pass the file suffix ("aa", "ab", "ac", "ad") as a parameter to the parallel job and run it.
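The suffixes do not need to be typed in per run; they follow a fixed pattern. A small Python imitation of `split -l 100 john john` (done in memory, no files written), splitting by 100 lines as the requirement asks, shows the whole list of names can be generated up front and handed to a loop. `alpha_suffixes` and `split_lines` are hypothetical names for illustration.

```python
import string
from itertools import product
from typing import Dict, Iterator, List

def alpha_suffixes(length: int = 2) -> Iterator[str]:
    """Yield 'aa', 'ab', ..., 'az', 'ba', ...: the naming pattern split uses by default."""
    for combo in product(string.ascii_lowercase, repeat=length):
        yield "".join(combo)

def split_lines(lines: List[str], prefix: str, lines_per_file: int = 100) -> Dict[str, List[str]]:
    """In-memory imitation of `split -l <lines_per_file> <file> <prefix>`.

    Returns {file name: lines in that file} in suffix order. Two-letter
    suffixes cover 26 * 26 = 676 files, plenty for this example.
    """
    chunks: Dict[str, List[str]] = {}
    suffixes = alpha_suffixes()
    for i in range(0, len(lines), lines_per_file):
        chunks[prefix + next(suffixes)] = lines[i:i + lines_per_file]
    return chunks

if __name__ == "__main__":
    lines = [f"row {n}" for n in range(1, 1001)]
    chunks = split_lines(lines, "john")
    print(list(chunks))  # ['johnaa', 'johnab', ..., 'johnaj']
```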
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

How do you know that the file has to be split into four? The original requirement, it seems to me, calls for files containing 100 records each.

If you're going to go with splitting the file in that way, how will you pick up the appropriate file to run on each iteration? Are you familiar with the StartLoop activity in sequences?
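The number of files (and so the number of loop iterations) follows from the record count by ceiling division rather than being fixed at four. A tiny Python sketch; the hypothetical `files_needed` takes the count as an argument, which in practice might come from something like `wc -l`.

```python
def files_needed(total_records: int, records_per_file: int = 100) -> int:
    """Ceiling division: how many files of records_per_file lines a file splits into."""
    return (total_records + records_per_file - 1) // records_per_file

if __name__ == "__main__":
    for total in (400, 1000, 1050):
        print(total, "records ->", files_needed(total), "files")
    # 400 records -> 4 files
    # 1000 records -> 10 files
    # 1050 records -> 11 files
```

Only a 400-record input gives four files of 100; the 1000-record file in the question needs ten.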
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

When I said "split the file" I meant using the UNIX split command rather than a job. Hard to be sure what you did, but your mention of the "aa", "ab" (etc.) suffixes makes it seem you may have used it. Can you clarify?

And we seem to be having the same conversation in more than one post (the one we're in here plus this one and this one), something generally frowned upon. In the future, please just stay put in a single conversation rather than starting new "sub-topics" as they come up along the way. That would be much appreciated.
-craig

"You can never have too many knives" -- Logan Nine Fingers
priyadarshikunal
Premium Member
Posts: 1735
Joined: Thu Mar 01, 2007 5:44 am
Location: Troy, MI

Post by priyadarshikunal »

The UNIX split command appends the suffixes aa, ab, and so on. You may need to tell split to use numeric suffixes instead.
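With GNU coreutils split, the `-d` option switches to numeric suffixes (00, 01, ...); other UNIX split implementations may not offer it, so check your platform. The benefit is that a numeric loop counter can be turned into the file name directly. A Python sketch of that mapping, assuming the counter runs from 0 and the default two-digit suffix width; `split_file_name` is a hypothetical helper.

```python
def split_file_name(prefix: str, counter: int, width: int = 2) -> str:
    """Name of the counter-th piece from a numeric-suffix split: prefix + zero-padded number."""
    if counter < 0 or counter >= 10 ** width:
        raise ValueError(f"counter {counter} does not fit in {width} digits")
    return f"{prefix}{counter:0{width}d}"

if __name__ == "__main__":
    names = [split_file_name("john", i) for i in range(10)]
    print(names)  # ['john00', 'john01', ..., 'john09']
```

Zero padding also keeps the names sorting in numeric order, so a plain directory listing returns them in load order.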

After the split, you can run ls -m (I think) to get a comma-separated list of the files in that folder, which you can put in a list loop.
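`ls -m` separates entries with a comma and a space and can wrap a long list onto several lines, so the raw output usually needs tidying before it is used as a loop list. A minimal Python sketch, assuming the list loop is set up with a comma delimiter; `to_loop_list` is a hypothetical name.

```python
def to_loop_list(ls_m_output: str) -> str:
    """Turn `ls -m` output into a comma-separated list with no spaces or line breaks."""
    names = [part.strip() for part in ls_m_output.split(",")]
    return ",".join(name for name in names if name)  # drop empty entries

if __name__ == "__main__":
    raw = "johnaa, johnab, johnac,\njohnad\n"
    print(to_loop_list(raw))  # johnaa,johnab,johnac,johnad
```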
Priyadarshi Kunal

Genius may have its limitations, but stupidity is not thus handicapped. :wink:
chaithanya
Participant
Posts: 12
Joined: Fri Apr 12, 2013 7:12 am

Post by chaithanya »

I used a sequence job to count the lines in the output file, then added 100 to that count each time. The resulting value is passed to HEAD_STAGE in the parallel job as a parameter.

The sequence job as:

EXEC_COMMAND --> JOB_ACTIVITY

The parallel job as:

SEQ_FILE --> HEAD_STAGE --> TAIL_STAGE --> SEQ_FILE

HEAD_STAGE takes the first n records. TAIL_STAGE takes the last 100 records from HEAD_STAGE.

Initially the output file is empty, so the first execution loads the first 100 records.
On the second execution, HEAD_STAGE passes the first 200 records, whereas TAIL_STAGE takes the last 100 of those (i.e., records 101 to 200) and loads them into SEQ_FILE. :)
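The Head/Tail design can be simulated in a few lines of Python to check its arithmetic. This sketch assumes the output file is appended to on each run (otherwise the line count would never grow past 100), and it adds a stop condition the design above does not mention: once all 1000 records are loaded, another run would compute n = 1100, the Head stage would pass all 1000 rows, and the Tail stage would reload 901 to 1000. `run_once` is a hypothetical name.

```python
from typing import List

def run_once(source: List[str], output: List[str], batch_size: int = 100) -> List[str]:
    """Simulate one execution of SEQ_FILE --> HEAD_STAGE --> TAIL_STAGE --> SEQ_FILE."""
    count = len(output)              # sequence job: count lines already loaded
    if count >= len(source):         # added stop condition: nothing left to load
        return []
    n = count + batch_size           # parameter passed to the Head stage
    head = source[:n]                # Head stage: first n records
    tail = head[-batch_size:]        # Tail stage: last batch_size of those records
    output.extend(tail)              # output file assumed to be appended to
    return tail

if __name__ == "__main__":
    source = [f"rec{i}" for i in range(1, 1001)]
    output: List[str] = []
    run = 1
    while batch := run_once(source, output):
        print(f"run {run}: {batch[0]} .. {batch[-1]}")
        run += 1
```

If the record count is not a multiple of 100 (say 1050), the final run's Tail stage still takes 100 rows (951 to 1050) and so reloads 951 to 1000; slicing `head[count:]` instead of `head[-batch_size:]` avoids that.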