Issue with Seq File

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

manojbh31
Premium Member
Posts: 83
Joined: Thu Jun 21, 2007 6:41 am

Issue with Seq File

Post by manojbh31 »

Hi All,

I have a sequential file with around 400K records, pipe-delimited with double quotes. When I run the job through a sequence, the count in the DataStage Designer is not what I expect, and it varies for each run. If I run that particular job on its own, Designer shows the record count matching the file.

E.g.: first run, count 50K
second run, more than 100K
third run, 40K

Can anybody help me figure out what the issue might be?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Tell us more about the job design and about how the job is invoked from the sequence.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
manojbh31
Premium Member
Posts: 83
Joined: Thu Jun 21, 2007 6:41 am

Post by manojbh31 »

Hi Ray,

The job design is simple. The sequence has 10 jobs and this is the second job in the sequence. There are no hard-coded values for any parameter, and there are no issues with the file either.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Saying that it "is simple" doesn't help us help you. Detail the design for us if you want us to be able to help. And what does "no hard code values for any parameter" mean - are they being passed to it by the Sequence? If so, do one or more of the parameters control the file's name and/or directory? Right now (with no details) your issue makes no sense; I don't see how your job could read a different number of records from the same file unless it is in fact not the same file each run. Are you absolutely certain that you've checked the logs and that those three run counts come from the job processing the same file each time?
-craig

"You can never have too many knives" -- Logan Nine Fingers
manojbh31
Premium Member
Posts: 83
Joined: Thu Jun 21, 2007 6:41 am

Post by manojbh31 »

Hi,

Job design as follows

Sequential File --- Remove Duplicates --- Dataset

There is a filter to remove the header and trailer records in the Sequential File stage.
The file has 430K records. I have run this job many times and it has never shown the correct record count (430K) in the performance stats in Designer; the same is the case when I check the monitor in Director.

Because of this I am not able to produce the required output. For each run, the parameter values are passed correctly.
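
As a sanity check, the expected data record count can be verified on the Unix side before trusting any stage statistics. This is only a sketch: the file path is a placeholder, and it assumes the header and trailer are one line each.

Code: Select all

# Count the data records, excluding the single header and trailer lines.
# /path/to/source_file.txt is a placeholder for the actual source file.
sed '1d;$d' /path/to/source_file.txt | wc -l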
manojbh31
Premium Member
Posts: 83
Joined: Thu Jun 21, 2007 6:41 am

Post by manojbh31 »

Can anybody please help with the above request?
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

How? Not sure how many other questions we can ask. This kind of thing typically takes 'boots on the ground'. Is there anyone else there with you who can look at it?

Well, maybe...

Are you sorting the data properly to support the Remove Duplicates stage? If running on multiple nodes, are you using Hash Partitioning so all of your sort keys go to the correct (i.e. same) partition? If not, I can see that generating a seemingly 'random' number of de-duplicated records. That assumes it always reads all of them and the issue is only on the target end, something I'm still not clear on. :?
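
For comparison, a rough Unix-side analogue of the de-duplicated count could confirm what the job ought to produce. Just a sketch: it assumes the key is the first pipe-delimited field (which may not match your job) and that the quoted fields contain no embedded pipes or newlines.

Code: Select all

# Expected record count after de-duplication, assuming the key is
# field 1 of the pipe-delimited file (adjust -f for the real keys).
cut -d'|' -f1 /path/to/source_file.txt | sort -u | wc -l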
-craig

"You can never have too many knives" -- Logan Nine Fingers
manojbh31
Premium Member
Posts: 83
Joined: Thu Jun 21, 2007 6:41 am

Post by manojbh31 »

Hi Craig,

You might feel this is a silly question; even I felt so before posting it. I have incorporated all the points you mentioned, like partitioning, sorting, and selecting hash partitioning for the Remove Duplicates stage, etc.
The issue is while reading data from the source file, and the log does not show any rejects in the Sequential File stage.

I created simple job as below
Sequential File --- Dataset.

Even this does not return the correct count; the count changes for every run, and there is no issue with the source file. The file is generated by a DataStage job from another team, and the format is the same in both jobs.
Mike
Premium Member
Posts: 1021
Joined: Sun Mar 03, 2002 6:01 pm
Location: Tampa, FL

Post by Mike »

I would bet on somebody from that other team writing your sequential file either during your run or between your runs.

Get on your Unix box and note the exact size and timestamp of your source file.

Run job --> check file size and time --> then repeat.
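
Something along these lines would capture the evidence on each cycle. A sketch only: the file path is a placeholder, and the job launch step is yours to fill in.

Code: Select all

# Snapshot the file before each run to prove it isn't changing.
FILE=/path/to/source_file.txt   # placeholder for the actual file
ls -l "$FILE"                   # size and modification time
cksum "$FILE"                   # checksum catches content changes
wc -l "$FILE"                   # raw line count
# ... run the DataStage job here, then repeat the commands above ...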

Mike
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Right... I'd wager the same thing. The contents are being changed between or during runs by this mysterious other team. Go with Mike's suggestion - record the file size / count before the job runs. Each time.
-craig

"You can never have too many knives" -- Logan Nine Fingers
manojbh31
Premium Member
Posts: 83
Joined: Thu Jun 21, 2007 6:41 am

Post by manojbh31 »

Hi Mike,

That is not the issue here. I am in the testing phase fixing a prod issue, and the other team sent the file on 0508. For the same file, the counts do not match from run to run. In prod, the other team sends the file at 6 AM and my jobs are scheduled at 10:30 PM.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Then it seems to me you are firmly on your own with this. Still, clarify one thing for me that you really haven't yet about these 'wrong numbers' you are getting. Where exactly are the counts going wrong:

A) You take the same file and read it several times and it reads a different number of records from the file each run?

B) You take the same file and read it several times and it reads the same number of records from the file but writes a different number of records to the target each run?

C) Something else entirely

Pick one.
-craig

"You can never have too many knives" -- Logan Nine Fingers
manojbh31
Premium Member
Posts: 83
Joined: Thu Jun 21, 2007 6:41 am

Post by manojbh31 »

A) You take the same file and read it several times and it reads a different number of records from the file each run?

This is what is happening for each run. I take the same file, read it several times, and it gives a different number of records each time.
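
One way to demonstrate this outside DataStage is to read the same file repeatedly and confirm that the Unix-side count, at least, is stable. A sketch; the path is a placeholder.

Code: Select all

# If wc -l returns the same number on every pass while the job's
# count varies, the instability is inside the job, not the file.
for i in 1 2 3; do wc -l < /path/to/source_file.txt; done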
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

No clue. Have you involved support yet?
-craig

"You can never have too many knives" -- Logan Nine Fingers
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

In all honesty, I suspect there is either a critical piece of information we don't have yet or that you have... miscommunicated. Impossible for us to know, however, and I think you are at this point firmly in the hands of someone with access to your system - which is what I meant by 'boots on the ground'. :(
-craig

"You can never have too many knives" -- Logan Nine Fingers