Issue with Seq File

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

manojbh31
Premium Member
Posts: 83
Joined: Thu Jun 21, 2007 6:41 am

Issue with Seq File

Post by manojbh31 »

Hi All,

I have a sequential file with around 400K records, pipe-delimited with double quotes. When I run the job through a sequence, the count in the DataStage Designer is not what I expect, and it varies for each run. If I run that particular job on its own, Designer shows the record count matching the file.

E.g.: first run, count 50K
second run, more than 100K
third run, 40K

Can anybody help me figure out what the issue might be?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Tell us more about the job design and about how the job is invoked from the sequence.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
manojbh31
Premium Member
Posts: 83
Joined: Thu Jun 21, 2007 6:41 am

Post by manojbh31 »

Hi Ray,

The job design is simple. The sequence has 10 jobs and this is the second job in the sequence. There are no hard-coded values for any parameter, and there are no issues with the file either.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Saying that it "is simple" doesn't help us help you. Detail the design for us if you want us to be able to help. And what does "no hard code values for any parameter" mean - are they being passed to it by the Sequence? If so, do one or more of the parameters control the file's name and/or directory? Right now (with no details) your issue makes no sense; I don't see how your job could read a different number of records from the same file unless it is in fact not the same file each run. Are you absolutely certain that you've checked the logs and that those three run counts come from the job processing the same file each time?
-craig

"You can never have too many knives" -- Logan Nine Fingers
manojbh31
Premium Member
Posts: 83
Joined: Thu Jun 21, 2007 6:41 am

Post by manojbh31 »

Hi,

Job design as follows

Sequential File --- Remove Duplicates --- Dataset

There is a filter to remove the header and trailer records in the Sequential File stage.
The file has 430K records. I have run this job many times and it has never shown the correct record count (430K) in the performance stats in Designer; the same is the case when I check the monitor in Director.

Because of this I am not able to produce the required output. For each run, the parameter values are passed correctly.
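
As a sanity check, the expected data record count can be verified on the Unix side before trusting any stage statistics. This is only a sketch: the file path is a placeholder, and it assumes the header and trailer are one line each.

Code: Select all

# Count the data records, excluding the single header and trailer lines.
# /path/to/source_file.txt is a placeholder for the actual source file.
sed '1d;$d' /path/to/source_file.txt | wc -l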
manojbh31
Premium Member
Posts: 83
Joined: Thu Jun 21, 2007 6:41 am

Post by manojbh31 »

Can anybody please help with the above request?
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

How? Not sure how many other questions we can ask. This kind of thing typically takes 'boots on the ground'. Is there anyone else there with you who can look at it?

Well, maybe...

Are you sorting the data properly to support the Remove Duplicates stage? If running on multiple nodes, are you using Hash Partitioning so all of your sort keys go to the correct (i.e. same) partition? If not, I can see that generating a seemingly 'random' number of de-duplicated records. That assumes it always reads all of them and the issue is only on the target end, something I'm still not clear on. :?
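
For comparison, a rough Unix-side analogue of the de-duplicated count could confirm what the job ought to produce. Just a sketch: it assumes the key is the first pipe-delimited field (which may not match your job) and that the quoted fields contain no embedded pipes or newlines.

Code: Select all

# Expected record count after de-duplication, assuming the key is
# field 1 of the pipe-delimited file (adjust -f for the real keys).
cut -d'|' -f1 /path/to/source_file.txt | sort -u | wc -l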
-craig

"You can never have too many knives" -- Logan Nine Fingers
manojbh31
Premium Member
Posts: 83
Joined: Thu Jun 21, 2007 6:41 am

Post by manojbh31 »

Hi Craig,

You might feel this is a silly question; even I felt so before posting it. I have incorporated all the points you mentioned, like partitioning, sorting, and selecting hash partitioning for the Remove Duplicates stage, etc.
The issue is while reading data from the source file, and the log does not show any rejects in the Sequential File stage.

I created simple job as below
Sequential File --- Dataset.

Even this does not return the correct count; the count changes for every run, and there is no issue with the source file. The file is generated by a DataStage job from another team, and the format is the same in both jobs.
Mike
Premium Member
Posts: 1021
Joined: Sun Mar 03, 2002 6:01 pm
Location: Tampa, FL

Post by Mike »

I would bet on somebody from that other team writing your sequential file either during your run or between your runs.

Get on your Unix box and note the exact size and timestamp of your source file.

Run job --> check file size and time --> then repeat.
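
Something along these lines would capture the evidence on each cycle. A sketch only: the file path is a placeholder, and the job launch step is yours to fill in.

Code: Select all

# Snapshot the file before each run to prove it isn't changing.
FILE=/path/to/source_file.txt   # placeholder for the actual file
ls -l "$FILE"                   # size and modification time
cksum "$FILE"                   # checksum catches content changes
wc -l "$FILE"                   # raw line count
# ... run the DataStage job here, then repeat the commands above ...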

Mike
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Right... I'd wager the same thing. The contents are being changed between or during runs by this mysterious other team. Go with Mike's suggestion - record the file size / count before the job runs. Each time.
-craig

"You can never have too many knives" -- Logan Nine Fingers
manojbh31
Premium Member
Posts: 83
Joined: Thu Jun 21, 2007 6:41 am

Post by manojbh31 »

Hi Mike,

That is not the issue here. I am in the testing phase fixing a prod issue, and the other team sent the file on 0508. For the same file, the counts do not match from run to run. In prod, the other team sends the file at 6 AM and my jobs are scheduled at 10:30 PM.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Then it seems to me you are firmly on your own with this. Still, clarify one thing for me that you really haven't yet about these 'wrong numbers' you are getting. Where exactly are the counts going wrong:

A) You take the same file and read it several times and it reads a different number of records from the file each run?

B) You take the same file and read it several times and it reads the same number of records from the file but writes a different number of records to the target each run?

C) Something else entirely

Pick one.
-craig

"You can never have too many knives" -- Logan Nine Fingers
manojbh31
Premium Member
Posts: 83
Joined: Thu Jun 21, 2007 6:41 am

Post by manojbh31 »

A) You take the same file and read it several times and it reads a different number of records from the file each run?

This is what is happening for each run. I take the same file, read it several times, and it gives a different number of records each time.
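
One way to demonstrate this outside DataStage is to read the same file repeatedly and confirm that the Unix-side count, at least, is stable. A sketch; the path is a placeholder.

Code: Select all

# If wc -l returns the same number on every pass while the job's
# count varies, the instability is inside the job, not the file.
for i in 1 2 3; do wc -l < /path/to/source_file.txt; done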
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

No clue. Have you involved support yet?
-craig

"You can never have too many knives" -- Logan Nine Fingers
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

In all honesty, I suspect there is either a critical piece of information we don't have yet or that you have... miscommunicated. Impossible for us to know, however, and I think you are at this point firmly in the hands of someone with access to your system - which is what I meant by 'boots on the ground'. :(
-craig

"You can never have too many knives" -- Logan Nine Fingers