Issue with Seq File
Moderators: chulett, rschirm, roy
Hi All,
I have a sequential file with around 400K records, pipe-delimited with double-quoted fields. When I run the job through a sequence, the record count shown in DataStage Designer is not as expected, and it varies from run to run. If I run that particular job on its own, Designer shows the record count matching the file.
E.g.:
- first run: ~50K
- second run: more than 100K
- third run: ~40K
Can anybody help with what the issue might be?
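Before blaming the job, it helps to count the records in the file itself, outside DataStage. The sketch below is generic, not anything DataStage-specific; the sample path and data are made up to match the shape described (pipe-delimited, quoted fields, header and trailer rows):

```shell
# Build a small sample: header + 5 data records + trailer,
# pipe-delimited with double-quoted fields
cat > /tmp/source.dat <<'EOF'
HDR|2024-01-01
"1"|"alpha"
"2"|"beta"
"3"|"gamma"
"4"|"beta"
"5"|"alpha"
TRL|5
EOF

# Total lines, including header and trailer
wc -l < /tmp/source.dat

# Data records only: drop the first (header) and last (trailer) line
sed '1d;$d' /tmp/source.dat | wc -l
```

If this number is stable across runs while the Designer count is not, the problem is in the job, not the file.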
Saying that it "is simple" doesn't help us help you. Detail the design for us if you want us to be able to help. And what does "no hard-coded values for any parameter" mean - are they being passed to it by the Sequence? If so, do one or more of the parameters control the file's name and/or directory? Right now (with no details) your issue makes no sense; I don't see how your job could read a different number of records from the same file unless, in fact, it is not the same file each run. Are you absolutely certain that you've checked the logs and those three run counts come from the job processing the same file each time?
-craig
"You can never have too many knives" -- Logan Nine Fingers
Hi,
Job design is as follows:
Sequential File ---> Remove Duplicates ---> Dataset
There is a filter to remove the header and trailer records in the Sequential File stage.
The file has 430K records. I have run this job many times and it has not produced the correct record count (430K) when I look at the performance stats in Designer; the same is the case when I check the monitor in Director.
Because of this I am not able to produce the required output. For each run the parameter values are passed correctly.
How? Not sure how many other questions we can ask. This kind of thing typically takes 'boots on the ground'; is there anyone else there with you who can look at it?
Well, maybe...
Are you sorting the data properly to support the Remove Duplicates stage? If running on multiple nodes, are you using hash partitioning so that all records with the same sort keys go to the correct (i.e. same) partition? If not, I can see that generating a seemingly 'random' number of de-duplicated records - assuming the job always reads all of them and the issue is only on the target end, something I'm still not clear on.
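The partitioning point above can be illustrated with a toy simulation - this is plain Python, not DataStage code, and the node count and key range are invented for the example. Each "partition" de-duplicates independently, the way a Remove Duplicates stage does on a multi-node configuration:

```python
import random
from itertools import cycle

def dedupe_per_partition(records, nparts, assign):
    # Each partition keeps a set and removes duplicates independently,
    # mimicking a per-node Remove Duplicates stage
    parts = [set() for _ in range(nparts)]
    for rec in records:
        parts[assign(rec)].add(rec)
    return sum(len(p) for p in parts)

random.seed(7)
records = [random.randrange(100) for _ in range(1000)]

# Hash partitioning: every copy of a key lands in the same partition,
# so the global result equals the true distinct count
hash_count = dedupe_per_partition(records, 4, lambda rec: hash(rec) % 4)

# Round-robin: copies of the same key are scattered across partitions,
# so duplicates survive the per-partition dedupe and the count is inflated
rr = cycle(range(4))
rr_count = dedupe_per_partition(records, 4, lambda rec: next(rr))

print(hash_count, len(set(records)))  # equal
print(rr_count)                       # larger than hash_count
```

With round-robin (or any non-key-based method), the inflation also depends on arrival order, which is one way run-to-run counts can look 'random'.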
-craig
"You can never have too many knives" -- Logan Nine Fingers
Hi Craig,
You might feel this is a silly question; I felt the same before posting it. I have incorporated all the points you mentioned: partitioning, sorting, selecting hash in the Remove Duplicates stage, etc.
The issue occurs while reading data from the source file, and the log does not show any rejects in the Sequential File stage.
I created a simple job as below:
Seq File ---> Dataset
Even this does not return the correct count; the count changes with every run, and there is no issue with the source file. The file is generated by a DataStage job owned by another team, and the format is the same in both jobs.
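One cheap way to back up the claim that "there is no issue with the source file" is to checksum it immediately before each run - if the checksums differ, the job is not reading the same data every time. A generic sketch (the path and contents are placeholders):

```shell
# Create a stand-in source file for the demonstration
printf '"1"|"alpha"\n"2"|"beta"\n' > /tmp/source.dat

# Snapshot a checksum before the first run
md5sum /tmp/source.dat | cut -d' ' -f1 > /tmp/run1.md5

# ...later, before the next run, snapshot again and compare
md5sum /tmp/source.dat | cut -d' ' -f1 > /tmp/run2.md5
cmp /tmp/run1.md5 /tmp/run2.md5 && echo "same file contents"
```

If the other team's job rewrites the file while yours is reading it, the checksums (and the counts) will drift between runs.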
Then it seems to me you are firmly on your own with this. Still, clarify the one thing you really haven't yet about these 'wrong numbers' you are getting. Where exactly are the counts going wrong:
A) You take the same file and read it several times and it reads a different number of records from the file each run?
B) You take the same file and read it several times and it reads the same number of records from the file but writes a different number of records to the target each run?
C) Something else entirely
Pick one.
-craig
"You can never have too many knives" -- Logan Nine Fingers
In all honesty, I suspect there is either a critical piece of information we don't have yet or that you have... miscommunicated. Impossible for us to know, however, and I think you are at this point firmly in the hands of someone with access to your system - which is what I meant by 'boots on the ground'.
-craig
"You can never have too many knives" -- Logan Nine Fingers