Hi folks,
I am been playing with file stage from couple of days.
I have a fixed length file which may have extra length of chars.
In the columns of the stage i have mention all the fields as char with their respective sizes since all of my records are of fixed width.
And i have a reject link.
Lets take the total size of the record is 12 char
fields one f1 of 4
fields one f2 of 4
fields one f3 of 4
--------------------
Total 12
Lets take some samples
1234ABCD1234(size 12) Fixed width
ABCD1234ABCD(size 12) Fixed Width
1234567890ab(size 12) Fixed Width
1234abcd1234a( szie 13) One extra char
abcd1234abc(size 11) One char less
Its a unix based file
I have to do some validation and transformation (ETL process)
The first the foremost step is to have a stage and define the schema in which i will be reading it.
I wanted read only those records whose size is 12 and rest all to be rejected(I mean pushed to the reject link).
My question is how shoud my schema be ?
Under Property Tab
-----------------------
I specify the file name and reject mode to be output
Under Format Tab
---------------------
I am stuck here
we have records level and field defualt. What i am doing is
Under record level --
Record delimitor as "unix new line"(since its a unix base file)
Record Length as "fixed"(We have "end "and "none" options too)
Under Field Delimiter--
Delimitor = "None"
Under Columns Tab
-----------------------
f1 char (4)
f2 char(4)
f3 char(4)
I wanted to run this job without any warnings, what more modification i have to do and i wanted to reject all records whose size is less or more than 12 char.
When i click on View Data. Any Records whose size is more or less than 12 are rejected.
When i run the job i get warnings at those location were the data is more or less
Warning like
-did not consume entire input
-Input buffer overrun
I don't want this warrings.
What changes i have to make or do in my step to ge the rejected records as well as no warning when i execute the job.
Thanks in advance
File Stage
Moderators: chulett, rschirm, roy
Re: File Stage
You are asking how to handle dynamic lengths for a flat file input (i.e. bad fields).vzmz wrote:I wanted read only those records whose size is 12 and rest all to be rejected(I mean pushed to the reject link).
My question is how shoud my schema be ?
1. Use QualityStage.
OR
1. Set everything to varchar of an undefined length.
2. Establish a transform stage after the input. Within this transform stage, put a constraint (using len() function). Send the rejected records to whatever you want to do with it. Experiment a bit with this concept, so you can fully understand how it works.
-T.J.
Developer of DataStage Parallel Engine (Orchestrate).
Re: File Stage
But that way i won't be able to make use of the reject link. I am doing something else in the transformations.Teej wrote:You are asking how to handle dynamic lengths for a flat file input (i.e. bad fields).vzmz wrote:I wanted read only those records whose size is 12 and rest all to be rejected(I mean pushed to the reject link).
My question is how shoud my schema be ?
1. Use QualityStage.
OR
1. Set everything to varchar of an undefined length.
2. Establish a transform stage after the input. Within this transform stage, put a constraint (using len() function). Send the rejected records to whatever you want to do with it. Experiment a bit with this concept, so you can fully understand how it works.
-T.J.
If i add this into transofrmation its getting more complex
If i make it a varchar then no rows will be rejected, so i end up doing that in a transofrmation stage.
Apart from this how can i make use of reject link in a file stage withour warnings
Re: File Stage
Instead of a transformer you can use modify and switch:vzmz wrote:But that way i won't be able to make use of the reject link. I am doing something else in the transformations.Teej wrote:You are asking how to handle dynamic lengths for a flat file input (i.e. bad fields).vzmz wrote:I wanted read only those records whose size is 12 and rest all to be rejected(I mean pushed to the reject link).
My question is how shoud my schema be ?
1. Use QualityStage.
OR
1. Set everything to varchar of an undefined length.
2. Establish a transform stage after the input. Within this transform stage, put a constraint (using len() function). Send the rejected records to whatever you want to do with it. Experiment a bit with this concept, so you can fully understand how it works.
-T.J.
If i add this into transofrmation its getting more complex
If i make it a varchar then no rows will be rejected, so i end up doing that in a transofrmation stage.
Apart from this how can i make use of reject link in a file stage withour warnings
osh "
import
-schema record
(
s:string;
)
-file 'var_length.dat'
-rejects continue
-reportProgress yes
|
modify
'l = string_length(s); s=s;'
|
switch
-case 12=0
-ifnotfound allow
-key l
0> good_record.v
1> bad_record.v;
peek
-name
<good_record.v;
peek
-name
<bad_record.v;
"
Vitali.
Re: File Stage
This is the type of philosophy that you need to end. This is the ETL world, where things are far easier to implement and try. Why are you so concerned with using a 'reject' link, when you can practically handle the 'rejected' records on your own with only 10 minutes work?vzmz wrote:But that way i won't be able to make use of the reject link. I am doing something else in the transformations.
If i add this into transofrmation its getting more complex
And why do you have to do everything within one transform stage, when you can use two?
I have struggled with this mindset at my job that it is driving me to the point where I am ready to quit. It is bad enough to even have to explain why using parameters are _GOOD_, not _BAD_, as one person seems to have a fixtation on.
Instead of painting yourself in a corner by requiring that you MUST use a reject link, try questioning on what your requirements really are. I bet it does not even care that there's a broken line link out your Oracle stage, as long as something gets reported somewhere.
-T.J.
Developer of DataStage Parallel Engine (Orchestrate).
Re: File Stage
Thanks TeejTeej wrote:This is the type of philosophy that you need to end. This is the ETL world, where things are far easier to implement and try. Why are you so concerned with using a 'reject' link, when you can practically handle the 'rejected' records on your own with only 10 minutes work?vzmz wrote:But that way i won't be able to make use of the reject link. I am doing something else in the transformations.
If i add this into transofrmation its getting more complex
And why do you have to do everything within one transform stage, when you can use two?
I have struggled with this mindset at my job that it is driving me to the point where I am ready to quit. It is bad enough to even have to explain why using parameters are _GOOD_, not _BAD_, as one person seems to have a fixtation on.
Instead of painting yourself in a corner by requiring that you MUST use a reject link, try questioning on what your requirements really are. I bet it does not even care that there's a broken line link out your Oracle stage, as long as something gets reported somewhere.
-T.J.
As u told me i have done the validation using the transformation stage instead of reject link. Basically i have to make 2 jobs one for the length validation and the other for nulls and dups and then i sequenced it with a job sequencer.
Re: File Stage
Hey, that's a pretty good idea. I would have done it in one job, but if one fails, the other would not run the necessary task.vzmz wrote:As u told me i have done the validation using the transformation stage instead of reject link. Basically i have to make 2 jobs one for the length validation and the other for nulls and dups and then i sequenced it with a job sequencer.
My problem is breaking a job up in manageable pieces. Hehe. You should see some of the jobs I've made especially after quite a few extra requirements came down the pipeline after the job was designed.
-T.J.
Developer of DataStage Parallel Engine (Orchestrate).
Thanks Teej for the excellent lines that you wrote!!!
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
This is the type of philosophy that you need to end. This is the ETL world, where things are far easier to implement and try. Why are you so concerned with using a 'reject' link, when you can practically handle the 'rejected' records on your own with only 10 minutes work?
And why do you have to do everything within one transform stage, when you can use two?
I have struggled with this mindset at my job that it is driving me to the point where I am ready to quit. It is bad enough to even have to explain why using parameters are _GOOD_, not _BAD_, as one person seems to have a fixtation on.
Instead of painting yourself in a corner by requiring that you MUST use a reject link, try questioning on what your requirements really are. I bet it does not even care that there's a broken line link out your Oracle stage, as long as something gets reported somewhere.
-T.J.
""""""""""""""""""""""""""""""""""""""""""""""""""""""""
Many programmers have gone to those trap and continue to go.
I'm not sure why, may be because human nature is to try to
do perfect things that everybody have in theirs minds, even if it
doesn't match the reality (the BRD), but it satisfies their vision
of the problem and they are ready to protect it by any means...
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
This is the type of philosophy that you need to end. This is the ETL world, where things are far easier to implement and try. Why are you so concerned with using a 'reject' link, when you can practically handle the 'rejected' records on your own with only 10 minutes work?
And why do you have to do everything within one transform stage, when you can use two?
I have struggled with this mindset at my job that it is driving me to the point where I am ready to quit. It is bad enough to even have to explain why using parameters are _GOOD_, not _BAD_, as one person seems to have a fixtation on.
Instead of painting yourself in a corner by requiring that you MUST use a reject link, try questioning on what your requirements really are. I bet it does not even care that there's a broken line link out your Oracle stage, as long as something gets reported somewhere.
-T.J.
""""""""""""""""""""""""""""""""""""""""""""""""""""""""
Many programmers have gone to those trap and continue to go.
I'm not sure why, may be because human nature is to try to
do perfect things that everybody have in theirs minds, even if it
doesn't match the reality (the BRD), but it satisfies their vision
of the problem and they are ready to protect it by any means...
Edited by Mike3000