padding space to fixed width file

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
peddidsx
Premium Member
Premium Member
Posts: 55
Joined: Wed Dec 26, 2007 8:20 am

padding space to fixed width file

Post by peddidsx »

hello All,

I am facing one problem while reading data from sequential file.
Here is what i am doing
I am reading data from fixed with file using sequential file stage. The file is having huge number of records but 2000+ records are getting rejected because the length is not coming as expected that means the value is not coming for some columns.
Now i have to populate the records to target table even though the lenght is shorted by padding spaces.
I tried APT_STRING_PADCHAR with 0x0 and 0x20 but still the records are getting rejected.
Any soultion?? Kindly let me know
Rajesh Peddi
nagarjuna
Premium Member
Premium Member
Posts: 533
Joined: Fri Jun 27, 2008 9:11 pm
Location: Chicago

Post by nagarjuna »

Could you please provide an example ? i want to know what type of records are rejected ?
Nag
Kryt0n
Participant
Posts: 584
Joined: Wed Jun 22, 2005 7:28 pm

Post by Kryt0n »

Is your file also delimited? If so, try setting the "Null field value" under Format->Field defaults to '' (two single quotes). If no delimiters, then how does datastage know which field is missing? Get source to fix the file. If it is fixed width it should be fixed width throughout not when they fancy.
peddidsx
Premium Member
Premium Member
Posts: 55
Joined: Wed Dec 26, 2007 8:20 am

Post by peddidsx »

Kryt0n wrote:Is your file also delimited? If so, try setting the "Null field value" under Format->Field defaults to '' (two single quotes). If no delimiters, then how does datastage know which field is missing? Get source to fix the file. If it is fixed width it should be fixed width throughout not when they fancy.
Hey,
it is not a delimited file, it is fixed with file. the record length should be 2648 but for some records i am getting only 2642 and these records are getting rejected while reading data from file itself. is there any way to read these rejected records like padding the spaces, kindly let me know.
Thanks
Madhu
Rajesh Peddi
nagarjuna
Premium Member
Premium Member
Posts: 533
Joined: Fri Jun 27, 2008 9:11 pm
Location: Chicago

Post by nagarjuna »

You need to convert your all records to some fixed lenghth.Try reading the entire record as varchar and put your target as char whose length equal to 2648 .within the job initialise APT_STRING_PADCHAR to space ....so now in the target file you will be having all records of fixed length =2648..from this file do whatever operations you want ...
Nag
Kryt0n
Participant
Posts: 584
Joined: Wed Jun 22, 2005 7:28 pm

Post by Kryt0n »

nagarjuna wrote:You need to convert your all records to some fixed lenghth.Try reading the entire record as varchar and put your target as char whose length equal to 2648 .within the job initialise APT_STRING_PADCHAR to space ....so now in the target file you will be having all records of fixed length =2648..from this file do whatever operations you want ...
Does not answer which field is missing. What if it is the first column missing, you are going to assign values incorrectly. As per my first post, source needs to fix
nagarjuna
Premium Member
Premium Member
Posts: 533
Joined: Fri Jun 27, 2008 9:11 pm
Location: Chicago

Post by nagarjuna »

opps..here i thought that the last 6 characters are missing ...

Kryt0n wrote:
nagarjuna wrote:You need to convert your all records to some fixed lenghth.Try reading the entire record as varchar and put your target as char whose length equal to 2648 .within the job initialise APT_STRING_PADCHAR to space ....so now in the target file you will be having all records of fixed length =2648..from this file do whatever operations you want ...
Does not answer which field is missing. What if it is the first column missing, you are going to assign values incorrectly. As per my first post, source needs to fix
Nag
babbu
Participant
Posts: 18
Joined: Mon May 19, 2008 10:57 pm

Post by babbu »

HI,

I faced a similar scenario.

I put all my source columns as char in the seq file and it worked.

records were not rejected
miwinter
Participant
Posts: 396
Joined: Thu Jun 22, 2006 7:00 am
Location: England, UK

Post by miwinter »

On the seq file stage, under format/record-level, try adding the fill char property with a value of space
Mark Winter
<i>Nothing appeases a troubled mind more than <b>good</b> music</i>
sjfearnside
Premium Member
Premium Member
Posts: 278
Joined: Wed Oct 03, 2007 8:45 am

Post by sjfearnside »

Why not process your correct records and save the rejects to a file and determine what fields are missing? This could be done by having the source file owner review/correct the records in error. If you alter the job to accept bad data what will be the downstream affects of the bad data to the system?
anu123
Premium Member
Premium Member
Posts: 143
Joined: Sun Feb 05, 2006 1:05 pm
Location: Columbus, OH, USA

Post by anu123 »

sjfearnside wrote:Why not process your correct records and save the rejects to a file and determine what fields are missing? This could be done by having the source file owner review/correct the records in error. If you alter the job to accept bad data what will be the downstream affects of the bad data to the system?

I think source system should correct the file. All records must be of lenght 2648 to read and map to right target column. Padding with 'spaces' will juble the positions and downstream systems if any will receive incrrect data. I have worked with similar issues but worked with surce systems to agree and stick to a file layout.

-thanks
Thank you,
Anu
miwinter
Participant
Posts: 396
Joined: Thu Jun 22, 2006 7:00 am
Location: England, UK

Post by miwinter »

Maybe so... but source systems get things wrong... ETL's purpose is/can be to validate the data for its quality and either provide handling for scenarios where mistakes are made or standards aren't met, or to report out errors. "Getting the source system to sort it" isn't always feasible in the real world.
Mark Winter
<i>Nothing appeases a troubled mind more than <b>good</b> music</i>
anu123
Premium Member
Premium Member
Posts: 143
Joined: Sun Feb 05, 2006 1:05 pm
Location: Columbus, OH, USA

Post by anu123 »

miwinter wrote:Maybe so... but source systems get things wrong... ETL's purpose is/can be to validate the data for its quality and either provide handling for scenarios where mistakes are made or standards aren't met, or to report out errors. "Getting the source system to sort it" isn't always feasible in the real world.

Agreed.
If it is not feasible then we can tap all the records which are short f 2648 bytes and handle them seperately or report them.
Thank you,
Anu
dganeshm
Premium Member
Premium Member
Posts: 91
Joined: Tue Aug 11, 2009 3:26 pm

padding space to fixed width file

Post by dganeshm »

awk '{printf "%-<your desired length>s\n",$0}' input_file_name > output_file_name

Example : awk '{printf "%-203s\n",$0}' OA001.M02.TERA2 > OA001.M02.TERA3


Use the above command and make the file fixed width of your desired length then you need not worry about a few records being short. This will add only spaces.
Regards,
Ganesh
Post Reply