Regarding fixed length file
Hi All,
We have a requirement to load a fixed-length file in our ETL environment and have hit an issue. I don't know whether it is a file format issue, a DataStage issue, or an issue introduced while FTPing the data into our landing area. Would appreciate your inputs in resolving it.
We have file with metadata like below
COL1 1 - 2
COL2 3 - 4
COL3 5 - 6
Consider the records in the fixed length flat file are like below
AA CC
XXYY
DDFF
SSRRGG
When I view data after defining this metadata in the Sequential File stage, some records are missed; only the following records are displayed or loaded:
AA CC
SSRRGG
The other two records, which do not have a value for the last column, are missed. Looking into the file, records 2 and 3 have no trailing spaces after the last character; the line simply ends and the cursor moves to the next line. Is this a problem with how DataStage reads the record? Is there an option to set the last column to NULL when the value is not available?
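For reference, the uneven record widths can be confirmed outside DataStage with a quick length check on the landing area (a sketch only; the file path is a placeholder):

```shell
# Print the length of each record. A well-formed fixed-width file
# should show the same length (here 6) on every line; the short
# records are the ones DataStage drops.
awk '{print NR": "length($0)}' landing/fixed_file.dat
```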
Has anyone come across such a situation? Please let me know how we could resolve this issue.
Thanks in advance for your input.
Arvind
Thanks & Regards
arvind sampath
Software Engineer
Hi Arvind,
It is a requirement for a fixed-width file that every line has the same number of characters (and for a delimited file, the correct number of delimiters); otherwise the file parsing will fail.
We also faced this issue earlier and the records which had lesser length were dropped when the parsing happened.
We had to get a corrected file from the source system. If the last few columns do not have any data, they need to be sent as spaces rather than being omitted.
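If getting a corrected file from the source is not possible, padding the short records on our side is another option. A minimal sketch, assuming the 6-character record width from the example above (file names are placeholders):

```shell
# Right-pad every line with spaces to the fixed width of 6,
# so short records are no longer dropped by the parser.
awk '{printf "%-6s\n", $0}' input.dat > padded.dat
```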
Thanks,
Madhumitha
I've had the same problem before, in that some mainframe FTP processes remove trailing spaces.
In my case I used a filter in the source fixed-width Sequential File stage, i.e. one that strips trailing whitespace, removes 0x1A characters, and then pads each line to a length of 1494:
Code: Select all
sed 's/[ \t]*$//' | tr -d '\x1A' | awk '{printf "%-1494s\n",$0}' -
This might help. Also note that with a binary FTP the source data will most likely arrive in EBCDIC, and you would need to know the correct character set in order to map it in DataStage.
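On the EBCDIC point: if the file does arrive in EBCDIC after a binary FTP, a command-line conversion can be tried before parsing. This is a sketch assuming dd's standard EBCDIC-to-ASCII table; whether that matches your source's actual code page would need to be confirmed, and the file names are placeholders:

```shell
# Convert an EBCDIC file to ASCII using dd's built-in
# POSIX conversion table (conv=ascii), then inspect the result.
dd if=ebcdic_input.dat of=ascii_output.dat conv=ascii
```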
No, this goes in the Filter option of the source Sequential File stage:
Code: Select all
sed 's/[ \t]*$//' | tr -d '\x1A' | awk '{printf "%-1494s\n",$0}' -
Thanks Andrew for your swift reply. Let me try it; I guess I need to add this in the Filter option, along with all the columns defined in the Columns tab.
I have also tried one more option; I don't know if it is the best way to do it, or whether there are any limitations to it.
Read the entire string as one record in the source and split it in a Transformer with substrings like rec[1,2], rec[3,2]; that way, if the last column has no value it will be populated as NULL in the target. However, I do get warnings in the DataStage log saying "Exporting nullable field without null handling properties". Does anyone know whether this is a problem, or whether we can add it to a message handler?
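Outside DataStage, the same split can be sketched on the command line to sanity-check the substring positions (boundaries taken from the metadata above; a short record simply yields an empty last field rather than being dropped):

```shell
# Split each record into three 2-character fields at positions
# 1-2, 3-4 and 5-6; records missing COL3 produce an empty field.
awk '{printf "%s|%s|%s\n", substr($0,1,2), substr($0,3,2), substr($0,5,2)}' input.dat
```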
Thanks & Regards
arvind sampath
Software Engineer