There is a customer_name field in the source .dat file received. This name is in Chinese sometimes. It's unicode length is 35. I have defined this field as varchar(35) with unicode set in a sequential file stage. NLS is set to UTF-8. I am able to read the chinese characters and load it into the DB2 table.
Issue is that whenever the name is in chinese, it consumes more space than expected. Since this file is a fixed width file, everytime chinese names occur all other columns would be shifted to the right, hence loading wrong data to wrong columns.
Job design :
sequential file stage --->tfm---> DB2 API stage
Kindly please provide your suggestions
Chinese name consuming more space than unicode UTF-8 bytes
Moderators: chulett, rschirm, roy
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Each character may occupy one, two, three or even four bytes. True fixed-width is almost always impossible (at least where the unit of measurement is "bytes"). Is it possible to obtain the data in delimited format?
Last edited by ray.wurlod on Wed Jun 30, 2010 4:29 pm, edited 1 time in total.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
I am actually unable to read the premium content. But I am able to process the file if it is a delimited one. In fact tried with BOM and it worked too.ray.wurlod wrote:Each character may occupy one, two, three or even four bytes. True fixed-width is almost always impossible (at least where the unit of measurement is "bytes"). Is it possible to obtain the data in d ...
Just one clarification please, you mentioned unit of measurement is bytes, so is there an option to specify something like unicode characters if not bytes? Is there no other option except to delimit the file?
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
No single method - it depends on source. For example, Oracle uses a factor of 3 - you would specify VarChar(105). Try a factor of 2 first - that is, VarChar(70).
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Thanks !ray.wurlod wrote:No single method - it depends on source. For example, Oracle uses a factor of 3 - you would specify VarChar(105). Try a factor of 2 first - that is, VarChar(70). ...
I have tried varcha(70),(140) and bigger values too.
Tried nvarchar and nchar too.
Any other options please?
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact: