Data in the output file has more rows than the input data

A forum for discussing DataStage® basics. If you're not sure where your question goes, start here.

Moderators: chulett, rschirm, roy

Hope
Participant
Posts: 97
Joined: Sun May 13, 2007 2:51 pm
Contact:


Post by Hope »

My source is a Teradata Enterprise stage and my output is a Sequential File stage. I have a table with a row count of 6 million, and I have to extract this data and write it to a tab-delimited flat file.
When the data is written to the flat file I get 30 extra rows, even though my DataStage job shows 6 million records and Director reports 6 million records exported successfully. The file in UNIX is zipped, and when I check the row count with
gunzip -c filename | wc -l
it shows the target has 30 more rows.
Is there any way I can split the zip file and view the data?
I don't understand why I am getting more rows.
Can anyone please help me?

Thanks
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Dollars to doughnuts a text field has a LF - a CHAR(10) - in it as 'data'. UNIX will see that as a record separator.
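
If it helps, here is one way to check that from the UNIX side without fully unzipping the file: look at the first few records and count the tab-separated fields on each line. Any line with fewer fields than the table has columns is most likely half of a record that was split by an embedded line feed. The file name extract.txt.gz and the column count of 14 below are only placeholders, so substitute your own values.

Code:

# View the first few rows without uncompressing the whole file
gunzip -c extract.txt.gz | head -20

# Field-count distribution: with clean data every line should show the
# same number of tab-separated fields
gunzip -c extract.txt.gz | awk -F'\t' '{print NF}' | sort -n | uniq -c

# Print the line numbers (and content) of the suspect rows
gunzip -c extract.txt.gz | awk -F'\t' 'NF != 14 {print NR": "$0}'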
-craig

"You can never have too many knives" -- Logan Nine Fingers
Hope
Participant
Posts: 97
Joined: Sun May 13, 2007 2:51 pm
Contact:

Post by Hope »

Could you please be more specific? I am new to DataStage. My data has address fields, e.g. StreetNo and AptNo, and these are tab delimited. When I look at the Teradata report, this appears as one field separated by a tab, but I guess when it is written to the flat file it is being treated as two lines:
Street No
Apt No.
What is LF? Do I need to write conversion logic in the Transformer to convert the newline to a space? If so, can you please suggest how? Do I need to change the format in the Sequential File stage, and if so, what should I change it to? Currently I am using Final delimiter as end and Delimiter as tab. I also tried Record delimiter as UNIX newline and Record delimiter string as ?n; it didn't work.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

LF means Line Feed, also known as a CHAR(10). You'd see all this on any ASCII chart. It is the character that UNIX uses as the 'Record Terminator'.

You would need to determine if that, indeed, is the case. And if so, if they are appropriate to keep as 'data' or should be removed. Keeping them is easy, don't do anything on the write side, it's the read side that will need to smarten up. That usually involves telling DataStage that the field can 'contain terminators'.

To remove them, a function can be used to replace them with 'nothing' in the infected field. Convert is one example, pretty sure that's available in PX jobs:

Code:

Convert(CHAR(10),"",YourField)
Or some other mechanism more appropriate in a PX job.
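
As a quick sanity check after rerunning the job with the cleanup in place, the UNIX line count should match the export count Director reports, and no line should come up short on fields. Again, the file name extract.txt.gz and the column count of 14 are placeholders for your own values.

Code:

# Line count should now equal the 6 million records Director reports
gunzip -c extract.txt.gz | wc -l

# And no record should have fewer than the full set of tab-separated
# fields -- this should print 0
gunzip -c extract.txt.gz | awk -F'\t' 'NF != 14' | wc -l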
-craig

"You can never have too many knives" -- Logan Nine Fingers
Hope
Participant
Posts: 97
Joined: Sun May 13, 2007 2:51 pm
Contact:

Post by Hope »

Thanks for your assistance. Let me try this.
Hope
Participant
Posts: 97
Joined: Sun May 13, 2007 2:51 pm
Contact:

Post by Hope »

Awesome! It worked.

Thank you for your assistance
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Great! :D

Please mark the topic as 'Resolved' when you get a chance.
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

You owe Craig some dollars or doughnuts - it wasn't really clear which!
:lol:
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.