
Data in the output file has more rows than the input data

Posted: Wed May 16, 2007 1:19 pm
by Hope
My source is a Teradata Enterprise stage and my output is a Sequential File stage. I have a table whose row count is 6 million. I have to extract this data and write it to a tab-delimited flat file.
When I write the data to the flat file I am getting 30 extra rows, while my DataStage job shows 6 million records and Director also shows
6 million records exported successfully. When I view the row count in UNIX it gives me 30 more rows. The file in UNIX is a zip file.
When I use the command gunzip -c filename | wc -l
it shows the target has 30 more rows.
Is there any way I can split the zip file and view the data?
I don't understand why I am getting more rows.
Can anyone please help me?

Thanks
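For reference, the compressed file can be inspected without fully extracting it. A minimal sketch (the sample file and path are illustrative, assuming the file is gzip-compressed):

```shell
# Create a small gzipped sample to demonstrate (filename is illustrative)
printf 'row1\nrow2\nrow3\n' | gzip > /tmp/sample.gz

# View the first records without extracting the whole file
gunzip -c /tmp/sample.gz | head -2

# View an arbitrary slice of the file
gunzip -c /tmp/sample.gz | sed -n '2,3p'

# Count the lines, as in the command above
gunzip -c /tmp/sample.gz | wc -l
```

The same `gunzip -c file | head` or `sed -n 'M,Np'` pattern works on the real extract file, so there is no need to split the zip just to look at the data.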

Posted: Wed May 16, 2007 3:09 pm
by chulett
Dollars to doughnuts a text field has an LF - a CHAR(10) - in it as 'data'. UNIX will see that as a record separator.

Posted: Thu May 17, 2007 6:33 am
by Hope
Could you please be more specific? I am new to DataStage. My data has address fields, e.g. StreetNo and AptNo, and these are tab-delimited. When I look at the Teradata report, this appears as one field separated by a tab. I guess when it is written to the flat file it is being treated as two lines:
Street No
Apt No.
What is LF? Do I need to write conversion logic in the Transformer to convert the newline to a space? If so, can you please suggest how? Do I need to change the format in the Sequential File stage? If so, to what should I change it? Currently I am using Final delimiter as end and Delimiter as tab. I also tried Record delimiter as UNIX newline and Record delimiter string as ?n, but it didn't work.

Posted: Thu May 17, 2007 6:50 am
by chulett
LF means Line Feed, also known as CHAR(10). You'll find it on any ASCII chart. It is the character that UNIX uses as the 'Record Terminator'.

You would need to determine if that, indeed, is the case. And if so, whether they are appropriate to keep as 'data' or should be removed. Keeping them is easy: don't do anything on the write side; it's the read side that will need to smarten up. That usually involves telling DataStage that the field can 'contain terminators'.

To remove them, a function can be used to replace them with 'nothing' in the infected field. Convert is one example, and I'm pretty sure that's available in PX jobs:

Code: Select all

Convert(CHAR(10),"",YourField)
Or some other mechanism more appropriate in a PX job.

Posted: Thu May 17, 2007 6:59 am
by Hope
Thanks for your assistance. Let me try this.

Posted: Thu May 17, 2007 10:22 am
by Hope
Awesome! It worked.

Thank you for your assistance

Posted: Thu May 17, 2007 10:27 am
by chulett
Great! :D

Please mark the topic as 'Resolved' when you get a chance.

Posted: Thu May 17, 2007 4:21 pm
by ray.wurlod
You owe Craig some dollars or doughnuts - it wasn't really clear which!
:lol: