Data in the output file has more rows than the input data

A forum for discussing DataStage® basics. If you're not sure where your question goes, start here.

Moderators: chulett, rschirm, roy

Hope
Participant
Posts: 97
Joined: Sun May 13, 2007 2:51 pm
Contact:


Post by Hope »

My source is a Teradata Enterprise stage and my output is a Sequential File stage. I have a table with a row count of 6 million, and I have to extract this data and write it to a tab-delimited flat file.
When the data is written to the flat file I get 30 extra rows, even though my DataStage job shows 6 million records and Director reports 6 million records exported successfully. The file in UNIX is zipped, and when I check the row count with
gunzip -c filename | wc -l
it shows the target has 30 more rows.
Is there any way I can split the zip file and view the data?
I don't understand why I am getting more rows.
Can anyone please help me?

Thanks
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Dollars to doughnuts a text field has a LF - a CHAR(10) - in it as 'data'. UNIX will see that as a record separator.
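
If it helps, here is one way to check that from the UNIX side without fully unzipping the file: look at the first few records and count the tab-separated fields on each line. Any line with fewer fields than the table has columns is most likely half of a record that was split by an embedded line feed. The file name extract.txt.gz and the column count of 14 below are only placeholders, so substitute your own values.

Code:

# View the first few rows without uncompressing the whole file
gunzip -c extract.txt.gz | head -20

# Field-count distribution: with clean data every line should show the
# same number of tab-separated fields
gunzip -c extract.txt.gz | awk -F'\t' '{print NF}' | sort -n | uniq -c

# Print the line numbers (and content) of the suspect rows
gunzip -c extract.txt.gz | awk -F'\t' 'NF != 14 {print NR": "$0}'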
-craig

"You can never have too many knives" -- Logan Nine Fingers
Hope
Participant
Posts: 97
Joined: Sun May 13, 2007 2:51 pm
Contact:

Post by Hope »

Could you please be more specific? I am new to DataStage. My data has address fields, e.g. StreetNo and AptNo, and these are tab delimited. When I look at the Teradata report, this appears as one field separated by a tab, but I guess when it is written to the flat file it is being treated as two lines:
Street No
Apt No.
What is LF? Do I need to write conversion logic in the Transformer to convert the newline to a space? If so, can you please suggest how? Do I need to change the format in the Sequential File stage, and if so, what should I change it to? Currently I am using Final delimiter as end and Delimiter as tab. I also tried Record delimiter as UNIX newline and Record delimiter string as ?n; it didn't work.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

LF means Line Feed, also known as a CHAR(10). You'd see all this on any ASCII chart. It is the character that UNIX uses as the 'Record Terminator'.

You would need to determine if that, indeed, is the case. And if so, if they are appropriate to keep as 'data' or should be removed. Keeping them is easy, don't do anything on the write side, it's the read side that will need to smarten up. That usually involves telling DataStage that the field can 'contain terminators'.

To remove them, a function can be used to replace them with 'nothing' in the infected field. Convert is one example, pretty sure that's available in PX jobs:

Code:

Convert(CHAR(10),"",YourField)
Or some other mechanism more appropriate in a PX job.
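
As a quick sanity check after rerunning the job with the cleanup in place, the UNIX line count should match the export count Director reports, and no line should come up short on fields. Again, the file name extract.txt.gz and the column count of 14 are placeholders for your own values.

Code:

# Line count should now equal the 6 million records Director reports
gunzip -c extract.txt.gz | wc -l

# And no record should have fewer than the full set of tab-separated
# fields -- this should print 0
gunzip -c extract.txt.gz | awk -F'\t' 'NF != 14' | wc -l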
-craig

"You can never have too many knives" -- Logan Nine Fingers
Hope
Participant
Posts: 97
Joined: Sun May 13, 2007 2:51 pm
Contact:

Post by Hope »

Thanks for your assistance. Let me try this.
Hope
Participant
Posts: 97
Joined: Sun May 13, 2007 2:51 pm
Contact:

Post by Hope »

Awesome! It worked.

Thank you for your assistance
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Great! :D

Please mark the topic as 'Resolved' when you get a chance.
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

You owe Craig some dollars or doughnuts - it wasn't really clear which!
:lol:
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.