corrupt row

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
khanparwaz
Participant
Posts: 60
Joined: Tue Jul 12, 2005 3:34 am

corrupt row

Post by khanparwaz »

Hi,
It's a strange kind of error I am facing.
The job runs fine in the development environment but is giving a problem in the test environment.

We insert data into a sequential file, first in overwrite mode and then in append mode.

The overwrite-mode write puts one record in the file.

The append-mode write puts, say, 1000 records in the same file.

In between rows 1 and 2 it's putting garbage in 5 columns and the rest of the columns are null, so it's creating a corrupt row.
We don't know where it's coming from; in the table from which these files are generated everything is fine.


Can anyone help me remove this extra row?


Note: the data we were loading into the table came from a ~ separated flat file, and from that table we get the records that go into the above-mentioned sequential file.
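
In case it helps to diagnose, one way to see exactly what is sitting in the bad row from the shell (the file name reject.csv below is only a placeholder for the actual sequential file):

# print row 2 (the suspect row), then dump its raw bytes so any
# stray control characters or garbage values become visible
sed -n '2p' reject.csv
sed -n '2p' reject.csv | od -c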
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

You really need to show us your job design. In general it's a bad idea to have two or more writers to the same sequential file. That it works in any environment is just luck - when it goes bad it's probably a timing error. Is your test environment on a faster machine?

Write the rows to separate files (one row in one, 1000 rows in the other) then use cat (UNIX) or type or copy (Windows) to append the one to the other.
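
A minimal sketch of that approach, assuming the two intermediate files are called reject_one.txt and reject_many.txt (placeholder names):

# UNIX: concatenate the single-row file and the 1000-row file into the final target
cat reject_one.txt reject_many.txt > reject_all.txt

# Windows equivalents
copy reject_one.txt + reject_many.txt reject_all.txt
type reject_many.txt >> reject_one.txt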
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
khanparwaz
Participant
Posts: 60
Joined: Tue Jul 12, 2005 3:34 am

Post by khanparwaz »

In the job we are creating a file of rejected records for different conditions, using different lookups.


In the first write, in which we create the file (overwrite mode), we get one record, because only one record matches the reject condition.


In the second write (append mode) we write another 1000 records to the same file, and further on we use the same file to handle the rejected records.

But the data going into it is not perfect.

One more thing: if we view the file in Designer it first throws an error:

CigmaSTradeToETradeETradeRejectInsertJob..STrade_Reject_CSV.Reject_Read_Lnk: read_delimited() - row 2, column ALT_TRADE_ID, required column missing
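
If it helps to pin down the short row, a quick field count per line from the shell will show which row is missing columns (the comma delimiter and the file name reject.csv are assumptions; substitute your own):

# print the line number and the number of fields on each row;
# the corrupt row will show a different field count from the rest
awk -F',' '{ print NR": "NF" fields" }' reject.csv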
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

What Ray is stating is that if you have two processes writing to the same sequential file at the same time, you will most likely get corruption.

Are both of these writes occurring in the same job? If yes, you need to look at the design to ensure that the file is first overwritten with one record and then closed - then appended to. Perhaps you have a different configuration between testing and development that can affect this concurrency control (interprocess buffering comes to mind).

It is much better to do as Ray has suggested: create one file with just the one line and another with the 1000, then use a quick, efficient and easy external tool such as "cat" to merge them.
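
If you go that route, the merge can even be kicked off from the job itself, for example as an after-job subroutine (ExecSH on UNIX, ExecDOS on Windows) whose command is just the concatenation (the paths below are placeholders):

# after-job command: glue the two reject files into the final one
cat /data/reject_one.txt /data/reject_many.txt > /data/reject_all.txt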
manishsk
Participant
Posts: 13
Joined: Mon Mar 14, 2005 9:37 pm

Post by manishsk »

ray.wurlod wrote:You really need to show us your job design. In general it's a bad idea to have two or more writers to the same sequential file. That it works in any environment is just luck - when it goes bad it's probably a timing error. Is your test environment on a faster machine?

Write the rows to separate files (one row in one, 1000 rows in the other) then use cat (UNIX) or type or copy (Windows) to append the one to the other.

To solve this problem we used a Link Collector stage, provided both structures are the same. This is just one more approach you can try if you don't wish to create two separate files.

Thanks,
Manish
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

ray.wurlod wrote:In general it's a bad idea to have two or more writers to the same sequential file.
In general? I'd use stronger words - it does not work, is a Very Bad Idea and, as you noted, if it seems to be working it's just luck. :shock:

The solutions have been touched upon: separate 'sessions' working on the file, separate files concatenated post-job, or a Link Collector before writing to the single file. All of these approaches assume identical metadata across all output links.
-craig

"You can never have too many knives" -- Logan Nine Fingers
Post Reply