sequential file sizes created in server vs PX

bmadhav · Post by **bmadhav** » Sat Oct 29, 2005 10:05 am

I have observed that a sequential file created with the same properties and the same formats in PX and in server comes out at a different size in each.
The PX file is approx. double the size of the server seq. file.
In my case, the server file is approx. 40 GB and the PX file is approx. 80 GB.
Both files are delimited. On the PX side, I applied trims and compacted white space to reduce the size, but I still end up with a larger PX seq. file in comparison to the server seq. file!

Anybody run into this before?

Thanks
Bindu

ArndW · Post by **ArndW** » Sat Oct 29, 2005 10:14 am

No, this is a new observation.

What does a diff show as the per-line difference?

chulett · Post by **chulett** » Sat Oct 29, 2005 1:55 pm

Or the good ol' Mark One Eyeball?

That's a substantial difference and should be readily apparent.

ray.wurlod · Post by **ray.wurlod** » Sat Oct 29, 2005 4:17 pm

Do you get the same results with small data volumes?

Can you see any difference between the structure of the first line or two of each file? (head -2 filename)

Are you specifying Extended (Unicode) character strings in PX?
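
For anyone who wants to go one step beyond head -2, here is a minimal Python sketch that splits the first line of each file on the delimiter and lists the fields whose text differs, with their lengths. The function names, the comma delimiter, the latin-1 encoding and the two sample lines are all assumptions made up for illustration; swap in whatever your files actually use.

```python
from itertools import zip_longest


def first_line(path, encoding="latin-1"):
    """Read only the first line of a file (hypothetical helper)."""
    with open(path, "r", encoding=encoding, newline="") as f:
        return f.readline()


def field_diffs(server_line, px_line, delimiter=","):
    """Return (index, server_field, px_field) for each field whose text differs."""
    server_fields = server_line.rstrip("\r\n").split(delimiter)
    px_fields = px_line.rstrip("\r\n").split(delimiter)
    diffs = []
    pairs = zip_longest(server_fields, px_fields, fillvalue="")
    for index, (s, p) in enumerate(pairs):
        if s != p:
            diffs.append((index, s, p))
    return diffs


# Made-up sample lines, not taken from the real files.
server_sample = "ABC,12.5,XYZ\n"
px_sample = "ABC ,0012.5,XYZ\n"

for index, s, p in field_diffs(server_sample, px_sample):
    print(f"field {index}: server={s!r} ({len(s)} chars), px={p!r} ({len(p)} chars)")
```

With real files you would call something like field_diffs(first_line("server.txt"), first_line("px.txt"), delimiter="|"), where both paths and the delimiter are placeholders. Printing with repr makes trailing spaces and padding visible.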

bmadhav · Post by **bmadhav** » Sat Oct 29, 2005 9:27 pm

I did go back and run this job with a smaller sample of data (just 1 record) and I still get the size difference. After compressing white space the PX file is still bigger, but not twice as big as when I started.
Here are some of the differences I observed between the server and PX files:
Char fields:
Compressing white space in strings on the PX side still leaves each character field with 1 trailing space.

Decimal fields:
Decimal fields are zero filled and not compressed at all. By its very nature much of our data has many numeric fields with zero values, so a decimal 13,2 column that server writes as '0.00' is written by PX as '00000000000.00'.
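
To put a number on the decimal difference, here is a rough Python sketch that imitates the two formats seen above for a decimal 13,2 column and compares the bytes per field. It only reproduces the observed output for non-negative values; it is not how either engine formats internally, and the function names and the sample values are made up for illustration.

```python
from decimal import Decimal


def zero_filled(value, precision=13, scale=2):
    """Imitate the PX-style output seen above: zero-padded to full width."""
    # precision - scale integer digits, one decimal point, scale fraction digits
    width = (precision - scale) + 1 + scale
    return f"{value:.{scale}f}".zfill(width)


def compact(value, scale=2):
    """Imitate the server-style output seen above: no leading zero padding."""
    return f"{value:.{scale}f}"


# Hypothetical sample values, mostly zero like the data described above.
values = [Decimal("0"), Decimal("0"), Decimal("0"), Decimal("1234.5")]

px_bytes = sum(len(zero_filled(v)) for v in values)
server_bytes = sum(len(compact(v)) for v in values)

print(zero_filled(Decimal("0")), len(zero_filled(Decimal("0"))))  # 00000000000.00 14
print(compact(Decimal("0")), len(compact(Decimal("0"))))          # 0.00 4
print(f"PX-style: {px_bytes} bytes, server-style: {server_bytes} bytes")
```

A zero value costs 14 characters in the padded form against 4 in the compact form, so a record dominated by zero-valued decimal columns can easily account for a file that is twice as large or more.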

Ray,
How do I know if I am specifying Extended (Unicode) character strings in PX?

Thanks
Bindu

ray.wurlod · Post by **ray.wurlod** » Sun Oct 30, 2005 12:05 am

There's an "Extended" column in the grid on the Columns tab.

Server jobs generate the smallest possible numeric fields, stripping leading zeroes to the left of the decimal place and trailing zeroes to the right of the decimal place. It seems you may have nailed it. The next question to answer is whether there's any property (or stage type) that will allow you to alter the "leave all the zeroes" behaviour that you're seeing.

I can't research that right now (I don't have access to DS on weekends). Perhaps you'd like to investigate further and report your findings?

bmadhav · Post by **bmadhav** » Sun Oct 30, 2005 8:29 am

Same here Ray, I do not have access to DS from home. I will research it tomorrow and get back to the group.

Thanks
Bindu