sequential file sizes created in server vs PX


bmadhav
Charter Member
Posts: 50
Joined: Wed May 12, 2004 1:16 pm

Post by bmadhav »

I have observed that a sequential file created with the same properties and formats in PX and in server comes out at different sizes.
The PX file is approximately double the size of the server sequential file.
In my case, the server file is approx. 40 GB and the PX file is approx. 80 GB.
Both files are delimited. On the PX side, I trimmed the fields and compacted white space to reduce the size, but I still end up with a larger PX sequential file than the server one! :?:
Has anybody run into this before? :(

Thanks
Bindu
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

No, this is a new observation.

What does a diff show as the per-line difference?
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Or the good ol' Mark One Eyeball? :wink:

That's a substantial difference and should be readily apparent.
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Do you get the same results with small data volumes?

Can you see any difference between the structure of the first line or two of each file? (head -2 filename)
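
For a per-field look beyond head -2, something like the following could compare field widths on the first lines of the two files. This is only a sketch in Python: the function names, the comma default for the delimiter, and the file paths are assumptions rather than anything taken from the jobs discussed here, and it does not handle delimiters inside quoted fields.

```python
from itertools import islice


def field_widths(line, delimiter=","):
    """Return the character width of each delimited field in one line."""
    return [len(field) for field in line.rstrip("\r\n").split(delimiter)]


def width_differences(server_line, px_line, delimiter=","):
    """List (field position, server width, PX width) where the widths differ."""
    pairs = zip(field_widths(server_line, delimiter),
                field_widths(px_line, delimiter))
    return [(pos, s, p) for pos, (s, p) in enumerate(pairs, start=1) if s != p]


def compare_heads(server_path, px_path, delimiter=",", lines=2):
    """Compare only the first few lines, so the full files are never read."""
    with open(server_path) as server, open(px_path) as px:
        heads = zip(islice(server, lines), islice(px, lines))
        return [width_differences(s, p, delimiter) for s, p in heads]
```

A call such as compare_heads("server_out.txt", "px_out.txt", delimiter="|") (hypothetical file names) would return one list per line, naming the fields whose widths differ between the two outputs.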

Are you specifying Extended (Unicode) character strings in PX?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
bmadhav
Charter Member
Posts: 50
Joined: Wed May 12, 2004 1:16 pm

Post by bmadhav »

I did go back and run this job with a smaller sample of data (just 1 record) and I still get the size difference. After compressing white space the PX file is still bigger, but not twice as big as when I started.
Here are some of the differences I observed between the server and PX files:
Char fields:
Compressing white space for strings in the PX file still leaves each character field with 1 trailing space.

Decimal fields:
Decimal fields are zero filled and not compressed at all. By its very nature, much of our data has many numeric fields with zero values, so a decimal 13,2 column is written as '0.00' in server while PX will write '00000000000.00'.
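
To put a rough number on the decimal padding, the arithmetic can be sketched in Python as below. The two strings are the ones quoted above; the helper name and the row and field counts are hypothetical, and one byte per character is assumed.

```python
# The two renderings of a zero in a decimal(13,2) column, as quoted above.
server_zero = "0.00"
px_zero = "0" * 11 + ".00"  # 11 integer digits, the point, 2 decimals


def extra_bytes(rows, zero_fields_per_row):
    """Extra bytes caused by zero padding, assuming one byte per character."""
    padding = len(px_zero) - len(server_zero)  # 14 - 4 = 10 bytes per field
    return rows * zero_fields_per_row * padding
```

With made-up figures of 1,000,000 rows and 20 zero-valued decimal fields per row, extra_bytes(1_000_000, 20) comes to 200,000,000 bytes of padding alone.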

Ray,
How do I know whether I am specifying Extended (Unicode) character strings in PX?

Thanks
Bindu
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

There's an "Extended" column in the grid on the Columns tab.

Server jobs generate the smallest possible numeric fields, stripping leading zeroes to the left of the decimal place and trailing zeroes to the right of the decimal place. It seems you may have nailed it. The next question to answer is whether there's any property (or stage type) that will allow you to alter the "leave all the zeroes" behaviour that you're seeing.

I can't research that right now (I don't have access to DS on weekends). Perhaps you'd like to investigate further and report your findings?
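
As an illustration of the zero stripping described above, a post-processing step outside DataStage might look roughly like this Python sketch. It is not a DataStage property or stage; the function name is hypothetical, and it only removes padding to the left of the decimal point, leaving the scale alone so that '00000000000.00' becomes '0.00'.

```python
def strip_leading_zeros(field):
    """Drop zero padding left of the decimal point, keeping sign and scale."""
    body = field.strip()
    if not body:
        return field  # leave empty fields untouched
    sign = ""
    if body[0] in "+-":
        sign, body = body[0], body[1:]
    integer, dot, fraction = body.partition(".")
    # Keep a single zero when the integer part is all zeroes.
    integer = integer.lstrip("0") or "0"
    return sign + integer + dot + fraction
```

Applied to each decimal field of a delimited line, this would shrink zero-valued columns while leaving non-numeric fields to be handled separately.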
bmadhav
Charter Member
Posts: 50
Joined: Wed May 12, 2004 1:16 pm

Post by bmadhav »

Same here, Ray. I do not have access to DS from home. I will research it tomorrow and get back to the group.

Thanks
Bindu