Dataset: size of data file changes after orchadmin cp

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

XRAY
Participant
Posts: 33
Joined: Mon Apr 03, 2006 12:09 am

Post by XRAY »

Hi All,

I have an issue with orchadmin cp.

Some datasets were migrated from an old server, and copies were made with orchadmin cp.


For example:
export APT_CONFIG_FILE=4n.apt
orchadmin cp A.ds B.ds

where 4n.apt is the configuration file that was used to generate A.ds.

The record counts of both data sets are the same; however, the data files of A.ds and B.ds differ in size. Roughly, A.ds is double the size of B.ds.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Was A.ds populated in one go, or by successive appends?

It looks like orchadmin cp has taken the opportunity to reclaim unneeded space.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Why not look at the segment files (using the Data Set Management tool in Designer) to try to work out where the differences actually lie? In particular, review how many 32K or 128K blocks are involved for each Data Set.
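A rough command-line version of that check, assuming you have already listed the segment file paths for each Data Set (for example from the Data Set Management tool) and know whether 32K (32768) or 128K (131072) blocks apply. The helper names `blocks_for_bytes` and `segment_summary` are hypothetical; nothing here reads the Data Set descriptor, it only sums file sizes and rounds each file up to whole blocks.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: sum the byte sizes of a Data Set's segment files and
# estimate how many storage blocks they occupy. The file paths and the block
# size (32768 or 131072 bytes) are inputs you supply; the Data Set descriptor
# itself is never read.

# blocks_for_bytes BYTES BLOCK_SIZE -> number of blocks, rounded up
blocks_for_bytes() {
  echo $(( ($1 + $2 - 1) / $2 ))
}

# segment_summary BLOCK_SIZE FILE... -> prints "<total_bytes> <total_blocks>"
segment_summary() {
  local block=$1
  shift
  local total_bytes=0 total_blocks=0 size f
  for f in "$@"; do
    size=$(wc -c < "$f")   # byte count of one segment file
    total_bytes=$(( total_bytes + size ))
    total_blocks=$(( total_blocks + $(blocks_for_bytes "$size" "$block") ))
  done
  echo "$total_bytes $total_blocks"
}
```

Run it once with the segment files of A.ds and once with those of B.ds; if B.ds needs about half as many blocks for the same records, the difference is in block usage rather than in the data itself.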

Space in a Data Set includes any unused space in these storage blocks. It may be the case that the orchadmin cp command has been able to pack blocks more efficiently, particularly if A.ds has been appended to on occasion.

Appending to a Data Set does not re-use any internal storage blocks - it simply adds more blocks to the end of the segment file structure.
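To see how appends alone could roughly double the footprint, here is a toy model. It assumes (not documented behaviour) that a block holds only whole records, that every append starts a fresh block, and that a copy rewrites all records contiguously. `compare_layouts` and `blocks_needed` are hypothetical names, and the record and run sizes are illustrative numbers, not measurements.

```shell
#!/usr/bin/env bash
# Simplified model (an assumption, not the documented on-disk format):
#  - a block holds BLOCK_SIZE / RECORD_SIZE whole records (integer division);
#  - every append starts a new block, so each run's last block may be part-empty;
#  - a copy writes all records contiguously into as few blocks as possible.

# blocks_needed RECORDS PER_BLOCK -> ceil(RECORDS / PER_BLOCK)
blocks_needed() {
  echo $(( ($1 + $2 - 1) / $2 ))
}

# compare_layouts BLOCK_SIZE RECORD_SIZE RUN_SIZE...
# prints "<blocks_after_appends> <blocks_after_copy>"
compare_layouts() {
  local block=$1 rec=$2
  shift 2
  local per_block=$(( block / rec ))
  if [ "$per_block" -lt 1 ]; then
    echo "record larger than block" >&2
    return 1
  fi
  local appended=0 total=0 n
  for n in "$@"; do
    # each append run rounds up to its own whole blocks
    appended=$(( appended + $(blocks_needed "$n" "$per_block") ))
    total=$(( total + n ))
  done
  # a copy packs every record into one contiguous run
  echo "$appended $(blocks_needed "$total" "$per_block")"
}
```

With 128K blocks, 1000-byte records and ten appends of 140 records each, `compare_layouts 131072 1000 140 140 140 140 140 140 140 140 140 140` prints `20 11`: each append spills into a nearly empty second block, while a packed copy needs just over half as many blocks.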
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

What versions were the two servers? The copy might have changed the string column compression attribute in the Datasets.
XRAY
Participant
Posts: 33
Joined: Mon Apr 03, 2006 12:09 am

Post by XRAY »

The original server is 7.5.1
The new server is 9.1.2

The record count (via dsrecords) and the contents of the two data sets are exactly the same!
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

That's a good thing. In fact, all of this is a good thing... are you thinking there's a problem of some kind? :?
-craig

"You can never have too many knives" -- Logan Nine Fingers
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

I wanted to check up at work, but forgot to do so. I do believe that the default representation for VarChar() fields in datasets went from uncompressed to compressed between 7.5.1 and 9.1.2, which would account for a smaller file at 9.1.2 despite having the same number of data records and identical contents.
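If that recollection is right, the effect is easy to estimate. The sketch below assumes an uncompressed VarChar(MAX) value is stored padded to MAX bytes while a compressed one uses only its actual length, each with a length prefix; the 4-byte `PREFIX_BYTES` value and the `varchar_bytes` helper are hypothetical, not taken from IBM documentation.

```shell
#!/usr/bin/env bash
# Simplified model (an assumption): an uncompressed VarChar(MAX) value is stored
# padded to MAX bytes, a compressed one only uses its actual length; both carry
# a length prefix. PREFIX_BYTES=4 is a hypothetical value, not a documented one.
PREFIX_BYTES=4

# varchar_bytes MAX LEN... -> prints "<uncompressed_bytes> <compressed_bytes>"
varchar_bytes() {
  local max=$1
  shift
  local padded=0 compact=0 len
  for len in "$@"; do
    if [ "$len" -gt "$max" ]; then
      echo "value length $len exceeds VarChar($max)" >&2
      return 1
    fi
    padded=$(( padded + PREFIX_BYTES + max ))    # fixed-width slot
    compact=$(( compact + PREFIX_BYTES + len ))  # only the bytes actually used
  done
  echo "$padded $compact"
}
```

For a VarChar(100) column holding values of 10, 25 and 40 characters, `varchar_bytes 100 10 25 40` prints `312 87`, so short values in wide VarChar columns alone could explain a large size drop with identical contents.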
XRAY
Participant
Posts: 33
Joined: Mon Apr 03, 2006 12:09 am

Post by XRAY »

Would the NLS and code page settings affect how orchadmin works?
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

It certainly could, I would imagine... did you change them with your server migration?
-craig

XRAY
Participant
Posts: 33
Joined: Mon Apr 03, 2006 12:09 am

Post by XRAY »

NLS in DataStage is set to UTF-8 on both machines.

The Linux locale on both machines is set to
LANG=en_US.UTF-8
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

OK, so no. Your question implied that they had changed. In that case I would stick with the previous explanation.
-craig

qt_ky
Premium Member
Posts: 2895
Joined: Wed Aug 03, 2011 6:16 am
Location: USA

Post by qt_ky »

For what it's worth, the migration guide section for migrating data sets leads to this technote:

How to move dataset from one server to another in IBM InfoSphere DataStage
Choose a job you love, and you will never have to work a day in your life. - Confucius