Dataset: size of data file changes after orchadmin cp

Posted: Wed Nov 12, 2014 11:05 pm
by XRAY
Hi All,

I have an issue with orchadmin cp.

Some datasets were migrated from an old server, and copies were made with orchadmin cp.


For example:
export APT_CONFIG_FILE=4n.apt
orchadmin cp A.ds B.ds

where 4n.apt is the configuration file that was used to generate A.ds.

The record count is the same for both datasets; however, the data files of A.ds and B.ds turned out to differ in size. Roughly, A.ds is double the size of B.ds.

Posted: Wed Nov 12, 2014 11:21 pm
by ray.wurlod
Was A.ds populated in one go, or by successive appends?

It looks like orchadmin cp has taken the opportunity to reclaim unneeded space.

Posted: Wed Nov 12, 2014 11:25 pm
by ray.wurlod
Why not look at the segment files (using the Data Set Management tool in Designer) to try to work out where the differences actually lie? In particular, review how many 32K or 128K blocks are involved for each Data Set.

The space occupied by a Data Set includes any unused space in these storage blocks. It may be that the orchadmin cp command has been able to pack blocks more efficiently, particularly if A.ds has been appended to on several occasions.

Appending to a Data Set does not re-use any internal storage blocks - it simply adds more blocks to the end of the segment file structure.
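
To put rough numbers on the block-packing point above, here is a minimal sketch in Python. It is a toy model, not DataStage's actual on-disk layout: it assumes 128K blocks, assumes every append starts on a fresh block and never back-fills the previous one, and assumes a copy rewrites the whole payload contiguously. The append sizes are hypothetical.

```python
# Toy model only: block size and append behaviour are assumptions for illustration.
BLOCK_SIZE = 128 * 1024  # assumed block size in bytes (the thread mentions 32K or 128K)


def blocks_needed(payload_bytes, block_size=BLOCK_SIZE):
    """Number of fixed-size blocks needed to hold payload_bytes (ceiling division)."""
    return -(-payload_bytes // block_size)


def size_after_appends(append_sizes, block_size=BLOCK_SIZE):
    """On-disk size if each append starts on fresh blocks and never back-fills."""
    return sum(blocks_needed(n, block_size) for n in append_sizes) * block_size


def size_after_copy(append_sizes, block_size=BLOCK_SIZE):
    """On-disk size if the same payload is rewritten contiguously in one pass."""
    return blocks_needed(sum(append_sizes), block_size) * block_size


# Hypothetical history: ten appends of 70 KiB each.
appends = [70 * 1024] * 10
print(size_after_appends(appends))  # 1310720 bytes (10 blocks)
print(size_after_copy(appends))     # 786432 bytes (6 blocks)
```

Under these assumptions the same records occupy 10 blocks after the appends but only 6 after a single contiguous rewrite, which is the kind of gap that would show up as a smaller B.ds with an identical record count.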

Posted: Thu Nov 13, 2014 1:37 am
by ArndW
What versions were the two servers? The copy might have changed the string column compression attribute in the Datasets.

Posted: Thu Nov 13, 2014 3:29 am
by XRAY
The original server is 7.5.1.
The new server is 9.1.2.

The record count (via dsrecords) and the content of the two datasets are exactly the same!

Posted: Thu Nov 13, 2014 7:39 am
by chulett
That's a good thing. In fact, all of this is a good thing... are you thinking there's a problem of some kind?

Posted: Thu Nov 13, 2014 11:19 am
by ArndW
I wanted to check this at work, but forgot to do so. I do believe that the default representation for VarChar() fields in datasets went from uncompressed to compressed between 7.5.1 and 9.1.2, which would account for a smaller file at 9.1.2 despite it having the same number of data records and identical contents.
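
A minimal Python sketch of why a change in string representation could shrink the file without touching the data. It rests on assumptions that the thread does not confirm: that "uncompressed" means each value is padded to the declared maximum length, and that "compressed" means a length prefix followed by the actual bytes. The 4-byte prefix, the column width, and the sample values are all hypothetical.

```python
# Illustration only: both storage layouts here are assumptions, not DataStage internals.

def padded_size(values, max_len):
    """Bytes used if every value is stored padded to the declared maximum length."""
    return len(values) * max_len


def length_prefixed_size(values, prefix_bytes=4):
    """Bytes used if every value is stored as a length prefix plus its UTF-8 bytes."""
    return sum(prefix_bytes + len(v.encode("utf-8")) for v in values)


# Hypothetical VarChar(50) column holding three short values.
names = ["Ann", "Bartholomew", "Chen"]
print(padded_size(names, 50))       # 150
print(length_prefixed_size(names))  # 30
```

The record count and the values are identical in both cases; only the bytes spent per value differ, so the gap grows with how much shorter the real values are than the declared length.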

Posted: Thu Nov 13, 2014 9:27 pm
by XRAY
Would the NLS and code page settings affect how orchadmin works?

Posted: Thu Nov 13, 2014 11:44 pm
by chulett
It certainly could, I would imagine... did you change them with your server migration?

Posted: Fri Nov 14, 2014 9:24 pm
by XRAY
NLS for DataStage is set to UTF-8 on both machines.

The Linux locale is the same on both:
LANG=en_US.UTF-8

Posted: Fri Nov 14, 2014 11:28 pm
by chulett
OK, so no. Your question implied that they were. In that case I would stick with the previous explanation.

Posted: Sat Nov 15, 2014 4:50 am
by qt_ky
For what it's worth, the migration guide section for migrating data sets leads to this technote:

How to move dataset from one server to another in IBM InfoSphere DataStage