Dataset : size of data file change after orchadmin cp
Moderators: chulett, rschirm, roy
Hi All,
I have an issue with orchadmin cp.
Some datasets were migrated from an old server, and copies were made with orchadmin cp.
eg.
export APT_CONFIG_FILE=4n.apt
orchadmin cp A.ds B.ds
where 4n.apt is used to generate A.ds
The record counts for both files are the same; however, the sizes of the data files in A.ds and B.ds are different. Roughly, A.ds is double the size of B.ds.
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Why not look at the segment files (using the Data Set Management tool in Designer) to try to work out where the differences actually lie? In particular, review how many 32K or 128K blocks are involved for each Data Set.
Space in a Data Set includes any unused space in these storage blocks. It may be the case that the orchadmin cp command has been able to pack blocks more efficiently, particularly if A.ds has been appended to on occasion.
Appending to a Data Set does not re-use any internal storage blocks - it simply adds more blocks to the end of the segment file structure.
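To see why repacking alone can roughly halve a file, here is a toy sketch of the arithmetic. It is not DataStage internals: the 32K block size, the fixed record size, and the append batch sizes are all assumptions chosen for illustration. The point is only that each append starts fresh blocks, while a sequential copy packs all records contiguously.

```python
# Illustrative sketch (not DataStage internals): appending to a data set
# starts new storage blocks rather than reusing slack in existing ones,
# while a fresh sequential copy repacks records into full blocks.

BLOCK_SIZE = 32 * 1024   # assumed 32K storage blocks
RECORD_SIZE = 100        # assumed fixed record size, bytes

def blocks_needed(record_count, block_size=BLOCK_SIZE, record_size=RECORD_SIZE):
    """Blocks required when records are packed contiguously."""
    records_per_block = block_size // record_size
    return -(-record_count // records_per_block)  # ceiling division

# Original data set: one initial write, then ten small append batches.
# Each append batch consumes at least one whole new block.
initial = 5000
appends = [37] * 10
appended_blocks = blocks_needed(initial) + sum(blocks_needed(n) for n in appends)

# A sequential copy rewrites all records, repacking the blocks.
copied_blocks = blocks_needed(initial + sum(appends))

print(f"blocks after appends: {appended_blocks}")  # more blocks...
print(f"blocks after copy:    {copied_blocks}")    # ...than after a repack
```

Same record count in both cases, but the appended data set holds more (partially empty) blocks than the repacked copy, which matches the size difference described above.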
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
What versions were the two servers running? The copy might have changed the string-column compression attribute in the datasets.
I meant to check at work, but forgot to do so. I do believe that the default representation for VarChar() fields in datasets went from uncompressed to compressed between 7.5.1 and 9.1.2, which would account for a smaller file at 9.1.2 despite the same number of data records and identical contents.
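A toy sketch of why that representation change matters. The storage layouts here are assumptions for illustration (a fixed declared width per record versus a 4-byte length prefix plus actual bytes), not the exact DataStage on-disk format; the point is that short values in a wide VarChar waste most of the declared width when stored uncompressed.

```python
# Illustrative sketch (assumed layouts, not the exact DataStage format):
# an uncompressed VarChar(n) occupies the full declared width per record,
# while a compressed VarChar stores a length prefix plus the actual bytes.

DECLARED_WIDTH = 100   # VarChar(100)
LENGTH_PREFIX = 4      # assumed 4-byte length field

values = ["short", "a bit longer value", "", "medium text"] * 1000

uncompressed = len(values) * DECLARED_WIDTH
compressed = sum(LENGTH_PREFIX + len(v) for v in values)

print(f"uncompressed bytes: {uncompressed}")
print(f"compressed bytes:   {compressed}")
```

With mostly short values, the compressed form is several times smaller for the same record count and contents, consistent with the size difference reported in the original post.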
For what it's worth, the migration guide section for migrating data sets leads to this technote:
How to move dataset from one server to another in IBM InfoSphere DataStage
Choose a job you love, and you will never have to work a day in your life. - Confucius