Academic Open Internet Journal

www.acadjournal.com

Volume 11, 2004

 

Certain Improvements In Marshalling

 

By

 

G Sudha Sadasivam (1) and Dr A Chitra (2)

 

1 Research Scholar

Department of CSE

PSG College of Technology

Coimbatore –641 004

Tamil Nadu

sudhasadhasivam@yahoo.com

Phone: +91-422-2572177

 

2 Assistant Professor

Department of CSE

PSG College of Technology

Coimbatore – 641 004

Tamil Nadu

achitra@psgtech.cse.ac.in

Phone: +91-422-2572177

Abstract

       The interaction between components and objects in a distributed environment should be highly efficient and transparent to the application programmer. High efficiency can be achieved by improving the inter-process communication (IPC) mechanism in micro kernels, while transparency can be achieved through interface definition languages (IDLs). Encoding mechanisms such as External Data Representation (XDR), Network Data Representation (NDR) and Common Data Representation (CDR) facilitate inter-component communication transparently and efficiently. Marshalling procedures convert data from the local machine representation into a common network representation. The Common Object Request Broker Architecture (CORBA) uses CDR to encode data. This paper proposes certain changes that can be incorporated in the CDR encoding mechanism to achieve better efficiency in transmission. The changes include the following:

·        A bit representation for the boolean array.

·        Removing data alignment at word boundaries.

·        Exact allocation of send and receive buffer space depending on the data type being transmitted.

·        Adopting an inlining mechanism for some primitive data types to improve efficiency.

 

Keywords: encoding, stub code, marshalling and efficiency.

 

 


1. Introduction

            The efficiency of the marshalling (stub) code is an important factor in a distributed environment, because efficient stubs are necessary for good application performance. As IPC mechanisms become faster, stub code efficiency becomes a significant performance issue for local client/server procedure calls and for inter-component communication in distributed systems. Encoding schemes like XDR, NDR and CDR facilitate inter-component communication efficiently and transparently.

            An IDL compiler generates stub code from the interface procedures. The stub code marshals parameters on the client side, communicates through kernel primitives with the server, unmarshals the parameters on the server side and invokes the corresponding server procedure. The result returned from the server procedure has to be marshaled back to the client. As a result, the programmer can specify and use remote interfaces as easily as local interfaces. Portability and adaptability are the important features of the stub code.

            Sun’s rpcgen [1,2] is an IDL compiler that converts interface specifications into stub code. The stub code marshals data into XDR format, a standard for the description and encoding of data. XDR is used to transfer data between different computer architectures such as Sun workstations, VAX, IBM-PC and Cray, and it fits into the ISO presentation layer. XDR uses a language to describe data, and this language is used in Sun RPC. It assumes that a byte (octet) is portable: the hardware should encode bytes onto the various media in such a way that other hardware can decode them without loss of meaning. For example, the Ethernet standard suggests that bytes be encoded in “little-endian” style, that is, least significant bit first. Every encoded item occupies a multiple of 4 bytes; if the data does not fill a multiple of 4 bytes, it is padded with zeros. XDR lacks the following features:

·        There is no representation for bit fields and bit maps. It is based on bytes.

·        There is no BCD representation.

·        Since there is only one byte ordering, it cannot be used on certain machines.

·        Some machines like Cray do not use 4-byte alignment of data.

·        XDR uses implicit data types. Even though this avoids redundancy, only one representation of the data is possible.
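The 4-byte padding rule described above can be illustrated with a minimal Python sketch. It is not taken from rpcgen; the helper names xdr_pad, xdr_int and xdr_string are hypothetical, and the sketch assumes the usual XDR conventions of big-endian 4-byte integers and strings encoded as a 4-byte length followed by zero-padded bytes.

```python
import struct


def xdr_pad(data: bytes) -> bytes:
    """Append zero bytes so that the length becomes a multiple of 4."""
    return data + b"\x00" * (-len(data) % 4)


def xdr_int(value: int) -> bytes:
    """Encode a signed 32-bit integer in 4 bytes, most significant byte first."""
    return struct.pack(">i", value)


def xdr_string(text: str) -> bytes:
    """Encode a string as a 4-byte length followed by the zero-padded bytes."""
    raw = text.encode("ascii")
    return struct.pack(">I", len(raw)) + xdr_pad(raw)


if __name__ == "__main__":
    # 4 bytes of length + 5 bytes of text + 3 bytes of padding
    print(len(xdr_string("hello")))
```

A 5-character string therefore costs 12 bytes on the wire, which is the kind of overhead the later sections try to avoid.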

   Microsoft uses NDR [3] to encode data into a common network representation. The MSIDL compiler generates stub code that marshals data into NDR format, mapping MSIDL data types onto octet streams. Each primitive type in NDR has a fixed set of alternative representations: characters can be in ASCII or EBCDIC format, integers and floating-point values can be in big-endian or little-endian format, and floating-point values can follow the IEEE, VAX, Cray or IBM formats. A format label, which occupies 4 bytes, identifies the representation used for the integer, character and floating-point types. NDR thus supports a multi-canonical approach to data conversion. Like XDR, the data bytes are aligned in multiples of 4, so primitive types are padded with zeros to achieve alignment.

  The General Inter-ORB Protocol (GIOP) [4] is the basic communication protocol in CORBA. ORBs like Visibroker [6] and MICO [5] use the Internet Inter-ORB Protocol (IIOP) [9] for communication between distributed objects. IIOP implements the GIOP specification over TCP/IP. GIOP is intended to provide a protocol that incorporates the features of the application, presentation and session layers of the Open Systems Interconnection (OSI) model. It aims at providing interoperability between different ORBs. It has three core elements: message formats, CDR and a complete IDL mapping.

a) Message Formats: Each message begins with a GIOP header, which also indicates the byte ordering of the message. GIOP supports eight messages.

i) The following messages originate from the client

·        Request message to encode the object invocation from the client to the server

·        LocateRequest message to obtain some information from the server like the validity of the object reference (OR), state of the server etc.

·        CancelRequest is sent by the client to the server to terminate a prior request.

ii) The following messages originate from the server

·        Reply message is sent from the server to the client if the client expects a reply.

·        LocateReply message responds to LocateRequest message.

·        CloseConnection message informs the client that no response will be returned from the server.

iii) The messages supported by both clients and servers include

·        The MessageError message is sent when a client or a server detects an error.

·        The Fragment message is sent when a request or reply is broken into blocks that are sent independently.

b) The Common Data Representation (CDR): Data representations vary between machines, since each machine has its own word size and byte ordering. The data must therefore undergo a transformation before transmission, so that both the transmitting and the receiving parties understand it. CORBA uses a neutral, bicanonical, on-the-wire representation of data called CDR. It is a set of data-formatting rules that allows variable byte ordering and supports OMG’s IDL. CDR has the following features:

a)     Variable byte ordering: The sender sends the data in its own byte ordering. The receiver swaps the bytes only if its own ordering differs from that of the sender. Thus the client need not know the details of the server machine architecture.

b)     Data alignment: In CDR all data is aligned at the word boundaries. CDR defines alignment policies for primitive types. All complex types are broken into their constituent simple types.

c)      Complete IDL mapping: All data types defined in the OMG IDL can be represented in CDR format. Primitive types are encoded in multiples of octets. Complex types are built from primitive types. Client data is transmitted as an octet stream of arbitrary length. It is an abstract notation that specifies a memory buffer that is to be sent to another process or machine over IPC or network. All data must undergo marshalling before insertion into the octet stream. Marshalling involves conversion of machine data into CDR format and then performing byte alignment at the word boundaries.
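The variable byte ordering feature can be sketched in a few lines of Python. This is a simplified illustration, not the GIOP wire format: the one-byte flag placed directly in front of the value and the names send_long and receive_long are assumptions made for the example.

```python
import struct

BIG_ENDIAN = 0      # hypothetical flag values for this sketch
LITTLE_ENDIAN = 1


def send_long(value: int, little_endian: bool) -> bytes:
    """Sender side: write a byte-order flag, then the value in the sender's own order."""
    flag = LITTLE_ENDIAN if little_endian else BIG_ENDIAN
    fmt = "<i" if little_endian else ">i"
    return bytes([flag]) + struct.pack(fmt, value)


def receive_long(message: bytes) -> int:
    """Receiver side: read the flag and decode with the sender's byte order."""
    fmt = "<i" if message[0] == LITTLE_ENDIAN else ">i"
    (value,) = struct.unpack(fmt, message[1:5])
    return value


if __name__ == "__main__":
    for little in (True, False):
        wire = send_long(0x01020304, little)
        print(wire.hex(), hex(receive_long(wire)))
```

Only the receiver ever converts, and only when the orderings differ, so two machines with the same byte ordering exchange data without any swapping.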

CDR specifies the following

·        The layout for little and big-endian formats for primitive types.

·        The layout of complex types is based on the primitive types that comprise the complex data type. Complex data types include structures, unions and arrays.

a)     Structure: The encoding is based on the primitive types that comprise the structure. The members are encoded in the same order as declared in the IDL. The elements in the structure must undergo alignment.

b)     Union: The encoding of a union starts with the discriminant tag of the type specified in the union declaration. It is followed by the encoding of the selected member.

c)      Arrays: An array encodes its elements in sequence. The type of the elements in an array determines its encoding. The array length is not encoded, since it is given in the IDL.
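The structure and array rules above can be made concrete with a small Python sketch of an aligning encoder. It assumes that each primitive is aligned to a multiple of its own size and padded with zero bytes; the class AlignedWriter and the encode_* helpers are hypothetical names, not part of any ORB.

```python
import struct


class AlignedWriter:
    """Hypothetical encoder that aligns each primitive to its own size."""

    def __init__(self, little_endian=True):
        self.buf = bytearray()
        self.order = "<" if little_endian else ">"

    def _put(self, code, size, value):
        # Insert zero padding until the offset is a multiple of the item size.
        self.buf.extend(b"\x00" * (-len(self.buf) % size))
        self.buf.extend(struct.pack(self.order + code, value))

    def put_char(self, value):    # value is a bytes object of length 1
        self._put("c", 1, value)

    def put_short(self, value):
        self._put("h", 2, value)

    def put_long(self, value):
        self._put("i", 4, value)


def encode_mystruct(x, y, z):
    """struct { long x; short y; long z; } encoded member by member."""
    w = AlignedWriter()
    w.put_long(x)
    w.put_short(y)
    w.put_long(z)      # 2 padding bytes are inserted before z
    return bytes(w.buf)


def encode_char_short(c, s):
    w = AlignedWriter()
    w.put_char(c)
    w.put_short(s)     # 1 padding byte is inserted before s
    return bytes(w.buf)


def encode_long_array(values):
    """Fixed-length array: elements in sequence, no length prefix."""
    w = AlignedWriter()
    for v in values:
        w.put_long(v)
    return bytes(w.buf)


if __name__ == "__main__":
    print(len(encode_mystruct(1, 2, 3)))     # 12
    print(len(encode_char_short(b"A", 7)))   # 4
```

Under these assumptions the padding bytes account for 2 of the 12 bytes of the structure and 1 of the 4 bytes of the char and short pair.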

Table 1 gives a comparison of the different data types in NDR, XDR and CDR formats

Table 1: Data types in NDR, XDR and CDR formats

S.No. | XDR (bytes) | NDR (bytes) | CDR (bytes) | Description
----- | ----------- | ----------- | ----------- | -----------
1 | boolean (1) | boolean (1) | boolean (1) | An 8-bit value
2 | char (1) | char (1) | char (1) | An 8-bit value
3 | - | - | octet (1) | An 8-bit value with no marshalling
  | - | small (1) | - | An 8-bit integer [-2^7, 2^7-1]
4 | - | short (2), unsigned short (2) | short (2), unsigned short (2) | A 16-bit integer [-2^15, 2^15-1]; a 16-bit integer [0, 2^16-1]
5 | int (4), unsigned int (4) | long (4), unsigned long (4) | long (4), unsigned long (4) | A 32-bit integer [-2^31, 2^31-1]; a 32-bit integer [0, 2^32-1]
6 | hyper int (8) | hyper int (8) | long long (8) | A 64-bit integer [-2^63, 2^63-1]
7 | unsigned hyper int (8) | unsigned hyper int (8) | unsigned long long (8) | A 64-bit integer [0, 2^64-1]
8 | float (4) | float (4) | float (4) | A 32-bit value
9 | double (8) | double (8) | double (8) | A 64-bit value
10 | - | - | long double | A 128-bit value conforming to the IEEE double-extended floating-point standard
11 | - | - | wchar (1, 2, 4) | An 8-bit, 16-bit or 32-bit value that represents international character data
12 | string (multiple of 4 bytes) | string (varying/conformant) | string/wstring | A string of characters
13 | array size is a multiple of 4 | unidimensional/multidimensional/conformant arrays | array size is a multiple of 4 and depends on the type of the array element | Fixed-length arrays
14 | struct: each component size is a multiple of 4 | struct: alignment depends on the size of the largest component | struct: elements of the struct undergo alignment | Structure
15 | union size = discriminant size of 4 bytes and the size of the largest case | union size = discriminant size of 4 bytes and the size of the largest case | union size = discriminant size of 4 bytes and the size of the selected case | Union
16 | void | - | - | Zero byte
17 | const | - | - | Symbolic name
18 | enum (4) | enum (2) | enum (2) | Enumerated data type
19 | opaque | - | - | Multiple of 4 bytes
20 | - | pipes | - | Ordered chunks

2. Proposed Changes

            The following changes are proposed in the CDR marshalling format to minimize the number of bytes occupied by the data and to improve the networking speed.

1) Data is not aligned at word boundaries. For example, if an operation has a character parameter and a 2-byte integer parameter, the proposed encoding occupies 3 bytes, compared to 4 bytes in the standard CDR representation, as shown in figure 1.

The same holds for other combinations of primitive data types.

 

 


2) A single boolean is still represented as an octet, but a boolean array is represented in bit format. For example, a boolean array of length 10 requires 10 bytes in CDR format, whereas the proposed representation packs the 10 elements into 10 bits and therefore requires only 2 bytes, as shown in figure 2. This method is particularly beneficial when images are transmitted over the network.

 

 

 

 

 


   

3) In CDR, the number of bytes required to store the array elements is a multiple of four. In the proposed format, the number of bytes required to store the array depends only on the type of its elements. For example, an array of three 2-byte integers requires 6 bytes, compared to 8 bytes in the CDR representation. Similarly, a char array of size 5 requires 5 bytes, as opposed to 8 bytes in the CDR representation. The same argument holds for all data types.

4) For structures, too, the alignment at word boundaries has been removed. The layout of the structure shown below is given in figure 3. The proposed method occupies 10 bytes, as opposed to 12 bytes in CDR format.

struct mystruct {

     long x;

     short y;

     long z;

};

 

 

 

 

 


5) Inlining of marshalling procedures in the stub code has also been carried out for some data types. This is especially beneficial for short, long and character types, where it improves the marshalling speed. The size of the stub code does not increase much for these data types as a result of inlining.
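The byte counts quoted in the proposals above can be checked with a short Python sketch. It is only an illustration under stated assumptions: the bit order inside each byte (first array element in the most significant bit) is a guess, since the paper's figure 2 is not available, and pack_bools and unpack_bools are hypothetical helper names. The "<" prefix of Python's struct module is used because it packs fields back to back without alignment padding.

```python
import struct


def pack_bools(flags):
    """Pack a boolean array 8 elements per byte (first element in the top bit)."""
    out = bytearray((len(flags) + 7) // 8)
    for i, flag in enumerate(flags):
        if flag:
            out[i // 8] |= 0x80 >> (i % 8)
    return bytes(out)


def unpack_bools(data, count):
    """Recover `count` booleans; the count is known from the IDL array bound."""
    return [bool(data[i // 8] & (0x80 >> (i % 8))) for i in range(count)]


# Unaligned layouts: fields follow each other with no padding.
char_and_short = struct.pack("<ch", b"A", 7)     # 1 + 2 = 3 bytes
short_array = struct.pack("<3h", 1, 2, 3)        # 3 * 2 = 6 bytes
char_array = struct.pack("<5s", b"hello")        # 5 bytes
mystruct = struct.pack("<ihi", 1, 2, 3)          # 4 + 2 + 4 = 10 bytes

if __name__ == "__main__":
    flags = [True, False, True] + [False] * 7
    packed = pack_bools(flags)
    print(len(packed), unpack_bools(packed, len(flags)) == flags)
    print(len(char_and_short), len(short_array), len(char_array), len(mystruct))
```

Because fixed-length arrays carry no length on the wire, the receiver in this sketch relies on the array bound from the IDL to know how many bits of the last byte are meaningful.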

3. Results:

      An IDL compiler has been designed to generate the stub code, and a performance analysis of the stub code with the proposed modifications has been carried out. The measurements were made on Pentium III 866 MHz computers with 128 MB RAM, running Linux 7.1 and connected by a 100 Mbps LAN. The round trip time (RTT), which consists of the marshalling, unmarshalling and network transmission times in both directions, was measured. The measurements were repeated and the average was taken.

Figure 4a shows the performance of the stub code for different data types: char, short, float, a float array of size 256 and a struct. It is seen from figure 4a that the proposed method achieves a lower RTT than the CDR representation.

Figure 4b shows the RTT for Boolean arrays of various sizes for the proposed method and CDR. CDR, XDR and NDR do not use a bit representation for Boolean arrays. In the proposed method each element of the Boolean array is encoded as a bit, so its RTT is lower than that of CDR. The difference in performance becomes more prominent as the size of the array increases.
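The measurement procedure described above (marshal, send, receive, unmarshal, repeat and average) can be sketched as follows. This is not the authors' test harness: it assumes a local socket pair instead of a 100 Mbps LAN, a peer that simply echoes the bytes back, and hypothetical function names.

```python
import socket
import struct
import threading
import time


def recv_exact(sock, size):
    """Read exactly `size` bytes from a stream socket."""
    data = bytearray()
    while len(data) < size:
        part = sock.recv(size - len(data))
        if not part:
            raise ConnectionError("peer closed the connection")
        data.extend(part)
    return bytes(data)


def echo_server(conn, size, rounds):
    """Stand-in for the remote side: return each message unchanged."""
    for _ in range(rounds):
        conn.sendall(recv_exact(conn, size))


def average_rtt(values, rounds=100):
    """Average time for marshal + send + receive + unmarshal of a float array."""
    fmt = "<%df" % len(values)
    size = struct.calcsize(fmt)
    client, server = socket.socketpair()
    worker = threading.Thread(target=echo_server, args=(server, size, rounds))
    worker.start()
    reply = ()
    start = time.perf_counter()
    for _ in range(rounds):
        client.sendall(struct.pack(fmt, *values))              # marshal and send
        reply = struct.unpack(fmt, recv_exact(client, size))   # receive and unmarshal
    elapsed = time.perf_counter() - start
    worker.join()
    client.close()
    server.close()
    return elapsed / rounds, reply


if __name__ == "__main__":
    rtt, _ = average_rtt([1.0] * 256)
    print("average RTT: %.1f microseconds" % (rtt * 1e6))
```

Absolute numbers from such a local sketch are not comparable to the LAN figures reported here; only the structure of the measurement is the same.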

Figure 4a: RTT for the proposed method, CDR and TCP/IP sockets

Figure 4b: RTT for Boolean array for the proposed method and CDR

Figure 4c: RTT of example data types for inlining and compilation in the proposed method

Figure 4d: RTT of char array of different sizes for inlining and compilation methods in the proposed method

            The proposed method also adopts inlining of marshalling procedures for some data types like integer, character and long. Inlining improves the performance of the stub code for primitive types, and the improvement is more prominent for composite data types like arrays and structures, as shown in figures 4c and 4d. In figure 4c, the combined interface consists of five operations with various combinations of primitive data types like char, int, long, float and double.
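The difference between a stub that calls one marshalling procedure per parameter and a stub with the marshalling inlined can be sketched as follows. This is only an analogy in Python, where the saving comes from avoiding per-field function calls; the function names are hypothetical and the unaligned little-endian layout is an assumption carried over from the proposed format.

```python
import struct
import timeit


def put_char(buf, value):
    buf.extend(struct.pack("<c", value))


def put_short(buf, value):
    buf.extend(struct.pack("<h", value))


def put_long(buf, value):
    buf.extend(struct.pack("<i", value))


def marshal_with_calls(c, s, n):
    """Stub that calls a separate marshalling procedure for each parameter."""
    buf = bytearray()
    put_char(buf, c)
    put_short(buf, s)
    put_long(buf, n)
    return bytes(buf)


def marshal_inlined(c, s, n):
    """Stub with the marshalling of all three parameters written in place."""
    return struct.pack("<chi", c, s, n)


if __name__ == "__main__":
    args = (b"A", 7, 123456)
    assert marshal_with_calls(*args) == marshal_inlined(*args)
    for stub in (marshal_with_calls, marshal_inlined):
        seconds = timeit.timeit(lambda: stub(*args), number=100000)
        print(stub.__name__, round(seconds, 3))
```

Both stubs produce the same 7 bytes; the timing loop only indicates how the call overhead can be compared, and its results depend on the machine and interpreter.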

4. Conclusion:

            Since the stub code generated by an IDL compiler should be highly efficient in order to improve application performance, this paper proposes certain modifications to CDR, the standard encoding mechanism in CORBA. They are:

1) Representation of boolean arrays in bit format. This improves the stub code performance considerably, depending upon the size of the array.

2) Removal of the alignment of data at word boundaries. This reduces the RTT for primitive types, and the reduction in RTT is even more prominent for composite data types like arrays and structures.

3) Inlining of marshalling procedures for int, char and long. This marginally increases the stub code size (1%-2%), but the speed of the stub code increases considerably. Thus the proposed method increases the speed of the stub code without significantly increasing its size.

 

References:

1) SunSoft Inc., “SunSoft OMG IDL Compiler Front End”, Release 1.3, March 1994, ftp://ftp.omg.org/pub/contrib/OMG-IDL-CFE1.3/

2) R. Srinivasan, “RPC: Remote Procedure Call Protocol Specification, Version 2”, RFC 1831, Sun Microsystems, August 1995.

3) “DCE 1.1: Remote Procedure Call, Transfer Syntax NDR”, The Open Group, 1997.

4) Object Management Group, CORBA/IIOP specifications, OMG Document Number formal/2002-12-02, 2002.

5) Puder and Römer, “MICO: An Open Source CORBA Implementation”, Verlag Heidelberg, Germany, 1998.

6)  Visigenic Software Institution, “Visigenic Reference Manual”, Ver 3.0, 1997.

7) Dr. A. Chitra and G. Sudha Sadasivam, “Improving the performance of the IDL Compiler”, International Conference on Digital Aided Modelling and Simulation DAMS 2003, Coimbatore Institute of Technology, Jan 2003.

8) Vishwajit. A, “Object Oriented Frameworks using C++ and CORBA”, Dreamtech Press, New Delhi, 2000.

9) William R., Thomas H and Paul, “IIOP Complete”, Addison Wesley, Massachusetts, 1998.

 

Technical College - Bourgas,

All rights reserved, © March, 2000