Academic Open Internet Journal, Volume 11, 2004
Certain Improvements In Marshalling
By G Sudha Sadasivam1 and Dr A Chitra2
1 Research Scholar
Department of CSE
PSG College of Technology
Coimbatore –641 004
Tamil Nadu
sudhasadhasivam@yahoo.com
Phone: +91-422-2572177
2 Assistant Professor
Department of CSE
PSG College of Technology
Coimbatore – 641 004
Tamil Nadu
achitra@psgtech.cse.ac.in
Phone: +91-422-2572177
The interaction between components and objects in a distributed environment should be highly efficient and transparent to the application programmer. High efficiency can be achieved by improving the inter-process communication (IPC) mechanism in microkernels, while transparency can be achieved through interface definition languages (IDLs). Encoding mechanisms such as External Data Representation (XDR), Network Data Representation (NDR) and Common Data Representation (CDR) facilitate inter-component communication transparently and efficiently. Marshalling procedures convert data from the local machine representation into a common network representation. The Common Object Request Broker Architecture (CORBA) uses CDR to encode data. This paper proposes certain changes that can be incorporated in the CDR encoding mechanism to achieve better efficiency in transmission. The changes include the following:
· A bit representation for boolean arrays.
· Removing data alignment at word boundaries.
· Exact allocation of send and receive buffer space depending on the data type being transmitted.
· Adopting an inlining mechanism for some primitive data types to improve efficiency.
Keywords: encoding, stub code, marshalling, efficiency.
1. Introduction
The efficiency of the marshalling (stub) code is a very important factor in a distributed environment, and efficient stubs are necessary to improve application performance. As IPC mechanisms become faster, stub code efficiency becomes an important performance issue for local client/server procedure calls and for inter-component communication in distributed systems. Encoding schemes such as XDR, NDR and CDR facilitate inter-component communication efficiently and transparently.
An IDL compiler generates stub code from the interface procedures. The stub code marshals the parameters on the client side, communicates with the server through kernel primitives, unmarshals the parameters on the server side and invokes the corresponding server procedure. The result returned from the server procedure is then marshalled back to the client. As a result, the programmer can specify and use remote interfaces as easily as local interfaces. Portability and adaptability are important features of the stub code.
Sun’s rpcgen [1,2] is an IDL compiler that converts interface specifications into stub code. The stub code marshals data into XDR format, a standard for the description and encoding of data. XDR is used to transfer data between different computer architectures such as Sun workstations, VAX, IBM-PC and Cray, and it fits into the ISO presentation layer. XDR uses a language to describe data, which is also used in Sun RPC. It assumes only that a byte (octet) is portable: a hardware device should encode bytes onto the various media in such a way that other hardware devices can decode them without loss of meaning. The Ethernet standard, for example, suggests that bytes be encoded in “little-endian” style, least significant bit first. The number of bytes that contain the encoded data is always a multiple of 4; if the data does not fill a multiple of 4 bytes, it is padded with zeros. XDR lacks the following features:
· There is no representation for bit fields and bit maps. It is based on bytes.
· There is no BCD representation.
· Since there is only one byte ordering, it cannot be used on certain machines.
· Some machines, such as Cray, do not use 4-byte alignment of data.
· XDR uses implicit data types. Even though this avoids redundancy, only one representation of the data is possible.
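The 4-byte padding rule described above can be illustrated with a minimal Python sketch. The function names `xdr_pack_string` and `xdr_pack_int` are hypothetical and are not part of rpcgen. The sketch assumes the standard XDR layout for variable-length data: a 4-byte big-endian length, then the bytes, then zero padding up to the next multiple of 4.

```python
import struct


def xdr_pack_int(value: int) -> bytes:
    """Encode a signed 32-bit integer in XDR's big-endian byte order."""
    return struct.pack(">i", value)


def xdr_pack_string(data: bytes) -> bytes:
    """Encode variable-length data: length word, bytes, zero padding."""
    padding = (4 - len(data) % 4) % 4  # 0..3 zero bytes
    return struct.pack(">I", len(data)) + data + b"\x00" * padding


encoded = xdr_pack_string(b"hello")
print(len(encoded))  # 12: 4 (length) + 5 (data) + 3 (padding)
```

A 5-byte string therefore costs 12 bytes on the wire, which is the kind of overhead the list above attributes to XDR's fixed 4-byte unit.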
Microsoft uses NDR [3] to encode data into a common network representation. The MSIDL compiler generates stub code that marshals data into NDR format by mapping MSIDL data types onto octet streams. Unlike XDR, NDR offers a fixed set of alternative representations for each primitive type: characters can be in ASCII or EBCDIC format, integers and floating-point values can be in big-endian or little-endian format, and floating-point values can use the IEEE, VAX, Cray or IBM format. A format label, which occupies 4 bytes, identifies the representations used for the integer, character and floating-point types. NDR therefore supports a multi-canonical approach to data conversion. As in XDR, the data bytes are aligned in multiples of 4, so primitive types are padded with zeros to achieve alignment.
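The role of the format label can be sketched in Python as follows. The field layout used here is an assumption based on the DCE 1.1 RPC transfer syntax and should be checked against [3]: the high and low nibbles of the first byte give the integer and character representations, the second byte gives the floating-point representation, and the last two bytes are reserved. The function name `decode_format_label` is hypothetical.

```python
# Assumed code points, following the DCE 1.1 RPC transfer syntax.
INT_REPS = {0: "big-endian", 1: "little-endian"}
CHAR_REPS = {0: "ASCII", 1: "EBCDIC"}
FLOAT_REPS = {0: "IEEE", 1: "VAX", 2: "Cray", 3: "IBM"}


def decode_format_label(label: bytes) -> dict:
    """Decode a 4-byte NDR format label into its three representations."""
    if len(label) != 4:
        raise ValueError("an NDR format label is exactly 4 bytes")
    return {
        "integer": INT_REPS[label[0] >> 4],       # high nibble of byte 0
        "character": CHAR_REPS[label[0] & 0x0F],  # low nibble of byte 0
        "float": FLOAT_REPS[label[1]],            # byte 1
    }


print(decode_format_label(bytes([0x10, 0x00, 0x00, 0x00])))
```

The receiver reads the label first and converts only when the sender's representation differs from its own.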
The General Inter-ORB Protocol (GIOP) [4] is the basic communication protocol in CORBA. ORBs such as Visibroker [6] and MICO [5] use the Internet Inter-ORB Protocol (IIOP) [9] for communication between distributed objects. IIOP implements the GIOP specification over TCP/IP. GIOP is intended to provide a protocol that incorporates the features of the application, presentation and session layers of the Open Systems Interconnection (OSI) model. It aims at providing interoperability between different ORBs. It has three core elements: message formats, CDR and complete IDL mapping.
a) Message Formats: Each message begins with a GIOP header, which also indicates the byte ordering of the message. GIOP supports eight messages.
i) The following messages originate from the client:
· Request message: encodes an object invocation from the client to the server.
· LocateRequest message: obtains information from the server, such as the validity of the object reference (OR) and the state of the server.
· CancelRequest message: sent by the client to the server to terminate a prior request.
ii) The following messages originate from the server:
· Reply message: sent from the server to the client if the client expects a reply.
· LocateReply message: responds to a LocateRequest message.
· CloseConnection message: informs the client that the server is closing the connection and that no further responses will be returned.
iii) The messages supported by both clients and servers include:
· MessageError message: sent when a client or a server detects an error.
· Fragment message: sent when a request or reply is broken into blocks that are sent independently.
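As an illustration of the header and the eight message types, here is a minimal Python sketch of a GIOP header parser. It assumes the 12-byte header layout of the CORBA specification [4]: the magic bytes "GIOP", a major and minor version byte, a flags byte whose lowest bit selects little-endian ordering, a message type byte, and a 4-byte message size. The function name `parse_giop_header` is hypothetical.

```python
import struct

# Message type codes 0..7 in the order defined by GIOP.
MESSAGE_TYPES = [
    "Request", "Reply", "CancelRequest", "LocateRequest",
    "LocateReply", "CloseConnection", "MessageError", "Fragment",
]


def parse_giop_header(header: bytes) -> dict:
    """Parse a 12-byte GIOP message header."""
    if len(header) != 12 or header[:4] != b"GIOP":
        raise ValueError("not a 12-byte GIOP header")
    major, minor, flags, msg_type = header[4], header[5], header[6], header[7]
    little_endian = bool(flags & 0x01)
    # The size field uses the byte order announced in the flags byte.
    (size,) = struct.unpack("<I" if little_endian else ">I", header[8:12])
    return {
        "version": (major, minor),
        "little_endian": little_endian,
        "type": MESSAGE_TYPES[msg_type],
        "size": size,
    }


sample = b"GIOP" + bytes([1, 2, 1, 0]) + struct.pack("<I", 64)
print(parse_giop_header(sample))
```

Because the byte order travels in the header, the receiver knows how to interpret every multi-byte value in the message body before it reads any of them.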
b) The Common Data Representation (CDR): Data representations vary between machines, since each machine has its own word size and byte ordering. The data must therefore undergo a transformation before transmission, which ensures that both the transmitting and the receiving parties understand it. CORBA uses a neutral, bicanonical, on-the-wire representation of data called CDR. It is a set of data-formatting rules that allows variable byte ordering and supports OMG’s IDL. CDR has the following features:
a) Variable byte ordering: The sender sends the data in its own byte ordering. The receiver swaps the bytes if its own ordering differs, so that it has the data in the correct order. Thus the client need not know the details of the server machine architecture.
b) Data alignment: In CDR all data is aligned at the word boundaries. CDR defines alignment policies for primitive types. All complex types are broken into their constituent simple types.
c) Complete IDL mapping: All data types defined in the OMG IDL can be represented in CDR format. Primitive types are encoded in multiples of octets. Complex types are built from primitive types. Client data is transmitted as an octet stream of arbitrary length. The octet stream is an abstract notation for a memory buffer that is to be sent to another process or machine over IPC or a network. All data must undergo marshalling before insertion into the octet stream. Marshalling involves conversion of machine data into CDR format and then performing byte alignment at the word boundaries.
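The following Python sketch illustrates variable byte ordering and alignment together. The class `CdrWriter` is hypothetical and not taken from any ORB. It assumes the alignment policy of the CORBA specification, in which each primitive is aligned on a multiple of its own size, and it covers only char, short and long.

```python
import struct


class CdrWriter:
    """Minimal CDR-style encoder for char, short and long values."""

    def __init__(self, little_endian: bool):
        self.prefix = "<" if little_endian else ">"  # sender's own byte order
        self.buf = bytearray()

    def _write(self, fmt: str, size: int, value) -> None:
        # Pad with zero octets until the offset is a multiple of `size`.
        self.buf.extend(b"\x00" * (-len(self.buf) % size))
        self.buf.extend(struct.pack(self.prefix + fmt, value))

    def write_char(self, value: bytes) -> None:
        self._write("c", 1, value)

    def write_short(self, value: int) -> None:
        self._write("h", 2, value)

    def write_long(self, value: int) -> None:
        self._write("i", 4, value)


writer = CdrWriter(little_endian=True)
writer.write_char(b"A")
writer.write_short(1)
print(bytes(writer.buf))  # b'A\x00\x01\x00': one pad octet before the short

# The receiver decodes with the sender's byte order, whatever its own is.
(value,) = struct.unpack_from("<h", writer.buf, 2)
print(value)  # 1
```

The pad octet inserted before the short is the overhead that the alignment policy introduces for mixed-size parameters.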
CDR specifies the following:
· The layout of primitive types in little-endian and big-endian formats.
· The layout of complex types, which is based on the primitive types that make up the complex data type. Complex data types include structures, unions and arrays.
a) Structure: The encoding is based on the primitive types that make up the structure. The members are encoded in the same order as declared in the IDL. The elements in the structure must undergo alignment.
b) Union: The encoding of a union starts with the discriminant tag, of the type specified in the union declaration. It is followed by the encoding of the selected member.
c) Arrays: An array encodes its elements in sequence. The type of the elements in an array determines its encoding. The array length is not encoded, since it is given in the IDL.
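The union and array rules can be sketched in Python as follows. The union being encoded is hypothetical (`switch (long) { case 1: short s; case 2: long l; }`), as are all function names. The sketch uses big-endian order and assumes each primitive is aligned on a multiple of its own size.

```python
import struct


def align(buf: bytearray, size: int) -> None:
    """Append zero octets until len(buf) is a multiple of size."""
    buf.extend(b"\x00" * (-len(buf) % size))


def put_short(buf: bytearray, value: int) -> None:
    align(buf, 2)
    buf.extend(struct.pack(">h", value))


def put_long(buf: bytearray, value: int) -> None:
    align(buf, 4)
    buf.extend(struct.pack(">i", value))


def put_union(buf: bytearray, discriminant: int, value: int) -> None:
    """Encode the discriminant, then only the selected member."""
    put_long(buf, discriminant)
    if discriminant == 1:
        put_short(buf, value)
    elif discriminant == 2:
        put_long(buf, value)
    else:
        raise ValueError("no such case in this hypothetical union")


def put_short_array(buf: bytearray, values) -> None:
    """Encode elements in sequence; no length prefix is written."""
    for value in values:
        put_short(buf, value)


union_buf = bytearray()
put_union(union_buf, 1, 7)
print(len(union_buf))  # 6: 4-byte discriminant + 2-byte short

array_buf = bytearray()
put_short_array(array_buf, [1, 2, 3])
print(len(array_buf))  # 6: three shorts, no length field
```

Only the selected union member reaches the wire, and the receiver relies on the IDL declaration for the array length.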
Table 1 gives a comparison of the different data types in XDR, NDR and CDR formats.
| S.No. | XDR (bytes) | NDR (bytes) | CDR (bytes) | Description |
| 1 | boolean (1) | boolean (1) | boolean (1) | An 8-bit value |
| 2 | char (1) | char (1) | char (1) | An 8-bit value |
| 3 | - | - | octet (1) | An 8-bit value with no marshalling |
| | - | small (1) | - | An 8-bit integer [-2^7, 2^7-1] |
| 4 | - | short (2), unsigned short (2) | short (2), unsigned short (2) | A 16-bit integer [-2^15, 2^15-1]; a 16-bit integer [0, 2^16-1] |
| 5 | int (4), unsigned int (4) | long (4), unsigned long (4) | long (4), unsigned long (4) | A 32-bit integer [-2^31, 2^31-1]; a 32-bit integer [0, 2^32-1] |
| 6 | hyper int (8) | hyper int (8) | long long (8) | A 64-bit integer [-2^63, 2^63-1] |
| 7 | unsigned hyper int (8) | unsigned hyper int (8) | unsigned long long (8) | A 64-bit integer [0, 2^64-1] |
| 8 | float (4) | float (4) | float (4) | A 32-bit value |
| 9 | double (8) | double (8) | double (8) | A 64-bit value |
| 10 | - | - | long double (16) | A 128-bit value in IEEE double-extended floating-point format |
| 11 | - | - | wchar (1, 2, 4) | An 8-bit, 16-bit or 32-bit value that represents international character data |
| 12 | string (multiple of 4 bytes) | string (varying/conformant) | string/wstring | A string of characters |
| 13 | array size is a multiple of 4 | unidimensional/multidimensional/conformant arrays | array size is a multiple of 4 and depends on the type of the array element | Fixed-length arrays |
| 14 | struct: each component size is a multiple of 4 | struct: alignment depends on the size of the largest component | struct: elements of the struct undergo alignment | Structure |
| 15 | union size = discriminant size of 4 bytes and the size of the largest case | union size = discriminant size of 4 bytes and the size of the largest case | union size = discriminant size of 4 bytes and the size of the selected case | Union |
| 16 | void | - | - | Zero bytes |
| 17 | const | - | - | Symbolic name |
| 18 | enum (4) | enum (2) | enum (4) | Enumerated data type |
| 19 | opaque | - | - | Multiple of 4 bytes |
| 20 | - | pipes | - | Ordered chunks |
2. Proposed Changes
The
following changes are proposed in the CDR marshalling format to minimize the
number of bytes occupied by the data and to improve the networking speed.
1) There is no alignment at the word boundaries. If an operation has a character and a 2-byte integer as parameters, they occupy 3 bytes, compared with 4 bytes in the standard CDR representation, as shown in figure 1. The same holds for other combinations of primitive data types.
2) A single boolean is still represented as an octet, but a boolean array is represented in bit format. For example, for a boolean array of length 10, the CDR format requires 10 bytes, whereas the proposed representation requires 2 bytes to hold the 10 elements as 10 bits, as shown in figure 2. This method is particularly beneficial when images are transmitted over the network.
3) In CDR, the number of bytes required to store the array elements is a multiple of four. In the proposed format, the number of bytes required to store the array depends only on the type and number of its elements. For example, an integer array of size 3 with 2-byte elements requires 6 bytes, compared with 8 bytes in the CDR representation. Similarly, a char array of size 5 requires 5 bytes, as opposed to 8 bytes in the CDR representation. This argument holds for all data types.
4) For structures, too, the alignment at word boundaries has been removed. The layout of the structure shown below is given in figure 3. The proposed method occupies 10 bytes, as opposed to 12 bytes in the CDR format.
struct mystruct {
long x;
short y;
long z;
};
5) Inlining of marshalling procedures in the stub code has also been carried out for some data types. This is especially beneficial for short, long and character, where it improves the marshalling speed. The size of the stub code does not increase very much for these data types as a result of inlining.
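The byte counts quoted in changes 1), 3) and 4) can be reproduced with a short Python sketch. All function names are hypothetical. `aligned_size` models the CDR baseline by padding each member to a multiple of its own size, `padded_to_four` models the rounding of array storage to a multiple of four as described in change 3), and `packed_size` models the proposed format with no padding at all.

```python
import struct

SIZES = {"char": 1, "short": 2, "long": 4}  # primitive sizes in bytes


def aligned_size(types) -> int:
    """Bytes used when each member is aligned on its own size."""
    offset = 0
    for name in types:
        size = SIZES[name]
        offset += -offset % size  # padding before this member
        offset += size
    return offset


def packed_size(types) -> int:
    """Bytes used when members are stored back to back."""
    return sum(SIZES[name] for name in types)


def padded_to_four(nbytes: int) -> int:
    """Round a byte count up to the next multiple of four."""
    return (nbytes + 3) // 4 * 4


# Change 1: a char followed by a 2-byte integer.
print(aligned_size(["char", "short"]), packed_size(["char", "short"]))  # 4 3

# Change 3: three 2-byte integers, and five chars.
print(padded_to_four(3 * 2), 3 * 2)  # 8 6
print(padded_to_four(5 * 1), 5 * 1)  # 8 5

# Change 4: struct mystruct { long x; short y; long z; }.
members = ["long", "short", "long"]
print(aligned_size(members), packed_size(members))  # 12 10

# The packed struct as bytes; "<" disables padding in the struct module.
print(len(struct.pack("<ihi", 1, 2, 3)))  # 10
```

The saving per value is small, so the benefit grows with the number of mixed-size members or array elements in a message.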
3. Results
An IDL compiler has been designed to generate the stub code, and a performance analysis of the stub code with the proposed modifications has been carried out. The measurements were made on Pentium III 866 MHz computers with 128 MB RAM, running Linux 7.1 and connected by a 100 Mbps LAN. The round-trip time (RTT), which consists of the marshalling, unmarshalling and network transmission time in both directions, was measured. The measurements were repeated and the average was taken.
Figure 4a shows the performance of the stub code for different data types: char, short, float, a float array of size 256, and a struct. Figure 4a shows that the proposed method performs much better than the CDR representation.
Figure 4b shows the RTT for boolean arrays of various sizes for the proposed method and CDR. CDR, XDR and NDR do not use a bit representation for boolean arrays. Since the proposed method encodes each element of a boolean array as a single bit, it performs much better than CDR. The difference in performance also becomes more prominent as the size of the array increases.
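A minimal Python sketch of the bit representation for boolean arrays follows. The function names are hypothetical, and the bit order (most significant bit first) is an assumption, since the paper does not state which order it uses. The receiver needs the element count to unpack, which it knows because IDL arrays have a fixed length.

```python
def pack_bools(values) -> bytes:
    """Pack booleans one per bit, most significant bit first."""
    out = bytearray((len(values) + 7) // 8)  # ceil(n / 8) bytes
    for i, flag in enumerate(values):
        if flag:
            out[i // 8] |= 1 << (7 - i % 8)
    return bytes(out)


def unpack_bools(data: bytes, count: int) -> list:
    """Recover `count` booleans from the packed bytes."""
    return [bool(data[i // 8] & (1 << (7 - i % 8))) for i in range(count)]


flags = [True, False] * 5          # a boolean array of length 10
packed = pack_bools(flags)
print(len(packed))                 # 2 bytes instead of 10
print(unpack_bools(packed, 10) == flags)  # True
```

An array of n booleans takes ceil(n / 8) bytes instead of n, which is why the gap to CDR widens as the array grows.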

Figure 4a: RTT for the proposed method, CDR and TCP/IP sockets
Figure 4b: RTT for boolean array for the proposed method and CDR
Figure 4c: RTT of example data types for inlining and compilation in the proposed method
Figure 4d: RTT of char array of different sizes for inlining and compilation methods in the proposed method
The proposed method also adopts inlining of marshalling procedures for some data types such as integer, character and long. Inlining improves the performance of the stub code for primitive types, and the improvement is more prominent for composite data types such as arrays and structures, as shown in figures 4c and 4d. In figure 4c, the combined interface consists of five operations with various combinations of primitive data types such as char, int, long, float and double.
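The structural difference between a stub that calls marshalling procedures and one with the marshalling code inlined can be sketched in Python. Both stubs are hypothetical and marshal the same (char, short, long) parameter list in the unaligned little-endian layout. The paper's stubs are generated by its own IDL compiler, so this sketch shows only the shape of the two variants and says nothing about the timings in figures 4c and 4d.

```python
import struct


def marshal_char(buf: bytearray, value: bytes) -> None:
    buf.extend(struct.pack("<c", value))


def marshal_short(buf: bytearray, value: int) -> None:
    buf.extend(struct.pack("<h", value))


def marshal_long(buf: bytearray, value: int) -> None:
    buf.extend(struct.pack("<i", value))


def stub_with_calls(c: bytes, s: int, l: int) -> bytes:
    """One procedure call per parameter."""
    buf = bytearray()
    marshal_char(buf, c)
    marshal_short(buf, s)
    marshal_long(buf, l)
    return bytes(buf)


def stub_inlined(c: bytes, s: int, l: int) -> bytes:
    """Marshalling code expanded in place, buffer sized exactly."""
    buf = bytearray(7)  # 1 + 2 + 4 bytes, known at stub generation time
    buf[0:1] = c
    buf[1:3] = s.to_bytes(2, "little", signed=True)
    buf[3:7] = l.to_bytes(4, "little", signed=True)
    return bytes(buf)


print(stub_with_calls(b"A", 2, 3) == stub_inlined(b"A", 2, 3))  # True
```

The inlined variant repeats the encoding logic in every stub, which is the source of the small growth in stub code size, while avoiding the per-parameter call overhead.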
4. Conclusion
Since the stub code generated by an IDL compiler should be highly efficient in order to improve application performance, this paper proposes certain modifications to CDR, the standard encoding mechanism in CORBA. They are:
1) Representation of boolean arrays in bit format. This drastically improves the stub code performance, depending upon the size of the array.
2) Removal of data alignment at word boundaries. This reduces the RTT for primitive types, and the difference in RTT is even more prominent for composite data types such as arrays and structures.
3) Inlining of marshalling procedures for int, char and long. This marginally increases the stub code size (1%-2%), but the speed of the stub code is greatly increased. Thus the proposed method increases the speed of the stub code without significantly increasing its size.
References:
1) SunSoft Inc., Sunsoft OMG IDL Compiler front end, Release 1.3, March 1994, ftp://ftp.omg.org/pub/contrib/OMG-IDL-CFE1.3/
2) R. Srinivasan, “RPC: remote procedure call specification, version 2”, Technical report RFC 1831, Sun Microsystems, August 1995.
3) “DCE 1.1: Remote Procedure Call – Transfer Syntax NDR”, The Open Group, 1997.
4) Object Management Group, CORBA/IIOP specifications, OMG Document Number formal/2002-12-02, 2002.
5) Puder and Römer, “MICO: An Open Source CORBA Implementation”, Verlag Heidelberg, Germany, 1998.
6) Visigenic Software, “Visigenic Reference Manual”, Ver 3.0, 1997.
7) Dr. A. Chitra and G. Sudha Sadasivam, “Improving the performance of the IDL Compiler”, International Conference on Digital Aided Modelling and Simulation DAMS 2003, Coimbatore Institute of Technology, Jan 2003.
8) Vishwajit. A, “Object Oriented Frameworks using C++ and CORBA”, Dreamtech Press, New Delhi, 2000.
9) William R., Thomas H and Paul, “IIOP Complete”, Addison Wesley, Massachusetts, 1998.
Technical College - Bourgas, All rights reserved, © March, 2000