A gripe about C# "boxing" and "unboxing"

 

Please don't get me wrong. I think the .NET standard and the new C# language have improved on what Sun started with the Java Virtual Machine and the Java language, and .NET applications will probably perform better than Java ones (feel free to set me straight). As good as COM was, .NET is undoubtedly an improvement. But let's not forget that .NET and C# are not the same thing, and sometimes their somewhat hasty integration leaves something to be desired.

For those not "in the know", boxing in C# refers to the mechanism by which a value type automatically gets an object "wrapper" whenever it is used in an object context. This is all very nice, but Microsoft also says that in C#, everything derives from the base type Object. Yes, ints, records/structures, and even the boolean type derive from Object. Which raises the question: if all types derive from Object, then what's the purpose of boxing?
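To make both claims concrete, here is a minimal sketch. The class and method names (BoxingDemo, IntDerivesFromObject, RoundTrip) are my own, made up purely for illustration. It checks through reflection that int sits under Object by way of System.ValueType, and then boxes and unboxes an int.

```csharp
using System;

public static class BoxingDemo
{
    // True if int derives from System.ValueType, which in turn derives from object.
    public static bool IntDerivesFromObject()
    {
        return typeof(int).BaseType == typeof(ValueType)
            && typeof(ValueType).BaseType == typeof(object);
    }

    // Boxes an int into an object, then unboxes it with an explicit cast.
    public static int RoundTrip(int value)
    {
        object boxed = value;   // boxing: implicit, the value is copied into an object
        return (int)boxed;      // unboxing: needs an explicit cast back to int
    }

    public static void Main()
    {
        Console.WriteLine(IntDerivesFromObject());  // prints "True"
        Console.WriteLine(RoundTrip(123));          // prints "123"
    }
}
```

So the inheritance claim holds on paper, and yet the conversion to object is still a distinct step that copies the value.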

Let's get something straight first: C# is not DotNET, and vice versa. Microsoft made this clear in the specifications it submitted to ECMA. However, DotNET's flagship language is C#. With this in mind, we can now focus on this fact: boxing is a DotNET concept, as we can see from the C# snippet below:

      static void BoxMe()
      {
        int i = 123;
        object o = i;
      }

And its disassembly (ILDASM):

      .method private hidebysig static void BoxMe() il managed
      {
        // Code size       12 (0xc)
        .maxstack  1
        .locals (int32 V_0,
                 class System.Object V_1)
        IL_0000:  ldc.i4.s   123
        IL_0002:  stloc.0
        IL_0003:  ldloca.s   V_0
        IL_0005:  box        [mscorlib]System.Int32
        IL_000a:  stloc.1
        IL_000b:  ret
      } // end of method BoxTest::BoxMe

At IL_0005 we can plainly see a box instruction at the point of boxing. Boxing, then, is explicitly supported at the CIL level. C#, however, does not necessarily need to expose this functionality, any more than a language such as C needs to expose any particular x86 machine instruction. In fact, many would agree that to do so would be counterproductive. It seems that in its early drafts, C# did indeed distinguish between value types (ints, structs, booleans) and reference types (objects, interfaces), and boxing would have been necessary. However, in C#'s current form, Microsoft seems bent on selling its "object-ness", and hence the "everything is an object" philosophy. But boxing seems to have been inadvertently left in, a vestige, if you will, of its evolution.

Microsoft could simply have said that everything derives from Object, but that value types have different semantics and act in the "normal" way. On the other hand, value types are so obviously different from reference types that few developers would really buy into the "an int is an object" idea. For example, local variables of value types still typically live on the stack, while instances of reference types, even those created for boxing, live on the heap. In fact, believing that everything behaves like an object can be dangerous. For one thing, value type assignment semantics are different from those of reference types. For example, if integers truly were objects, then statements like:

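        int i, j = 5;
        i = j;          // copies the value 5 into i
        ...
        object o, p = new object();
        o = p;          // copies the reference; o and p now refer to the same object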

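would have identical behavior, but they do not. Assigning a value type copies the value itself, while assigning a reference type copies only the reference, so both variables end up referring to the same object. Value type assignments still behave as expected, and not like reference types.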
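The difference is easier to see with types that have a field you can change. Below is a small sketch, assuming two made-up types (PointValue and PointRef are hypothetical, not part of any library), that contrasts the two kinds of assignment and also shows that a box holds its own copy of the value.

```csharp
using System;

// Hypothetical types for illustration only.
public struct PointValue { public int X; }
public class PointRef { public int X; }

public static class AssignmentDemo
{
    // Value type: assignment copies the whole value, so the original is unaffected.
    public static int ValueCopy()
    {
        PointValue a = new PointValue();
        a.X = 1;
        PointValue b = a;   // b is an independent copy of a
        b.X = 2;
        return a.X;         // still 1
    }

    // Reference type: assignment copies only the reference.
    public static int ReferenceCopy()
    {
        PointRef a = new PointRef();
        a.X = 1;
        PointRef b = a;     // a and b refer to the same object
        b.X = 2;
        return a.X;         // now 2
    }

    // Boxing copies the value into a new object; later changes to i are not seen.
    public static int BoxedCopy()
    {
        int i = 123;
        object o = i;       // box holds its own copy of 123
        i = 456;
        return (int)o;      // still 123
    }

    public static void Main()
    {
        Console.WriteLine(ValueCopy());      // prints "1"
        Console.WriteLine(ReferenceCopy());  // prints "2"
        Console.WriteLine(BoxedCopy());      // prints "123"
    }
}
```

Note that the boxed int behaves like a snapshot rather than a live view of the variable, which is one more way in which "an int is an object" can mislead.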

So, in learning C#, how are we to think of value and reference types? Here are some options:

For Visual Basic developers, the first option would probably make the most sense.  For Java and C++ developers, the second option is most similar to those languages.  And if you feel you're up to it, I recommend the last option, since DotNET really does have explicit object support for value types (not available in most 'real' processors) and understanding this is always an advantage. 


(c) 2002 emil santos

codexterity
ems ATSIGN codexterity PERIOD com