By Michel


2010-10-15 13:37:36

I'm about to create 100,000 objects in code. They are small ones, with only 2 or 3 properties. I'll put them in a generic list, and once they are all in it, I'll loop over them, check value a and maybe update value b.

Is it faster/better to create these objects as class or as struct?

EDIT

a. The properties are value types (except the string, I think?)

b. They might (we're not sure yet) have a validate method

EDIT 2

I was wondering: are objects on the heap and the stack processed equally by the garbage collector, or does that work differently?


@Eric Lippert 2010-10-15 14:24:14

Is it faster to create these objects as class or as struct?

You are the only person who can determine the answer to that question. Try it both ways, measure a meaningful, user-focused, relevant performance metric, and then you'll know whether the change has a meaningful effect on real users in relevant scenarios.

Structs consume less heap memory (because they are smaller and more easily compacted, not because they are "on the stack"). But they take longer to copy than a reference copy. I don't know what your performance metrics are for memory usage or speed; there's a tradeoff here and you're the person who knows what it is.

Is it better to create these objects as class or as struct?

Maybe class, maybe struct. As a rule of thumb, if the object is:
1. small,
2. logically an immutable value, and
3. one of many such objects,
then I'd consider making it a struct. Otherwise I'd stick with a reference type.

If you need to mutate some field of a struct it is usually better to build a constructor that returns an entire new struct with the field set correctly. That's perhaps slightly slower (measure it!) but logically much easier to reason about.
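A minimal sketch of that advice applied to the question's scenario. `Item`, `A`, `B`, `WithB` and `ItemDemo` are hypothetical names, and `readonly struct` assumes C# 7.2 or later. The loop never mutates a struct in place; it builds a new one and stores it back into the list.

```csharp
using System.Collections.Generic;

// Hypothetical small, immutable value type for the question's scenario.
public readonly struct Item
{
    public Item(int id, int a, int b)
    {
        Id = id;
        A = a;
        B = b;
    }

    public int Id { get; }
    public int A { get; }
    public int B { get; }

    // Returns an entirely new struct with B replaced; the original is untouched.
    public Item WithB(int newB) => new Item(Id, A, newB);
}

public static class ItemDemo
{
    // Loops over the list, checks A, and "updates" B by storing a new struct.
    public static void UpdateWhereAIsEven(List<Item> items, int newB)
    {
        for (int i = 0; i < items.Count; i++)
        {
            if (items[i].A % 2 == 0)
            {
                items[i] = items[i].WithB(newB);
            }
        }
    }
}
```

Whether this is faster or slower than a mutable class holding the same fields is exactly the kind of thing the answer says to measure.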

Are objects on the heap and the stack processed equally by the garbage collector?

No, they are not the same because objects on the stack are the roots of the collection. The garbage collector does not need to ever ask "is this thing on the stack alive?" because the answer to that question is always "Yes, it's on the stack". (Now, you can't rely on that to keep an object alive because the stack is an implementation detail. The jitter is allowed to introduce optimizations that, say, enregister what would normally be a stack value, and then it's never on the stack so the GC doesn't know that it is still alive. An enregistered object can have its descendents collected aggressively, as soon as the register holding onto it is not going to be read again.)

But the garbage collector does have to treat objects on the stack as alive, the same way that it treats any object known to be alive as alive. The object on the stack can refer to heap-allocated objects that need to be kept alive, so the GC has to treat stack objects like living heap-allocated objects for the purposes of determining the live set. But obviously they are not treated as "live objects" for the purposes of compacting the heap, because they're not on the heap in the first place.

Is that clear?

@Jon Hanna 2010-10-15 14:45:09

Eric, do you know if either the compiler or the jitter makes use of immutability (perhaps if enforced with readonly) to allow optimisations. I wouldn't let that affect a choice on mutability (I'm a nut for efficiency details in theory, but in practice my first move towards efficiency is always trying to have as simple a guarantee of correctness as I can and hence not have to waste CPU cycles and brain cycles on checks and edge-cases, and being appropriately mutable or immutable helps there), but it would counter any knee-jerk reaction to your saying immutability can be slower.

@Eric Lippert 2010-10-15 14:50:59

@Jon: The C# compiler optimizes const data but not readonly data. I do not know if the jit compiler performs any caching optimizations on readonly fields.

@Jon Hanna 2010-10-15 15:03:36

A pity. I know that knowledge of immutability allows for some optimisations, but I hit the limits of my theoretical knowledge at that point, and they're limits I'd love to stretch. In the meantime, "it can be faster both ways, here's why, now test and find out which applies in this case" is useful to be able to say :)

@Nick Martyshchenko 2010-10-15 15:50:36

I would recommend reading simple-talk.com/dotnet/.net-framework/… and your own article (@Eric): blogs.msdn.com/b/ericlippert/archive/2010/09/30/… to start diving into the details. There are many other good articles around. BTW, the difference in processing 100 000 small in-memory objects is hardly noticeable, though there is some memory overhead (~2.3 MB) for the class version. It can easily be checked with a simple test.
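A minimal sketch of such a test, assuming a modern .NET runtime (.NET Core 3.0 or later) where `GC.GetAllocatedBytesForCurrentThread` is available. `PointStruct`, `PointClass` and `AllocationProbe` are hypothetical stand-ins for the question's two-field objects. It compares bytes allocated, not speed, and the exact figures depend on the runtime and on 32- vs 64-bit.

```csharp
using System;

// Hypothetical types mirroring the question: two small fields each.
public struct PointStruct { public int Id; public bool IsValid; }
public sealed class PointClass { public int Id; public bool IsValid; }

public static class AllocationProbe
{
    // Bytes allocated on this thread while filling an array of `count` structs.
    public static long MeasureStructs(int count)
    {
        long before = GC.GetAllocatedBytesForCurrentThread();
        var array = new PointStruct[count];
        for (int i = 0; i < count; i++)
        {
            array[i].Id = i;
            array[i].IsValid = true;
        }
        long after = GC.GetAllocatedBytesForCurrentThread();
        GC.KeepAlive(array);
        return after - before;
    }

    // Same, but every element is a separate heap object referenced from the array.
    public static long MeasureClasses(int count)
    {
        long before = GC.GetAllocatedBytesForCurrentThread();
        var array = new PointClass[count];
        for (int i = 0; i < count; i++)
        {
            array[i] = new PointClass { Id = i, IsValid = true };
        }
        long after = GC.GetAllocatedBytesForCurrentThread();
        GC.KeepAlive(array);
        return after - before;
    }
}
```

Calling both with `100_000` and printing the difference gives a rough figure for the per-object overhead of the class version on your machine.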

@Michel 2010-10-15 21:10:14

Yes, that's clear. Thanks very much for your comprehensive (extensive is better? Google translate gave 2 translations. I meant to say that you took the time to not write a short answer, but took the time to write all details too) answer.

@supercat 2010-10-15 14:33:34

A struct is, at its heart, nothing more nor less than an aggregation of fields. In .NET it's possible for a structure to "pretend" to be an object, and for each structure type .NET implicitly defines a heap object type with the same fields and methods which--being a heap object--will behave like an object. A variable which holds a reference to such a heap object ("boxed" structure) will exhibit reference semantics, but one which holds a struct directly is simply an aggregation of variables.
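A small sketch of the two behaviours described above. `ITally`, `Tally` and `BoxingDemo` are hypothetical names. Converting the struct to an interface boxes it; copies of that interface reference then share one heap object, while the original struct variable stays a separate aggregate of fields.

```csharp
public interface ITally
{
    int Count { get; }
    void Increment();
}

// Hypothetical struct; as an unboxed variable it is just an aggregate of fields.
public struct Tally : ITally
{
    public int Count { get; private set; }
    public void Increment() => Count++;
}

public static class BoxingDemo
{
    public static (int unboxed, int boxed) Run()
    {
        Tally t = new Tally();
        ITally box = t;          // boxing: copies t's fields into a new heap object
        ITally alias = box;      // copies the reference, not the object
        alias.Increment();       // mutates the single boxed object
        return (t.Count, box.Count); // the original variable is unaffected
    }
}
```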

I think much of the struct-versus-class confusion stems from the fact that structures have two very different use cases, which should have very different design guidelines, but the MS guidelines don't distinguish between them. Sometimes there is a need for something which behaves like an object; in that case, the MS guidelines are pretty reasonable, though the "16-byte limit" should probably be more like 24-32 bytes. Sometimes, however, what's needed is an aggregation of variables. A struct used for that purpose should simply consist of a bunch of public fields, and possibly an Equals override, a ToString override, and an IEquatable<itsType>.Equals implementation. Structures which are used as aggregations of fields are not objects, and shouldn't pretend to be. From the structure's point of view, the meaning of a field should be nothing more or less than "the last thing written to this field". Any additional meaning should be determined by the client code.

For example, if a variable-aggregating struct has members Minimum and Maximum, the struct itself should make no promise that Minimum <= Maximum. Code which receives such a structure as a parameter should behave as though it were passed separate Minimum and Maximum values. A requirement that Minimum be no greater than Maximum should be regarded like a requirement that a Minimum parameter be no greater than a separately-passed Maximum one.
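A sketch of that division of responsibility, with hypothetical names (`MinMax`, `MinMaxClient`, `Span`). The struct is just two public fields and accepts any values; the client method states the `Minimum <= Maximum` requirement exactly as it would for two separately passed parameters.

```csharp
using System;

// Hypothetical "variable-aggregating" struct: public fields, no invariants.
public struct MinMax
{
    public int Minimum;
    public int Maximum;

    public override string ToString() => $"[{Minimum}, {Maximum}]";
}

public static class MinMaxClient
{
    // The requirement lives in the client code, not in the struct.
    public static int Span(MinMax range)
    {
        if (range.Minimum > range.Maximum)
            throw new ArgumentException("Minimum must not exceed Maximum.", nameof(range));
        return range.Maximum - range.Minimum;
    }

    // Equivalent overload taking the two values separately.
    public static int Span(int minimum, int maximum) =>
        Span(new MinMax { Minimum = minimum, Maximum = maximum });
}
```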

A useful pattern to consider sometimes is to have an ExposedHolder<T> class defined something like:

class ExposedHolder<T>
{
  public T Value;
  public ExposedHolder() { }
  public ExposedHolder(T val) { Value = val; }
}

If one has a List<ExposedHolder<someStruct>>, where someStruct is a variable-aggregating struct, one may do things like myList[3].Value.someField += 7;, but giving myList[3].Value to other code will give it the contents of Value rather than giving it a means of altering it. By contrast, if one used a List<someStruct>, it would be necessary to use var temp=myList[3]; temp.someField += 7; myList[3] = temp;. If one used a mutable class type, exposing the contents of myList[3] to outside code would require copying all the fields to some other object. If one used an immutable class type, or an "object-style" struct, it would be necessary to construct a new instance which was like myList[3] except for someField which was different, and then store that new instance into the list.
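A runnable sketch of the two list styles compared above. `SomeStruct`, `SomeField` and `HolderDemo` are hypothetical names, and the holder is repeated here (with public constructors) so the block stands alone. Both methods end with the field equal to 7; the difference is how much code each needs and what a reader of the value receives.

```csharp
using System.Collections.Generic;

// Hypothetical variable-aggregating struct.
public struct SomeStruct { public int SomeField; }

// The holder pattern from the answer.
public class ExposedHolder<T>
{
    public T Value;
    public ExposedHolder() { }
    public ExposedHolder(T val) { Value = val; }
}

public static class HolderDemo
{
    public static int ViaHolder()
    {
        var myList = new List<ExposedHolder<SomeStruct>>();
        for (int i = 0; i < 5; i++)
            myList.Add(new ExposedHolder<SomeStruct>(new SomeStruct()));

        myList[3].Value.SomeField += 7;        // edits the field inside the holder in place
        SomeStruct snapshot = myList[3].Value; // other code receives a copy, not a handle
        snapshot.SomeField = 1000;             // does not affect the list
        return myList[3].Value.SomeField;
    }

    public static int ViaPlainList()
    {
        var myList = new List<SomeStruct>(new SomeStruct[5]);
        // myList[3].SomeField += 7; would not compile: the indexer returns a copy.
        var temp = myList[3];
        temp.SomeField += 7;
        myList[3] = temp;
        return myList[3].SomeField;
    }
}
```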

One additional note: If you are storing a large number of similar things, it may be good to store them in possibly-nested arrays of structures, preferably trying to keep the size of each array between 1K and 64K or so. Arrays of structures are special, in that indexing one will yield a direct reference to a structure within, so one can say "a[12].x = 5;". Although one can define array-like objects, C# does not allow for them to share such syntax with arrays.
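A sketch of the nested-array idea, assuming C# 7.0 or later: ref-returning indexers, which postdate this answer, let a custom container offer the same `x[12].X = 5` syntax as an array. `ChunkedArray<T>` and `Point3` are hypothetical names, and the 4,096-element chunk size is an assumption; with the 12-byte `Point3` each chunk stays under the default 85,000-byte large-object threshold.

```csharp
using System;

// Hypothetical struct stored in bulk.
public struct Point3 { public int X, Y, Z; }

// Hypothetical container that keeps elements in several smaller arrays.
public sealed class ChunkedArray<T> where T : struct
{
    private const int ChunkSize = 4096; // assumed chunk length; tune by measuring
    private readonly T[][] _chunks;

    public ChunkedArray(int length)
    {
        Length = length;
        int chunkCount = (length + ChunkSize - 1) / ChunkSize;
        _chunks = new T[chunkCount][];
        for (int c = 0; c < chunkCount; c++)
        {
            int size = Math.Min(ChunkSize, length - c * ChunkSize);
            _chunks[c] = new T[size];
        }
    }

    public int Length { get; }

    // Returns a reference to the element inside its chunk, so callers can
    // assign to its fields directly, just as with a plain array.
    public ref T this[int index] => ref _chunks[index / ChunkSize][index % ChunkSize];
}
```

Usage: with a plain array, `var a = new Point3[100]; a[12].X = 5;` works as the answer says; with the container, `var pts = new ChunkedArray<Point3>(100_000); pts[12].X = 5;` now works the same way.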

@Daniel Mošmondor 2010-10-15 17:31:00

Well, if you go with a struct after all, then get rid of the string and use a fixed-size char or byte buffer.

That's as far as performance goes.

@ja72 2010-10-15 15:16:32

Sometimes with a struct you don't need to call the new() constructor; you can assign the fields directly, which makes it much faster than usual.

Example:

Value[] list = new Value[N];
for (int i = 0; i < N; i++)
{
    list[i].id = i;
    list[i].is_valid = true;
}

is about 2 to 3 times faster than

Value[] list = new Value[N];
for (int i = 0; i < N; i++)
{
    list[i] = new Value(i, true);
}

where Value is a struct with two fields (id and is_valid).
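A self-contained version of the two loops above, for anyone who wants to reproduce the comparison. The `Value` fields come from the answer; its constructor, `FillBenchmark` and its `Time` helper are hypothetical. A single `Stopwatch` run is noisy, so treat the result as rough and use a dedicated benchmarking tool before relying on any particular speed-up factor.

```csharp
using System;
using System.Diagnostics;

// The two-field struct described in the answer.
public struct Value
{
    public int id;
    public bool is_valid;

    public Value(int id, bool isValid)
    {
        this.id = id;
        is_valid = isValid;
    }
}

public static class FillBenchmark
{
    // Writes the fields of each array element directly.
    public static Value[] FillFields(int n)
    {
        Value[] list = new Value[n];
        for (int i = 0; i < n; i++)
        {
            list[i].id = i;
            list[i].is_valid = true;
        }
        return list;
    }

    // Constructs a new struct and copies it into each element.
    public static Value[] FillWithConstructor(int n)
    {
        Value[] list = new Value[n];
        for (int i = 0; i < n; i++)
        {
            list[i] = new Value(i, true);
        }
        return list;
    }

    // Rough timing only: one warm-up call so JIT compilation is not timed.
    public static TimeSpan Time(Func<int, Value[]> fill, int n)
    {
        fill(n);
        var sw = Stopwatch.StartNew();
        fill(n);
        sw.Stop();
        return sw.Elapsed;
    }
}
```

For example, compare `FillBenchmark.Time(FillBenchmark.FillFields, 100_000)` against `FillBenchmark.Time(FillBenchmark.FillWithConstructor, 100_000)` in a Release build.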

On the other hand, if the items need to be moved or selected, all that copying of value types is going to slow you down. To get the exact answer, I suspect you have to profile your code and test it out.

@leppie 2010-10-15 15:18:15

+1 for showing a neat example

@leppie 2010-10-15 15:19:18

Obviously things get a lot faster when you marshal values over the native boundaries too.

@Michel 2010-10-15 21:02:38

+1 for the example

@supercat 2015-03-04 22:53:07

I'd suggest using a name other than list, given that the indicated code won't work with a List<Value>.

@ja72 2015-03-05 15:10:52

var list2 = new List<Value>(list) works just fine

@Robert 2010-10-15 14:41:00

From a C++ perspective, I agree that modifying a struct's properties will be slower than modifying a class's. But I do think structs will be faster to read from, due to the struct being allocated on the stack instead of the heap. Reading data from the heap requires more checks than reading from the stack.

@Paul Ruane 2010-10-15 13:44:36

Arrays of structs are represented on the heap in a contiguous block of memory, whereas an array of objects is represented as a contiguous block of references with the actual objects themselves elsewhere on the heap, thus requiring memory for both the objects and for their array references.

In this case, as you are placing them in a List<> (and a List<> is backed onto an array) it would be more efficient, memory-wise to use structs.

(Beware, though, that large arrays will find their way onto the Large Object Heap where, if their lifetime is long, they may have an adverse effect on your process's memory management. Remember, also, that memory is not the only consideration.)
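One way to see where such an array ends up, assuming the default 85,000-byte large-object threshold (newer .NET runtimes let it be configured): `GC.GetGeneration` reports objects on the Large Object Heap as generation 2 even right after allocation. `SmallItem` and `LohProbe` are hypothetical names; with two `int` fields, a 100,000-element array is about 800,000 bytes, and a `List<SmallItem>` holding that many items has a backing array at least that large.

```csharp
using System;

// Hypothetical 8-byte struct (two 4-byte ints).
public struct SmallItem { public int Id; public int A; }

public static class LohProbe
{
    // Allocates an array of structs and reports the generation the GC assigns it.
    public static int GenerationOfArray(int length)
    {
        var array = new SmallItem[length];
        return GC.GetGeneration(array);
    }
}
```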

@leppie 2010-10-15 13:55:21

You are able to use ref keyword to deal with this.

@Jon Artus 2010-11-11 12:30:15

"Beware though, that large arrays will find their way on the Large Object Heap where, if their lifetime is long, may have an adverse affect on your process's memory management." - I'm not quite sure why you'd think that? Being allocated on the LOH won't cause any adverse effects on memory management unless (possibly) it's a short-lived object and you want to reclaim the memory quickly without waiting for a Gen 2 collection.

@Paul Ruane 2010-11-11 15:03:09

@Jon Artus: the LOH does not get compacted. Any long-lived object will divide the LOH into the area of free memory before and the area after. Contiguous memory is required for allocation and if these areas are not big enough for an allocation then more memory is allocated to the LOH (i.e. you will get LOH fragmentation).

@Jon Hanna 2010-10-15 13:59:30

If they have value semantics, then you should probably use a struct. If they have reference semantics, then you should probably use a class. There are exceptions, which mostly lean towards creating a class even when there are value semantics, but start from there.

As for your second edit, the GC only deals with the heap, but there is a lot more heap space than stack space, so putting things on the stack isn't always a win. Besides which, a list of struct-types and a list of class-types will be on the heap either way, so this is irrelevant in this case.

Edit:

I'm beginning to consider the term "evil" (as in "mutable structs are evil") to be harmful. After all, making a class mutable is a bad idea if it's not actively needed, and I would not rule out ever using a mutable struct. It is a poor idea so often as to almost always be a bad idea, though; mostly it just doesn't coincide with value semantics, so it just doesn't make sense to use a struct in the given case.

There can be reasonable exceptions with private nested structs, where all uses of that struct are hence restricted to a very limited scope. This doesn't apply here though.

Really, I think "it mutates so it's a bad struct" is not much better than going on about the heap and the stack (which at least does have some performance impact, even if a frequently misrepresented one). "It mutates, so it quite likely doesn't make sense to consider it as having value semantics, so it's a bad struct" is only slightly different, but importantly so, I think.

@FMM 2010-10-15 13:58:19

The best solution is to measure, measure again, then measure some more. There may be details of what you're doing that may make a simplified, easy answer like "use structs" or "use classes" difficult.

@Michel 2010-10-15 21:04:50

Agree with the measure part, but in my opinion it was a straightforward and clear example, and I thought that maybe some generic things could be said about it. And as it turned out, some people did.

@kyndigs 2010-10-15 13:43:16

Structs may seem similar to classes, but there are important differences that you should be aware of. First of all, classes are reference types and structs are value types. By using structs, you can create objects that behave like the built-in types and enjoy their benefits as well.

When you call the new operator on a class, it will be allocated on the heap. However, when you instantiate a struct, it gets created on the stack. This will yield performance gains. Also, you will not be dealing with references to an instance of a struct as you would with classes. You will be working directly with the struct instance. Because of this, when passing a struct to a method, it's passed by value instead of as a reference.

More here:

http://msdn.microsoft.com/en-us/library/aa288471(VS.71).aspx
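A small sketch of what passing "by value" means in each case. `PointS`, `PointC` and `PassingDemo` are hypothetical names. A struct argument is a copy of the fields; a class argument is a copy of the reference, so the method can change the shared object but cannot rebind the caller's variable; a `ref` parameter hands over the caller's variable itself.

```csharp
// Hypothetical types with the same single field.
public struct PointS { public int X; }
public sealed class PointC { public int X; }

public static class PassingDemo
{
    // Struct passed by value: the method gets a copy of the fields.
    public static void SetStruct(PointS p) => p.X = 42;

    // Class reference passed by value: the method gets a copy of the reference.
    public static void SetClass(PointC p)
    {
        p.X = 42;                  // visible to the caller: same object
        p = new PointC { X = -1 }; // only rebinds the local copy of the reference
    }

    // ref parameter: the method works on the caller's variable itself.
    public static void SetStructByRef(ref PointS p) => p.X = 42;
}
```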

@Anthony Pegram 2010-10-15 13:47:23

I know it says it on MSDN, but MSDN is not telling the whole story. Stack vs. heap is an implementation detail and structs do not always go on the stack. For just one recent blog on this, see: blogs.msdn.com/b/ericlippert/archive/2010/09/30/…

@Paul Ruane 2010-10-15 13:56:56

"...it's passed by value..." both references and structs are passed by value (unless one uses 'ref') — it's whether a value or reference is being passed that differs, i.e. structs are passed value-by-value, class objects are passed reference-by-value and ref marked params pass reference-by-reference.

@Eric Lippert 2010-10-15 14:14:06

That article is misleading on several key points, and I've asked the MSDN team to revise or delete it.

@supercat 2010-10-15 14:53:18

@Eric Lippert: Would it be possible for you to encourage the use of more-distinct terminology for object instances (stored on the heap) and object references (stored in fields, variables, or wherever)? Also, with regard to "mutable structs are evil", it seems that mutable structs are mostly good, except for the places where temporary structs are created. Being able to change something, secure in the knowledge that nothing else is aliased to it, would seem a useful ability. Sure, one could clone class objects all over the place, but that would seem rather wasteful.

@CodesInChaos 2010-10-15 17:44:16

I think that mutable properties on a struct are OK (but not very nice) since the compiler usually catches assignments to properties of temporary copies, but mutating methods are definitely evil. If I need them for performance reasons I'd use a static method with a ref parameter instead of modifying this
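A sketch of the distinction drawn here, using a `List<T>` as in the question. `Counter`, `IncrementInPlace` and `CounterDemo` are hypothetical names. Assigning to a field of a list element is rejected by the compiler, but calling a mutating method compiles and silently changes a temporary copy. The static `ref` helper cannot be handed a list indexer, so the copy-out/copy-in has to be written explicitly.

```csharp
using System.Collections.Generic;

// Hypothetical mutable struct with a mutating method (the "evil" kind).
public struct Counter
{
    public int Count;

    public void Increment() => Count++;

    // The alternative suggested above: a static method taking the struct by ref.
    public static void IncrementInPlace(ref Counter c) => c.Count++;
}

public static class CounterDemo
{
    public static int MutatingMethodOnListElement()
    {
        var items = new List<Counter> { new Counter() };
        // items[0].Count++;   // CS1612: the compiler rejects modifying a temporary copy
        items[0].Increment();  // compiles, but increments a temporary copy
        return items[0].Count; // still 0
    }

    public static int StaticRefHelper()
    {
        var items = new List<Counter> { new Counter() };
        // Counter.IncrementInPlace(ref items[0]); // rejected: an indexer can't be passed by ref
        var temp = items[0];
        Counter.IncrementInPlace(ref temp);
        items[0] = temp;
        return items[0].Count; // 1
    }
}
```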

@Eric Lippert 2010-10-15 19:34:50

@supercat: to address your first point: the larger point is that in managed code where a value or reference to a value is stored is largely irrelevant. We have worked hard to make a memory model that most of the time allows developers to allow the runtime to make smart storage decisions on their behalf. These distinctions matter very much when failure to understand them has crashing consequences as it does in C; not so much in C#.

@Eric Lippert 2010-10-15 19:36:46

@supercat: to address your second point, no, mutable structs are mostly evil. For example, void M() { S s = new S(); s.Blah(); N(s); }. Refactor to: void DoBlah(S s) { s.Blah(); } void M() { S s = new S(); DoBlah(s); N(s); }. That just introduced a bug because S is a mutable struct. Did you immediately see the bug? Or did the fact that S is a mutable struct hide the bug from you?
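A runnable version of that refactoring. The body of `S`, its `BlahCount` field, and the `int` return values of `N` and the `M` variants are hypothetical additions so the difference is observable. The extracted `DoBlah` mutates its own copy; passing by `ref`, as suggested in the reply below, restores the original behaviour.

```csharp
// Hypothetical mutable struct with a mutating method.
public struct S
{
    public int BlahCount;
    public void Blah() => BlahCount++;
}

public static class RefactoringBug
{
    // Stands in for "use s afterwards".
    static int N(S s) => s.BlahCount;

    public static int Original()
    {
        S s = new S();
        s.Blah();
        return N(s);     // 1
    }

    // "Extract method": DoBlah receives and mutates a copy.
    static void DoBlah(S s) { s.Blah(); }

    public static int Refactored()
    {
        S s = new S();
        DoBlah(s);
        return N(s);     // 0: the bug
    }

    static void DoBlahByRef(ref S s) { s.Blah(); }

    public static int RefactoredByRef()
    {
        S s = new S();
        DoBlahByRef(ref s);
        return N(s);     // 1
    }
}
```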

@supercat 2010-10-15 22:31:06

@Eric Lippert: I think many people who are used to by-value semantics in other languages get confused by something like "car2=car1; car2.color=blue;" affecting car1. If one thinks of car1 and car2 as holding VINs (vehicle IDs) rather than actual vehicles, the semantics make sense. A VIN doesn't have a color. The car represented by a VIN has a color. Saying "paint car 1G1KXQ58J green" doesn't mean one should paint the numbers green--it means one should find the car with that VIN and paint it. Saying "car2=car1" simply copies the VIN--not the car itself.
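The VIN analogy in code, with hypothetical `CarRef`, `CarValue` and `CarDemo` types. Assigning a class variable copies the "VIN" (the reference), so both variables identify one car; assigning a struct variable copies the car's data itself.

```csharp
// Hypothetical types: the class variable holds a reference, the struct holds the data.
public sealed class CarRef { public string Color = "red"; }
public struct CarValue { public string Color; }

public static class CarDemo
{
    public static string ClassCopy()
    {
        var car1 = new CarRef();
        var car2 = car1;      // copies the "VIN": both variables identify one car
        car2.Color = "blue";
        return car1.Color;    // "blue"
    }

    public static string StructCopy()
    {
        var car1 = new CarValue { Color = "red" };
        var car2 = car1;      // copies the car's data itself
        car2.Color = "blue";
        return car1.Color;    // "red"
    }
}
```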

@supercat 2010-10-15 22:37:52

@Eric Lippert: In the latter case, the bug was immediately obvious; DoBlah needs to accept the structure by reference. There are some subtle bug cases, like methods which mutate a structure (evil), but suppose one needs to hold 1,000,000 items each with ten 16-bit parts, and it will often be necessary to change different combinations of half of those parts. Mutable structures would be pretty efficient. One copy operation on check-out, one on check-in. Non-mutable structures would seem to require making a copy for each edit unless one has many different 'change' functions.

@supercat 2010-10-15 22:40:21

@Eric Lippert: Besides, I consider a more common bug scenario to be what happens with mutable classes if e.g. someone forgets to clone an object before storing it in a Dictionary. That doesn't happen with structs. I tend to think that structs should only be mutable if they're Plain Old Data, but see nothing wrong with POD structs. (BTW, returning to your example, I'm assuming Blah() is an evil method which mutates the struct--I'll agree with you 100% in saying that methods which mutate structs are a bad idea).

@Preet Sangha 2010-10-15 13:40:58

Use classes.

On a general note. Why not update value b as you create them?
