By Laserallan

2008-11-26 18:08:22 8 Comments

I'm building a distributed C++ application that needs to do lots of serialization and deserialization of simple data structures that are passed between different processes and computers.

I'm not interested in serializing complex class hierarchies, but rather in sending structures with a few simple members such as numbers, strings and data vectors. The data vectors can often be many megabytes large. I'm worried that text/XML-based approaches are too slow, and I really don't want to write this myself, since problems like string encoding and number endianness can make it far more complicated than it looks on the surface.

I've been looking a bit at protocol buffers and Boost.Serialization. According to the documentation, protocol buffers seem to care a lot about performance. Boost seems somewhat more lightweight in the sense that you don't need an external language for specifying the data format, which I find quite convenient for this particular project.

So my question comes down to this: does anyone know if the boost serialization is fast for the typical use case I described above?

Also if there are other libraries that might be right for this, I'd be happy to hear about them.


@gbjbaanb 2009-03-15 18:40:37

There's also Thrift, which looks like an alpha project but is used and developed by Facebook, so it does have some users.

Or good old DCE, which was the standard MS decided to use for COM. It's now open source, 20 years too late, but better late than never.

@Andy Dent 2013-02-13 06:26:30

Thrift is also used by the Evernote API

@Malkocoglu 2008-11-28 14:43:38

Also check out ONC-RPC (old SUN-RPC)

@David Allan Finch 2008-12-23 11:13:02

We have used Sun RPC to do this for 15 years very successfully. The C++ code it produces is simple to use, and it works on every OS I have tried.

@Bill 2008-11-27 06:06:28

I would strongly suggest protocol buffers. They're incredibly simple to use, offer great performance, and take care of issues like endianness and backwards compatibility. To make it even more attractive, serialized data is language-independent thanks to numerous language implementations.
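To illustrate how a wire format can sidestep endianness, here is a minimal sketch of the base-128 varint scheme that protocol buffers use for integers: seven payload bits per byte, least significant group first, with the high bit marking "more bytes follow". The function names `put_varint` and `get_varint` are made up for this sketch and are not part of the protobuf API; the real library generates this code for you.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical helper: append `value` as a base-128 varint.
// The byte sequence is the same on every host, whatever its byte order.
void put_varint(std::vector<std::uint8_t>& out, std::uint64_t value) {
    while (value >= 0x80) {
        // Low 7 bits of the value, with the continuation bit set.
        out.push_back(static_cast<std::uint8_t>(value | 0x80));
        value >>= 7;
    }
    out.push_back(static_cast<std::uint8_t>(value));  // final byte, high bit clear
}

// Hypothetical helper: read a varint starting at `pos` and advance `pos`.
std::uint64_t get_varint(const std::vector<std::uint8_t>& in, std::size_t& pos) {
    std::uint64_t value = 0;
    for (int shift = 0; shift < 64; shift += 7) {
        if (pos >= in.size()) throw std::runtime_error("truncated varint");
        const std::uint8_t byte = in[pos++];
        value |= static_cast<std::uint64_t>(byte & 0x7F) << shift;
        if ((byte & 0x80) == 0) return value;  // last byte of this varint
    }
    throw std::runtime_error("varint too long");
}
```

With this scheme the value 300 is written as the two bytes 0xAC 0x02, and small values take a single byte.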

@Peter M 2008-11-26 19:31:50

If you are only sending well-defined data structures, then perhaps you should be looking at ASN.1 as an encoding methodology?

@grieve 2008-11-26 18:46:56

My guess is that boost is fast enough. I have used it in previous projects to serialize data to and from disk, and its performance never even came up as an issue.

My answer here talks about serialization in general, which may be helpful to you beyond which serialization library you choose to use.

Having said that, it looks like you know most of the main trouble spots with serialization (endianness, string encoding). You did leave out versioning and forwards/backwards compatibility. If time is not critical, I recommend writing your own serialization code. It is an enlightening experience, and the lessons you learn are invaluable. Though I will warn you that it will tend to make you hate XML-based protocols for their bloat. :)

Whichever path you choose, good luck with your project.
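For a sense of what hand-written serialization involves, here is a minimal sketch covering the trouble spots listed above: a version byte, fixed little-endian integers, and length-prefixed strings and vectors. The `Message` struct, the helper names and the layout are all invented for this example, and it assumes the string is already UTF-8 and that `double` is IEEE 754 binary64.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <limits>
#include <stdexcept>
#include <string>
#include <vector>

static_assert(std::numeric_limits<double>::is_iec559 && sizeof(double) == 8,
              "sketch assumes IEEE 754 binary64 doubles");

// Hypothetical message: a number, a string and a data vector.
struct Message {
    std::uint32_t id = 0;
    std::string name;             // assumed to hold UTF-8 already
    std::vector<double> samples;
};

constexpr std::uint8_t kFormatVersion = 1;  // hypothetical version tag

// Append an unsigned integer as little-endian bytes, whatever the host order.
template <typename T>
void put_le(std::vector<std::uint8_t>& out, T value) {
    for (std::size_t i = 0; i < sizeof(T); ++i)
        out.push_back(static_cast<std::uint8_t>(value >> (8 * i)));
}

// Read a little-endian unsigned integer at `pos` and advance `pos`.
template <typename T>
T get_le(const std::vector<std::uint8_t>& in, std::size_t& pos) {
    if (pos > in.size() || in.size() - pos < sizeof(T))
        throw std::runtime_error("truncated input");
    T value = 0;
    for (std::size_t i = 0; i < sizeof(T); ++i)
        value |= static_cast<T>(in[pos + i]) << (8 * i);
    pos += sizeof(T);
    return value;
}

std::vector<std::uint8_t> encode(const Message& m) {
    std::vector<std::uint8_t> out;
    out.push_back(kFormatVersion);
    put_le<std::uint32_t>(out, m.id);
    put_le<std::uint64_t>(out, m.name.size());       // length prefix
    out.insert(out.end(), m.name.begin(), m.name.end());
    put_le<std::uint64_t>(out, m.samples.size());    // element count
    for (double d : m.samples) {
        std::uint64_t bits;
        std::memcpy(&bits, &d, sizeof bits);          // reinterpret the double's bits
        put_le<std::uint64_t>(out, bits);
    }
    return out;
}

Message decode(const std::vector<std::uint8_t>& in) {
    if (in.empty() || in[0] != kFormatVersion)
        throw std::runtime_error("unsupported format version");
    std::size_t pos = 1;
    Message m;
    m.id = get_le<std::uint32_t>(in, pos);
    const std::uint64_t name_len = get_le<std::uint64_t>(in, pos);
    if (name_len > in.size() - pos) throw std::runtime_error("truncated string");
    m.name.assign(reinterpret_cast<const char*>(in.data() + pos),
                  static_cast<std::size_t>(name_len));
    pos += static_cast<std::size_t>(name_len);
    const std::uint64_t count = get_le<std::uint64_t>(in, pos);
    if (count > (in.size() - pos) / 8) throw std::runtime_error("truncated vector");
    m.samples.reserve(static_cast<std::size_t>(count));
    for (std::uint64_t i = 0; i < count; ++i) {
        const std::uint64_t bits = get_le<std::uint64_t>(in, pos);
        double d;
        std::memcpy(&d, &bits, sizeof d);
        m.samples.push_back(d);
    }
    return m;
}
```

Calling `decode(encode(msg))` should return an equal message on any host. A real format would also need a plan for what readers do when they see a newer version byte, which is where most of the forwards/backwards compatibility work goes.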

@plinth 2008-11-26 18:32:13

Don't pre-emptively optimize. Measure first and optimize second.
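As a starting point for measuring, here is a rough sketch that times a text encoding against a raw binary copy of a vector of doubles. The helpers `to_text`, `to_binary` and `time_ms` are invented for this example and do not stand in for any particular library; swap in the archive or message type you actually want to evaluate.

```cpp
#include <chrono>
#include <cstddef>
#include <cstring>
#include <sstream>
#include <string>
#include <vector>

// Text encoding: each double printed with enough digits to round-trip.
std::string to_text(const std::vector<double>& v) {
    std::ostringstream os;
    os.precision(17);
    for (double d : v) os << d << ' ';
    return os.str();
}

// Binary encoding: a raw copy of the in-memory representation.
// Not portable across byte orders; used here only as a speed baseline.
std::vector<char> to_binary(const std::vector<double>& v) {
    std::vector<char> out(v.size() * sizeof(double));
    if (!out.empty()) std::memcpy(out.data(), v.data(), out.size());
    return out;
}

// Run `f` once and return the elapsed wall-clock time in milliseconds.
template <typename F>
double time_ms(F&& f) {
    const auto start = std::chrono::steady_clock::now();
    f();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Usage sketch: store the result size in an outer variable so the
// compiler cannot discard the work being timed.
//   std::vector<double> data(1000000, 0.5);
//   std::size_t n = 0;
//   double text_ms   = time_ms([&] { n = to_text(data).size(); });
//   double binary_ms = time_ms([&] { n = to_binary(data).size(); });
```

Repeat each measurement several times with realistic payload sizes in an optimized build; a single run mostly measures allocator and cache warm-up.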

@Laserallan 2008-11-26 19:33:55

I think this approach is good for things that are easily replaced. External libraries might not be that hard to change, but I definitely think it's worth doing some homework before making a decision if I can see a problem coming.

@unwesen 2008-11-26 18:29:28

boost.serialization doesn't care about string encodings or endianness. You'll be similarly well off not using it if that matters to you.

You might want to look into ICE from ZeroC:

It works similarly to CORBA, except that it's entirely specced and defined by the company. The upside is that the implementations work as intended, since there aren't all that many. The downside is that if you're using a language they don't support, you're out of luck.

@David Rodríguez - dribeas 2008-11-26 18:32:48

Note the licensing terms: it is free for open source projects, but quite expensive for commercial applications (as most commercial CORBA ORBs anyway).

@unwesen 2008-11-26 18:43:33

Yes, it's expensive for commercial applications. If you factor out the hassle you'll invariably run into if you go for a CORBA or SOAP approach (mixing ORBs with different interpretations of the specs), the price is pretty good IMO :)

@Laserallan 2008-11-26 19:20:47

Good point about boost not supporting endianness. I thought the Data Portability goal on boost serialization index page implied that it actually was supported. However that point was mentioned in the todo-list for the project as well :)

@Statement 2008-12-04 14:41:09

From what I gather, boost serialization is portable as long as you don't use the binary serialization technique. Apparently both text and XML produce portable data. However, if you notice that disk space becomes an issue, you might want to reconsider. I think there is a portable binary format underway.

@Tim 2008-11-26 18:13:13

ACE and ACE TAO come to mind, but you might not like their size and scope.

Regarding your query about "fast" and boost: that is a subjective term, and without knowing your requirements (throughput, etc.) it is difficult to answer that for you. Not that I have any benchmarks for the boost stuff myself...

There are messaging layers you can use, but those are probably slower than boost. I'd say that you identified a good solution in boost, but I've only used ACE and other proprietary communications/messaging products.
