Log in

No account? Create an account

C++ geekery

So I'm trying to write up a solution to the homework assignment I just gave my students -- a binary search tree of (key,value) pairs in C++.

From my experience in Java and Scheme, I have a strong bias in favor of immutable data structures. So I wrote a BST class that presents a stateful "front end", but which is actually implemented in terms of immutable trees.

What should one be able to do with a BST-based dictionary? Obviously, create an empty one; insert a specified (key,value) pair; check whether a given key appears; look up the value associated with a given key; etc. I decided to leave out "remove" from the assignment, since that's harder to do to a pure BST (there are ways around it, but I didn't want to get that complicated). But it should be easy to replace the value associated with a given key.

Now, if the data structure were supposed to be mutable, I would just search through the BST, change the "value" field, and be done with it. If I were working in Scheme or Java, I would search through the BST, create a new node with the modified value but the same key and subtrees, and rebuild the answer from there on up; the old nodes will eventually get garbage-collected. So I try to do that in C++... but what happens to the old node? At some point it (and everything above it in the tree) will have to be deleted, or I'll get memory leaks. The obvious C++ destructor says "just before deleting me, delete both of my subtrees." But I just copied those subtree pointers to the new node, so they'll get clobbered.

Aha: I'll zero out those pointers before deleting the old node... but that counts as mutating, even though it's just a nanosecond before the node goes away completely. Maybe I should dumb-down the destructor so it DOESN'T automatically delete both subtrees; instead, I'll do that with an explicit call to a "cleanUp" member function before deleting. No, that doesn't work either because "cleanUp" counts as mutating, even though it's just before the node goes away completely.

The only alternative seems to be that I have to rebuild not only everything from the node-to-change UP, but also everything from that node DOWN. Which feels wasteful and inefficient -- exactly what C++ is NOT supposed to be.

Is it really true that in a non-garbage-collected language you can't have immutable data structures share substructures safely?

Side note: Oddly enough, the C++ compiler doesn't complain in the slightest about "delete this;" appearing in the middle of a member function.


In this case, I think you probably want to use std::auto_ptr. That breaks non-mutability, but it does solve your problem. auto_ptr is like a pointer, but deletes its object when it's deleted. The point where it break non-mutability is that if you make a copy, the original auto_ptr is set to point to null. (Yes, y = x; will modify x if they are auto_ptrs.)

Which, now that I read that, is rather how you suggested breaking it yourself. I'm just hiding the breaking in another class.

The problem that your running into is, lacking gc, somebody needs to know that they own the data, so they know to delete the data when they are deleted. You want to transfer ownership, which essential means you need to mutate your data. In a gc-based language, ownership isn't really a valid concept, so you can make your immutable data structures and pass them to other objects without having to change the ownership bits.

It gets even more fun if you have a graph instead of a tree, since then there can be multiple owners. Auto_ptr enforces one owner, so you need to jump threw hoops or start using reference-counted pointers, that delete the pointed to object when the reference count drops to zero. Those can still lead to memory leaks when two objects point to each other, but nothing else points to either object. To fix that, you need to write a garbage collector.

From the complete other side of the problem, I recently had to write code where I had about 1 Gb of memory and several Gb of data in Java. Writing a (limited) virtual memory system in a language with garbage collection (but no free or delete) is exciting.

Yes, I read something about auto_ptr, and thought it might help, but I haven't tried it yet.

I R ! C++ guru, obviously, but somebody's got to teach this course. Next week we switch from C++ to Prolog. I R ! Prolog guru either, but programming in it is more like walking through the desert than the Amazon rain forest.