Updating libxmljs on node.js: or, how to suck less at Google v8
06-17-2010 5:20 AM
I just released the latest version of libxmljs for node.js (http://github.com/polotek/libxmljs/tree/0.4.0). This is a major release that doesn't have many visible changes, but lots of stuff happened under the hood. I figured it'd be good to describe some of my struggles with Google v8.
Debugging Segmentation Faults
Previous versions of libxmljs were rife with segfaults that would pop up without warning. There are lots of reasons for this, but the most common had to do with not following best practices with the v8 API. V8 has this notion of Handles that are used to hold references to javascript objects in C/C++ space. Among other things, this design decision gives v8 a nice way to keep track of js references and clean them up at the appropriate time. Here's a contrived example (assume the v8 namespace):
Handle<Number>
getJSNumber(int mynum) {
  Handle<Number> jsNum = Number::New(mynum);
  return jsNum;
}
But there's a problem with the above example, and there was a lot of this in earlier versions of libxmljs. How does this number get cleaned up when you're done with it? V8 recommends using a HandleScope to track local Handles within functions. That way you can be sure they get cleaned up along with the function scope.
Handle<Number>
getJSNumber(int mynum) {
  HandleScope scope;
  Handle<Number> jsNum = Number::New(mynum);
  return jsNum;
}
Ah, now your number will get cleaned up properly. But wait... your number will get cleaned up properly! That means that after this function exits, that Number is gone. Poof. Cleaned up by v8. And if you try to use it, you get a segfault. Version 0.3.x had this incomplete implementation, and I got lots of awesome bugs from the nice folks in the node.js community.
So how do you properly use Handles but keep your returned object from getting freed? You close the scope on the object. Basically this causes the local scope to pass the Handle to the calling scope. Essentially let somebody else worry about it.
Handle<Number>
getJSNumber(int mynum) {
  HandleScope scope;
  Handle<Number> jsNum = Number::New(mynum);
  return scope.Close(jsNum);
}
This ensures that your Handle will stay viable until you pass it back to javascript space and use it. And when you're done it'll get cleaned up as normal (e.g. when your script ends or you lose the reference to the number and it's garbage collected). I had to find every instance of these unclosed scopes and close them properly throughout libxmljs. Pretty simple really, but if you're not aware of this, it'll cause you all sorts of headaches.
Lesson: Use a HandleScope inside any function dealing with v8 Handles, and Close every Handle that gets returned from the function. This includes directly returning the result of a primitive constructor.
Handle<Number>
getJSNumber(int mynum) {
  HandleScope scope;
  // Wrong! Number::New still returns a Handle, so it needs scope.Close().
  return Number::New(mynum);
}
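To see why the versions of the function above behave so differently, here is a toy model of scoped handles in plain C++. It is only a sketch and assumes nothing about v8's real internals: `ToyHandle`, `ToyHandleScope`, `NewNumber`, `leakyNumber` and `closedNumber` are all made-up names, and the "handle stack" is just a vector. The point it illustrates is that a scope releases everything created inside it when it is destroyed, unless the value is first moved into the enclosing scope with Close.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for a handle stack: every local handle occupies one slot.
// (Hypothetical model for illustration; this is not v8's implementation.)
static std::vector<double> g_slots;

// A toy handle is just an index into g_slots.
struct ToyHandle {
    std::size_t index;
    // The handle is only usable while its slot still exists.
    bool IsValid() const { return index < g_slots.size(); }
    double Value() const { return g_slots.at(index); }
};

// Create a value in whichever scope is currently innermost.
ToyHandle NewNumber(double v) {
    g_slots.push_back(v);
    return ToyHandle{g_slots.size() - 1};
}

class ToyHandleScope {
public:
    ToyHandleScope() : start_(g_slots.size()), keep_(g_slots.size()) {}
    ToyHandleScope(const ToyHandleScope&) = delete;
    ToyHandleScope& operator=(const ToyHandleScope&) = delete;

    // Leaving the scope releases every slot it still owns.
    ~ToyHandleScope() { g_slots.resize(keep_); }

    // Release this scope's slots, then re-create the value so that it
    // belongs to the enclosing scope and survives this scope's destructor.
    ToyHandle Close(ToyHandle h) {
        double v = h.Value();
        g_slots.resize(start_);
        ToyHandle escaped = NewNumber(v);
        keep_ = start_ + 1;
        return escaped;
    }

private:
    std::size_t start_;  // stack size when the scope was opened
    std::size_t keep_;   // stack size to restore when the scope ends
};

// Returns a handle whose slot is gone by the time the caller sees it.
ToyHandle leakyNumber(double v) {
    ToyHandleScope scope;
    ToyHandle h = NewNumber(v);
    return h;
}

// Hands the value to the caller's scope before the local scope ends.
ToyHandle closedNumber(double v) {
    ToyHandleScope scope;
    ToyHandle h = NewNumber(v);
    return scope.Close(h);
}
```

In this model, the handle returned by `leakyNumber` points at a slot that no longer exists, which is the toy equivalent of the segfault. The one returned by `closedNumber` stays valid until the caller's own scope ends.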
Making Sure Libxmljs Doesn't Devour All of Your Memory
This is a bigger deal and the real reason 0.4.0 took so long. In earlier versions of libxmljs, the xml documents would never actually get cleaned up. I had people creating huge xml/html docs in tight loops over and over again. And they would watch the process memory climb until it was exhausted and then their app would crash. Bad juju. I know what you're thinking: "But javascript has garbage collection. What is this moron doing wrong?" An excellent question. Let's explore.
V8 does in fact have a very nifty garbage collector and it works quite well. The first problem, however, is that it's pretty lazy. It doesn't run on a regular basis. In fact it only runs when it damn well feels like it. GC is expensive so the v8 team put in some fancy heuristics for determining when is a good time to do it. At a high level, this boils down to "only GC if needed". So unless v8 sees that you're using a lot of memory, the GC won't run.
So if I'm creating giant xml documents and then they're passing out of scope, why wouldn't v8 consider this a good time to take out the trash? Well, that's because of the second problem. V8 has no idea that the xml document memory exists. You see, libxmljs is literally a binding around the libxml2 library. The vast majority of the memory used by this library is allocated by libxml. The binding itself just provides a nice API for js to get at that xml memory. Now's a good time for a diagram:
Hopefully this diagram illustrates the problem. Only the memory in the orange and green boxes is managed by v8. But these are just lightweight pointer classes around the C structs managed by libxml. So even when you were building up 100s of MBs or even GBs of memory, v8 was only seeing a small percentage of that.
The way to fix this was fairly straightforward. V8 gives you a way to tell the GC to take that outside memory into account. The function is called V8::AdjustAmountOfExternalAllocatedMemory and, just like it sounds, it gives us the ability to increment or decrement the amount of external memory that v8 is tracking. Libxmljs now uses this function whenever new nodes are allocated or deallocated. And the result is that the GC runs more often and cleans out your memory.
The great thing about this upgrade is that it works within the v8 garbage collection heuristics. If you're not using a lot of memory, you won't see much GC overhead. If you are, and libxmljs is devouring your RAM, v8 will kick in and get it back to you.
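Here is a rough sketch of why reporting external memory changes the GC's behavior. It is a toy model in plain C++, not v8: `ToyCollector`, its 1 MB threshold, the 64-byte wrapper size and the `simulate` helper are all assumptions invented for the illustration, and `AdjustExternal` merely plays the role that V8::AdjustAmountOfExternalAllocatedMemory plays in the real API.

```cpp
#include <cassert>

// Toy collector that only runs when the memory it knows about crosses a
// threshold. (Hypothetical stand-in for v8's "only GC if needed" heuristic.)
class ToyCollector {
public:
    explicit ToyCollector(long threshold_bytes)
        : threshold_(threshold_bytes), managed_(0), external_(0),
          collections_(0) {}

    // Memory the collector allocates itself, e.g. small js wrapper objects.
    void AllocateManaged(long bytes) {
        managed_ += bytes;
        MaybeCollect();
    }

    // Report a change (positive or negative) in memory that lives outside
    // the managed heap, such as structs allocated by a C library.
    long AdjustExternal(long change_in_bytes) {
        external_ += change_in_bytes;
        MaybeCollect();
        return external_;
    }

    int collections() const { return collections_; }

private:
    void MaybeCollect() {
        if (managed_ + external_ > threshold_) {
            ++collections_;
            // Simplification: pretend every wrapper was garbage and that
            // freeing it also released (and reported) its native memory.
            managed_ = 0;
            external_ = 0;
        }
    }

    long threshold_;
    long managed_;
    long external_;
    int collections_;
};

// Build `docs` documents, each with a 64-byte wrapper plus `doc_bytes` of
// native memory, and return how many collections were triggered.
int simulate(int docs, long doc_bytes, bool report_external) {
    ToyCollector gc(1024L * 1024L);  // collect once more than 1 MB is tracked
    for (int i = 0; i < docs; ++i) {
        gc.AllocateManaged(64);
        if (report_external) {
            gc.AdjustExternal(doc_bytes);
        }
    }
    return gc.collections();
}
```

With 100 documents of 100 KB each, `simulate(100, 100 * 1024, false)` never triggers a collection, because the collector only ever sees the tiny wrappers. Passing `true` makes the native memory visible and the collector starts running.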
One Annoyance To Be Aware Of
I hope this info is useful to folks having trouble with node addons. For those using libxmljs, there are a few side effects that look worrying but really aren't. I had to enable the custom memory management functions in libxml2 in order to be aware of how much memory is being used at any given time. For some reason this causes some random error messages to be printed when the node process exits. It looks something like this:
Memory tag error occurs :0x100701788
bye
xmlMemFree(1007017B0) error
xmlMallocBreakpoint reached on block 0
Memory tag error occurs :0x100701798
bye
xmlMemFree(1007017C0) error
xmlMallocBreakpoint reached on block 0
Memory tag error occurs :0x100701418
bye
xmlMemFree(100701440) error
xmlMallocBreakpoint reached on block 0
It's annoying, but can be safely ignored.
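For anyone curious what "being aware of how much memory is being used" can look like, here is a minimal sketch of a counting allocator in plain C++. It is not the libxmljs code and not libxml2's built-in allocator: `countingMalloc`, `countingRealloc`, `countingFree` and `g_bytes_in_use` are hypothetical names. libxml2 does provide a hook, xmlMemSetup, for registering replacement allocation functions, but how libxmljs wires things up is not shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Running total of bytes currently handed out by the functions below.
static std::size_t g_bytes_in_use = 0;

// Stored in front of every block so the size is known when it is freed.
union BlockHeader {
    std::size_t size;
    std::max_align_t align;  // keeps the user pointer suitably aligned
};

// Allocate `size` bytes and add them to the running total.
void* countingMalloc(std::size_t size) {
    void* raw = std::malloc(sizeof(BlockHeader) + size);
    if (raw == nullptr) return nullptr;
    BlockHeader* h = static_cast<BlockHeader*>(raw);
    h->size = size;
    g_bytes_in_use += size;
    return h + 1;  // the caller's memory starts right after the header
}

// Resize a block and adjust the running total by the difference.
void* countingRealloc(void* ptr, std::size_t size) {
    if (ptr == nullptr) return countingMalloc(size);
    BlockHeader* old = static_cast<BlockHeader*>(ptr) - 1;
    std::size_t old_size = old->size;
    void* raw = std::realloc(old, sizeof(BlockHeader) + size);
    if (raw == nullptr) return nullptr;  // original block is left untouched
    BlockHeader* h = static_cast<BlockHeader*>(raw);
    h->size = size;
    g_bytes_in_use += size;
    g_bytes_in_use -= old_size;
    return h + 1;
}

// Free a block and subtract its size from the running total.
void countingFree(void* ptr) {
    if (ptr == nullptr) return;
    BlockHeader* h = static_cast<BlockHeader*>(ptr) - 1;
    g_bytes_in_use -= h->size;
    std::free(h);
}
```

Each change to `g_bytes_in_use` is the kind of delta that could then be passed along to the GC so it knows about memory it did not allocate itself.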

11 Responses to this Article
Hi,
Thanks for the update. I tried to compile the latest version on Mac OSx, but I am not able to. I have posted the error message here
http://github.com/polotek/libxmljs/issues#issue/14
It would be nice if you could let me know how to fix it. Thanks!
Thanks for your article, V8 docs is very simple for developers.
Thanks for the explanation of the warnings. But is there a way to turn off by default?
Pretty late to the party, but the second code excerpt seems wrong. It doesn't differ from the first in any way other than the fact that you've added the HandleScope; but it's not used, and so the cleanup behaviour relating to jsNum surely won't change?
@Arlen No you have to know a little more about Handles and scopes. A HandleScope is a construct in v8 that explicitly manages the lifetime of js Handles. A Handle always belongs to a scope. If you don't create a scope it defaults to the root scope. By creating a scope in the function, all Handle creation is automatically tracked by that scope under the covers. I know it seems odd, I'm still not sure about the C++ magic myself. But when the function ends and that scope is deallocated, it'll also make sure any Handles created within this scope get cleaned up. Unless you close them and return them down the stack.