Lerp Development Log:
Program Overhaul to Prepare for Indexing
14 February 2006


Until now, the entire Lerp codebase used singly-linked lists to store all database entries.  Every member of a struct (even individual elements of arrays) was stored in such a list.  This meant the interpreter was spending huge amounts of time just walking all these lists all the time to find things.  (The Lerp IDE shown in earlier updates ran at about 14 frames per second... not very fast!)

That doesn't sound so great, but it was actually according to plan.  Writing a language system like this is a big project and you have to start somewhere.  The "all databases are linearly linked lists" approach was just meant as a temporary solution until the time came to replace it.

Well, that time has come, so I spent the past few days restructuring the code so that databases can be indexed in various manners specified by the user (but queries don't care about the data storage format).  This sounds simple but it's actually fairly complicated, given all the ways the queries interact.  This was definitely an "I am now going to totally break the codebase and it will not run for several days" kind of change.

I've finally gotten the IDE running again.  In the process I noticed two things:

1) There are these things I call "indirect memory problems", and I am way, way better at debugging them than I used to be.  In these situations some piece of memory (like a pointer) doesn't get set properly, but the place where the corresponding crash-or-assert happens is far away from the code that actually made the memory mistake -- and the location of the cause is not readily apparent from the crash.  This happens all the time in an interpreter since you are just chewing through bytecode from a main loop, and one instruction in your bytecode stream might set something up, and then you use it 10 instructions later, and then you see it's wrong -- well what happened exactly?  Who knows.  I had several of those problems this time but I nailed them all with minimal fuss.  If you have a clear enough understanding of the system, you pretty much know what the few causes of some problem could be, and you can quickly enumerate and test the possibilities.  (Another example of this kind of error is the fearsome garbage collector bug: hey, garbage collection arbitrarily messed up some data structures and you don't know until some later, disassociated part of your program tries to use them.  I had a few of those too.)

1a) Consequently, having a clear understanding of your system is very important.  If you have been programming/designing in the same style for a long time, and have become comfortable and concise with it, this is a huge help.  Prior to this week, I hadn't touched the code for Lerp in almost a year.  It's complicated, there's a lot going on, and it was intimidating to get back into it for the first few hours.  But after that it all flowed -- I understood what was happening and what to do, because the code works the way my mind works.  It's all in my style and that style is (to me) very straightforward and simple.  There is no bullshit.  It was very easy to get back into it and cruise, like riding a bike.

2) Do not underestimate the power of statically-checked languages for large projects.  In such languages, the compiler is very much your friend.  It enables you to do huge overhauls that would be much more difficult in a dynamic language.  It works like this: you rip out all the old code, change whichever interfaces and data structures you want.  Now the program doesn't link any more and gives you a ton of compile-time errors.  But this is good -- this is how the compiler keeps track of all the tiny details for you, which functions you still need to write, which code you need to change to conform to new data structures, etc. You don't need to think about any of that stuff at all, or maintain to-do lists, or anything.  The compiler does this record-keeping for you, freeing you to think more about the actual designs and implementations.  To proceed, you just start at the top of the error list and fix the errors, one by one.  When you're back down to 0 errors, you have done everything the compiler was reminding you about.  And if you were perfect your program will now run.  (Of course in a system this complicated you probably made some bugs that you now have to fix -- but it's way, way fewer bugs than you would have in a dynamic language).

2a) Dynamic Language Advocates will often talk about dynamic systems being great because they give you some Cool Paradigm that you don't have with static languages (like playing around in a read-eval-print loop, or something).  But they don't seem to understand that static languages offer important paradigms, like 2) above, that are very useful for Getting Things Done.  These things may be subtle, but they are still very valuable.  The dynamic languages throw away this subtle, valuable thing for something more obvious but less valuable -- and they think they are winning.  (There are other, even more subtle things, like the confidence the programmer has in individual changes... but I won't go into those here.)  I am much more powerful as a programmer because I can rip out core parts of the code, yanking out tendrils that reach through the whole codebase (i.e. "lots of crosscutting concerns"), and I can do this without fear -- it's no big deal, everything will be back to normal in a few days.  In a dynamic language, I would be so much more worried about this kind of change that I would do it rarely or never (I would be thinking, "ohmygod I am about to fuck up the entire code base, this is going to be a disaster.") Fear of change can be very bad -- it encourages code rot, which means your code slides slowly into unmaintainability.


Of course, right now Lerp is a dynamically-checked language.  That is also a short-term solution -- in the end I want it to be as statically-checked as possible.  But that takes a lot more work than just implementing the core functionality of the language, and it is that latter bit that I am focused on now.
 

Anyway, now the IDE runs slowly just like before, because all the databases are still just linked lists.  But they are abstracted in such a way now that I can drop indexing straight in.  We'll see how it goes.

 

[Back to Lerp Development Log]