Ask HN: Tips for maintaining a C codebase?

Wrap your libraries up in ADT-style interfaces.

Give every such interface a _create() and a _destroy() method.

Have _destroy() either return a pointer (which will always be NULL), or (better, I think) take a pointer-to-pointer so that it can zero the pointer out after destroying the object.
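A minimal sketch of the _create()/_destroy() convention with the pointer-to-pointer variant. The counter_t type and its functions are hypothetical names made up for illustration, not from any particular library.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical ADT. In a real project the struct body would live in the .c
   file and the header would expose only "typedef struct counter counter_t;". */
typedef struct counter {
    long value;
} counter_t;

/* Allocate a zeroed counter. Returns NULL only if calloc fails. */
counter_t *counter_create(void)
{
    return calloc(1, sizeof(counter_t));
}

void counter_increment(counter_t *c)
{
    assert(c != NULL);
    c->value++;
}

long counter_value(const counter_t *c)
{
    assert(c != NULL);
    return c->value;
}

/* Takes a pointer-to-pointer so the caller's handle is zeroed after the free.
   Calling it twice, or on a handle that is already NULL, is harmless. */
void counter_destroy(counter_t **cp)
{
    if (cp == NULL || *cp == NULL)
        return;
    free(*cp);
    *cp = NULL;
}
```

A caller writes counter_destroy(&c); afterwards c is NULL, so a stale use tends to crash immediately rather than silently touching freed memory.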

Don't check malloc; instead, rig your code up to detonate if malloc ever fails. Checked allocations create rat's nests of error handling.
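One common way to get the "detonate" behavior is a wrapper that aborts on failure, so callers never see NULL. This is a sketch; xmalloc and xstrdup are conventional names used here as assumptions, not functions from the C standard library.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical wrapper: never returns NULL, aborts the process on failure. */
void *xmalloc(size_t n)
{
    /* malloc(0) may legally return NULL, so ask for at least one byte. */
    void *p = malloc(n ? n : 1);
    if (p == NULL) {
        fprintf(stderr, "fatal: out of memory allocating %zu bytes\n", n);
        abort();
    }
    return p;
}

/* Example caller: the allocation needs no error path at all. */
char *xstrdup(const char *s)
{
    assert(s != NULL);
    size_t len = strlen(s) + 1;
    char *copy = xmalloc(len);
    memcpy(copy, s, len);
    return copy;
}
```

The rest of the codebase then calls xmalloc() everywhere and plain malloc() nowhere.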

Have a common hash table, a common binary tree, and a common list or resizable array, working on void-star. Don't allow programmers to implement their own hash table or tree.
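As an illustration of a shared container "working on void-star", here is a sketch of a resizable array that stores untyped pointers. The vec_* names are hypothetical, and the container does not own the items it holds.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical shared container: a growable array of void pointers. */
typedef struct vec {
    void **items;
    size_t len;
    size_t cap;
} vec_t;

vec_t *vec_create(void)
{
    vec_t *v = calloc(1, sizeof(*v));
    if (v == NULL)
        abort();                 /* fail loudly instead of returning NULL */
    return v;
}

/* Append one item, doubling the capacity when the array is full. */
void vec_push(vec_t *v, void *item)
{
    assert(v != NULL);
    if (v->len == v->cap) {
        size_t ncap = v->cap ? v->cap * 2 : 8;
        void **n = realloc(v->items, ncap * sizeof(*n));
        if (n == NULL)
            abort();
        v->items = n;
        v->cap = ncap;
    }
    v->items[v->len++] = item;
}

void *vec_get(const vec_t *v, size_t i)
{
    assert(v != NULL && i < v->len);
    return v->items[i];
}

size_t vec_len(const vec_t *v)
{
    assert(v != NULL);
    return v->len;
}

/* Frees the array itself, not the items, and zeroes the caller's handle. */
void vec_destroy(vec_t **vp)
{
    if (vp == NULL || *vp == NULL)
        return;
    free((*vp)->items);
    free(*vp);
    *vp = NULL;
}
```

Callers cast on the way out, for example *(int *)vec_get(v, 0). That loses type safety, which is the usual trade-off for having one implementation shared by the whole codebase.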

Have a common logging library, with debug levels.
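A logging library with debug levels can start very small. The sketch below is one possible shape; the LVL_* names, log_set_level() and log_msg() are all made up for this example.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical levels, ordered from most to least severe. */
typedef enum { LVL_ERROR = 0, LVL_WARN, LVL_INFO, LVL_DEBUG } log_level_t;

/* Messages less severe than this threshold are dropped. */
static log_level_t g_log_level = LVL_INFO;

void log_set_level(log_level_t lvl)
{
    g_log_level = lvl;
}

/* printf-style logging to stderr.
   Returns 1 if the message was written, 0 if it was filtered out. */
int log_msg(log_level_t lvl, const char *fmt, ...)
{
    static const char *const names[] = { "ERROR", "WARN", "INFO", "DEBUG" };
    va_list ap;

    assert((int)lvl >= 0 && (int)lvl <= (int)LVL_DEBUG);
    if (lvl > g_log_level)
        return 0;

    fprintf(stderr, "[%s] ", names[lvl]);
    va_start(ap, fmt);
    vfprintf(stderr, fmt, ap);
    va_end(ap);
    fputc('\n', stderr);
    return 1;
}
```

With the threshold set to LVL_WARN, a call like log_msg(LVL_DEBUG, "x=%d", x) is dropped, while log_msg(LVL_ERROR, ...) still goes to stderr.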

euroclydon · 13 years ago

Thomas, you've mentioned that your new-programming-language-exercise du jour is to write a virtual machine (or microcontroller emulator), right? I'd like to try that in C. How would you go about writing this in a simple way? I know I could google this, but I'm worried I'll find results with too many details and pollute my discovery process.

Really, I'd just like some opinionated answers to the following from anyone who cares:

1) What is a simple instruction set that I can support?

2) What existing programs are available for that instruction set, or what set of tools are there to compile to that instruction set?

3) What are the high level tasks that I will need to accomplish to write this virtual machine?

4) What should I look up or borrow exclusively versus figure out for myself?

Thanks!

tptacek · 13 years ago

1) AVR

2) You'd compile simple C programs, of which there are zillions, rather than writing complex programs in assembly.

3) Make a struct that captures the state of the CPU: an array of integers for the register file, a flag word for the CPU flags, &c. Decode instructions (to a struct or something, which captures all the options of the instruction). Execute one instruction (pick it yourself): resolve its operands into temporary variables, execute the logic (almost invariably trivial), set the appropriate processor flags (overflow, zero, &c), and then store the result. Test lightly, and then repeat for all the other instructions. Most will be the same except for a single line of code. Make a big u_char array to represent memory. Write a HEX file loader, which will take a .hex file and populate memory with its contents; the GCC toolchain will compile C programs to HEX files. Now write the code to load an instruction, execute it, set the program counter appropriately, and repeat. Spend the next 2 weeks debugging.

4) I say, do it yourself. You can get yourself tied up in knots reading all the different ways to implement a VM. To start with, write a naive VM yourself with no help. Then go back and read that stuff if you want; it'll make much more sense.
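The steps in answer 3 (a state struct, a decoder, an execute step, a byte array for memory) can be sketched for two AVR instructions, ADD and LDI. This is an illustration under assumptions, not a complete emulator: the struct layout, the names and the 8 KB flash size are made up, only the C, Z and N flags are updated (a real ADD also sets H, V and S), and the bit patterns should be checked against the AVR instruction set manual before relying on them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Status register bits handled by this sketch (C, Z, N only). */
enum { FLAG_C = 1 << 0, FLAG_Z = 1 << 1, FLAG_N = 1 << 2 };

/* CPU state: register file, flag word, program counter, program memory. */
typedef struct cpu {
    uint8_t  reg[32];
    uint8_t  sreg;
    uint16_t pc;            /* counted in 16-bit words */
    uint8_t  flash[8192];   /* size is an arbitrary assumption */
} cpu_t;

typedef enum { OP_ADD, OP_LDI, OP_UNKNOWN } op_t;

/* Decoded instruction: the opcode plus every operand field it may carry. */
typedef struct insn {
    op_t    op;
    uint8_t rd, rr, imm;
} insn_t;

/* Read the little-endian 16-bit word at the program counter. */
static uint16_t fetch(const cpu_t *cpu)
{
    size_t a = (size_t)cpu->pc * 2;
    assert(a + 1 < sizeof cpu->flash);
    return (uint16_t)(cpu->flash[a] | (cpu->flash[a + 1] << 8));
}

static insn_t decode(uint16_t w)
{
    insn_t in = { OP_UNKNOWN, 0, 0, 0 };
    if ((w & 0xFC00) == 0x0C00) {          /* ADD Rd,Rr: 0000 11rd dddd rrrr */
        in.op = OP_ADD;
        in.rd = (uint8_t)((w >> 4) & 0x1F);
        in.rr = (uint8_t)(((w >> 5) & 0x10) | (w & 0x0F));
    } else if ((w & 0xF000) == 0xE000) {   /* LDI Rd,K: 1110 KKKK dddd KKKK */
        in.op  = OP_LDI;
        in.rd  = (uint8_t)(16 + ((w >> 4) & 0x0F));
        in.imm = (uint8_t)(((w >> 4) & 0xF0) | (w & 0x0F));
    }
    return in;
}

/* Execute one instruction. Returns 0 on success, -1 if it is not decoded. */
int cpu_step(cpu_t *cpu)
{
    insn_t in = decode(fetch(cpu));
    switch (in.op) {
    case OP_LDI:
        cpu->reg[in.rd] = in.imm;          /* LDI leaves the flags alone */
        break;
    case OP_ADD: {
        unsigned sum = (unsigned)cpu->reg[in.rd] + cpu->reg[in.rr];
        uint8_t  res = (uint8_t)sum;
        cpu->sreg &= (uint8_t)~(FLAG_C | FLAG_Z | FLAG_N);
        if (sum > 0xFF) cpu->sreg |= FLAG_C;
        if (res == 0)   cpu->sreg |= FLAG_Z;
        if (res & 0x80) cpu->sreg |= FLAG_N;
        cpu->reg[in.rd] = res;
        break;
    }
    default:
        return -1;
    }
    cpu->pc++;
    return 0;
}
```

Adding an instruction means one more branch in decode() and one more case in cpu_step(), which matches the point that most instructions differ by a single line of logic. A HEX loader would fill flash[] before the first cpu_step() call.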

cpncrunch · 13 years ago

If you don't mind I'll add my 2c here as well, since I've written my own compiler, parser and VM.

I just read a book about compiler construction (can't remember the name, but it's yellow - I can find out if you want) and that was all I needed.

It's actually a lot easier than you might expect to write a compiler for a C-like language (I added some object-oriented features, but it's basically C). I don't even have a CS degree. Although you can use tools like yacc and lex to help with the parsing, I decided to just do it myself. The compiler construction book tells you how to do this and it's fairly straightforward.

If you've done assembly language programming before you will already have a fair idea of what instructions you need. Just look at the Z80 or 6502 or another simple CPU. I just used the stack for all operations (no registers) to keep things simple. I also don't support all C features, again to keep things simple (no debugging, no pointers-to-pointers, all declarations must come before any definitions, etc).

The virtual machine is very easy to write - you basically just read in the object, do stuff like load string tables, then just have a loop reading each instruction and dispatch it using a function table. You probably also want to have a system library for doing stuff like accessing files, network i/o or whatever.
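The loop described above, reading each instruction and dispatching it through a function table, might look like the following for a stack-only machine. The four-opcode instruction set and all names are invented for this sketch, and it does no bounds checking on the code array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bytecode: I_PUSH is followed by a one-byte operand. */
enum { I_HALT, I_PUSH, I_ADD, I_MUL, I_COUNT };

typedef struct vm {
    const uint8_t *code;
    size_t  pc;
    int32_t stack[64];
    size_t  sp;             /* number of values currently on the stack */
    int     halted;
} vm_t;

typedef void (*handler_t)(vm_t *);

static void push(vm_t *vm, int32_t v)
{
    assert(vm->sp < 64);
    vm->stack[vm->sp++] = v;
}

static int32_t pop(vm_t *vm)
{
    assert(vm->sp > 0);
    return vm->stack[--vm->sp];
}

static void op_halt(vm_t *vm) { vm->halted = 1; }
static void op_push(vm_t *vm) { push(vm, vm->code[vm->pc++]); }
static void op_add(vm_t *vm)  { int32_t b = pop(vm), a = pop(vm); push(vm, a + b); }
static void op_mul(vm_t *vm)  { int32_t b = pop(vm), a = pop(vm); push(vm, a * b); }

/* The function table: one handler per opcode, indexed by opcode value. */
static const handler_t dispatch[I_COUNT] = {
    [I_HALT] = op_halt,
    [I_PUSH] = op_push,
    [I_ADD]  = op_add,
    [I_MUL]  = op_mul,
};

/* Run until I_HALT and return the value left on top of the stack. */
int32_t vm_run(const uint8_t *code)
{
    vm_t vm = { .code = code };
    while (!vm.halted) {
        uint8_t op = vm.code[vm.pc++];
        assert(op < I_COUNT);
        dispatch[op](&vm);
    }
    return pop(&vm);
}
```

For example, the program PUSH 2, PUSH 3, ADD, PUSH 4, MUL, HALT computes (2 + 3) * 4. System library calls (files, network I/O) would be further entries in the same table.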

If you want to discuss further, let me know.

lukatmyshu · 13 years ago

As far as a common library, check out http://en.wikipedia.org/wiki/GLib

It has hash tables, binary trees, and linked lists/resizable arrays.

tptacek · 13 years ago

I'm not personally a fan of GLib, but I know lots of people who are, and who are better and more experienced C developers than I am; you could do a lot worse than to simply adopt GLib.

swah · 13 years ago

Normally you also recommend David Hanson's "C Interfaces and Implementations" which is a really great read.

tptacek · 13 years ago

Recommending a book seems like a cop-out when someone asks a specific question, but yeah, I still love that book a lot.

I also tried not to go over my existing code and recommend all the things I do personally, like creating a library-ized main() function and having an app-specific entrypoint, using arena allocators, &c.

shortlived · 13 years ago

We are not so good at sticking to ADT-style interfaces but we need to be. And the malloc check is a great point. We've definitely gone down that road and it is hell.

Two projects you may want to review for ideas are Redis and toybox.

Redis comes to mind because it started out as largely a single file of code that has since been split and organized into multiple files. The code is quite approachable; you'll likely understand how most of it works after a day of casual browsing. http://redis.io

Toybox comes to mind because it's insanely modular, and aggressive about code re-use. The logic can feel a bit dense at times, but he's going for size and speed. I'm a big fan of Rob's efforts. http://landley.net/code/toybox/

shortlived · 13 years ago

>> redis

Thanks for this pointer. We also tend to be more organic in modularity. Start with a big chunk of code and split it when the time is right.

>> Toybox, "insanely modular"

I haven't read the toybox code (so maybe it's not that bad) but we've had some folks go to extremes in the modularity/code-reuse direction. Their code is NOT easy to read.

shortlived · 13 years ago

On the subject of code reading: are there specific areas of the Linux kernel (or Minix or ...) that someone would recommend reading?

chas · 13 years ago

Reading through this book[1] was a really mind-expanding experience for me. It is an overview of the kernel and doesn't dive into any one part in depth, but it was extremely valuable for me in learning how to structure a large C project, as the same techniques and ideas are useful for many large projects.

[1] http://www.amazon.com/Kernel-Development-Developers-Library-...