My team has a small but growing C library. We have a set of coding and naming conventions that are generally followed and things are fairly modular. Most of us are not C programmers though, so I want to continue reading and learning from other code to bring in useful ideas to our code base. Thanks in advance.
Give every such interface a _create() and a _destroy() method.
Have _destroy() either return a pointer (which will always be NULL), or (better, I think) take a pointer-to-pointer so that it can zero the pointer out after destroying the object.
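For what it's worth, here is a minimal sketch of the pointer-to-pointer variant. The `widget` type and its field are made up for illustration, and this version returns NULL on allocation failure only so the snippet stands on its own:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical example type; "widget" is an illustrative name only. */
typedef struct widget {
    char *name;
} widget;

/* Allocate and initialize a widget. Returns NULL if allocation fails. */
widget *widget_create(const char *name)
{
    widget *w = calloc(1, sizeof *w);
    if (w == NULL)
        return NULL;
    w->name = malloc(strlen(name) + 1);
    if (w->name == NULL) {
        free(w);
        return NULL;
    }
    strcpy(w->name, name);
    return w;
}

/* Free the widget and zero the caller's pointer so it cannot dangle.
 * Calling it on NULL or on an already-destroyed widget is a no-op. */
void widget_destroy(widget **wp)
{
    if (wp == NULL || *wp == NULL)
        return;
    free((*wp)->name);
    free(*wp);
    *wp = NULL;
}
```

The caller writes `widget_destroy(&w);` and afterwards `w` is NULL, so an accidental second destroy or a stale use shows up immediately instead of corrupting the heap.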
Don't check malloc; instead, rig your code up to detonate if malloc ever fails. Checked allocations create rat's nests of error handling.
Have a common hash table, a common binary tree, and a common list or resizable array, working on void-star. Don't allow programmers to implement their own hash table or tree.
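One common way to get the "detonate" behavior is a pair of wrappers that every allocation goes through. This is only a sketch; the names `xmalloc` and `xrealloc` are a convention I'm assuming here, not something from a standard library:

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical wrapper: returns usable memory or terminates the process. */
void *xmalloc(size_t n)
{
    /* malloc(0) may legally return NULL, so always ask for at least 1 byte. */
    void *p = malloc(n ? n : 1);
    if (p == NULL) {
        fprintf(stderr, "out of memory (%zu bytes)\n", n);
        abort();
    }
    return p;
}

/* Same idea for growing a buffer: never returns NULL. */
void *xrealloc(void *old, size_t n)
{
    void *p = realloc(old, n ? n : 1);
    if (p == NULL) {
        fprintf(stderr, "out of memory (%zu bytes)\n", n);
        abort();
    }
    return p;
}
```

Callers then use the result directly with no NULL checks, which is the whole point: the failure path lives in one place.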
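To make "working on void-star" concrete, here is a rough sketch of a shared resizable array. The `vec` name and its functions are hypothetical, and it aborts on allocation failure in the spirit of the malloc advice above:

```c
#include <stdlib.h>

/* Hypothetical shared container: a growable array of void pointers.
 * It stores the pointers only; the caller owns whatever they point to. */
typedef struct vec {
    void  **items;
    size_t  len;
    size_t  cap;
} vec;

vec *vec_create(void)
{
    vec *v = calloc(1, sizeof *v);
    if (v == NULL)
        abort();
    return v;
}

/* Append one item, doubling the capacity when the array is full. */
void vec_push(vec *v, void *item)
{
    if (v->len == v->cap) {
        size_t ncap = v->cap ? v->cap * 2 : 8;
        void **n = realloc(v->items, ncap * sizeof *n);
        if (n == NULL)
            abort();
        v->items = n;
        v->cap = ncap;
    }
    v->items[v->len++] = item;
}

/* Return the item at index i, or NULL if i is out of range. */
void *vec_get(const vec *v, size_t i)
{
    return i < v->len ? v->items[i] : NULL;
}

/* Free the array (not the items) and zero the caller's pointer. */
void vec_destroy(vec **vp)
{
    if (vp == NULL || *vp == NULL)
        return;
    free((*vp)->items);
    free(*vp);
    *vp = NULL;
}
```

Because the elements are `void *`, the same code serves every module; the cost is that callers cast on the way out and the compiler cannot check element types.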
Have a common logging library, with debug levels.
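A logging layer with levels can be very small. The sketch below is one possible shape; the `LL_*` level names and the `log_msg` and `log_set_level` functions are invented for this example:

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical levels, ordered from most to least severe. */
typedef enum { LL_ERROR, LL_WARN, LL_INFO, LL_DEBUG } log_level;

static log_level log_threshold = LL_INFO;

/* Messages less severe than the threshold are dropped. */
void log_set_level(log_level lvl)
{
    log_threshold = lvl;
}

/* printf-style logging to stderr.
 * Returns 1 if the message was written, 0 if it was filtered out. */
int log_msg(log_level lvl, const char *fmt, ...)
{
    static const char *const names[] = { "ERROR", "WARN", "INFO", "DEBUG" };
    va_list ap;

    if (lvl > log_threshold)
        return 0;
    fprintf(stderr, "[%s] ", names[lvl]);
    va_start(ap, fmt);
    vfprintf(stderr, fmt, ap);
    va_end(ap);
    fputc('\n', stderr);
    return 1;
}
```

With the threshold at `LL_WARN`, a call like `log_msg(LL_DEBUG, "x=%d", x)` does nothing, so debug statements can stay in the code permanently.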
Really, I'd just like some opinionated answers to the following from anyone who cares:
1) What is a simple instruction set that I can support?
2) What existing programs are available for that instruction set, or what set of tools are there to compile to that instruction set?
3) What are the high level tasks that I will need to accomplish to write this virtual machine?
4) What should I look up or borrow exclusively versus figure out for myself?
Thanks!
2) You'd compile simple C programs, of which there are zillions, rather than writing complex programs in assembly.
3) Make a struct that captures the state of the CPU: an array of integers for the register file, a flag word for the CPU flags, &c. Decode instructions (to a struct or something, which captures all the options of the instruction). Execute one instruction (pick it yourself): resolve its operands into temporary variables, execute the logic (almost invariably trivial), set the appropriate processor flags (overflow, zero, &c), and then store the result. Test lightly, and then repeat for all the other instructions. Most will be the same except for a single line of code. Make a big u_char array to represent memory. Write a HEX file loader, which will take a .hex file and populate memory with its contents; the GCC toolchain will compile C programs to HEX files. Now write the code to load an instruction, execute it, set the program counter appropriately, and repeat. Spend the next 2 weeks debugging.
4) I say, do it yourself. You can get yourself tied up in knots reading all the different ways to implement a VM. To start with, write a naive VM yourself with no help. Then go back and read that stuff if you want; it'll make much more sense.
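To give a feel for the steps in 3), here is a rough sketch of the CPU state struct, the decode step, and the execute loop. The 4-byte instruction encoding and the opcodes are a toy invented for this example, not a real instruction set, and the HEX loader is left out (`cpu_load` just copies a raw byte image):

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy encoding invented for this sketch: 4 bytes per instruction,
 * laid out as { opcode, a, b, c }. Not a real instruction set. */
enum { OP_HALT, OP_LOADI, OP_ADD, OP_SUB, OP_JNZ };
enum { FLAG_ZERO = 1, FLAG_CARRY = 2 };

#define NUM_REGS 8
#define MEM_SIZE 256

typedef struct cpu {
    uint32_t      regs[NUM_REGS];  /* register file */
    uint32_t      flags;           /* flag word */
    uint32_t      pc;              /* program counter (byte address) */
    int           halted;
    unsigned char mem[MEM_SIZE];   /* flat memory */
} cpu;

typedef struct insn { unsigned op, a, b, c; } insn;

/* Reset the CPU and copy a raw program image to address 0. */
void cpu_load(cpu *c, const unsigned char *image, size_t n)
{
    memset(c, 0, sizeof *c);
    if (n > MEM_SIZE)
        n = MEM_SIZE;
    memcpy(c->mem, image, n);
}

static insn decode(const cpu *c)
{
    const unsigned char *p = &c->mem[c->pc];
    insn i = { p[0], p[1], p[2], p[3] };
    return i;
}

static void set_flags(cpu *c, uint32_t result, int carry)
{
    c->flags = (result == 0 ? FLAG_ZERO : 0) | (carry ? FLAG_CARRY : 0);
}

/* Fetch, decode, and execute one instruction, then update the PC. */
void cpu_step(cpu *c)
{
    if (c->pc > MEM_SIZE - 4) {        /* fetch would run off memory */
        c->halted = 1;
        return;
    }
    insn i = decode(c);
    uint32_t next = c->pc + 4;
    uint32_t x, y, r;

    switch (i.op) {
    case OP_LOADI:                     /* regs[a] = immediate b */
        c->regs[i.a % NUM_REGS] = i.b;
        break;
    case OP_ADD:                       /* regs[a] = regs[b] + regs[c] */
        x = c->regs[i.b % NUM_REGS];
        y = c->regs[i.c % NUM_REGS];
        r = x + y;
        set_flags(c, r, r < x);
        c->regs[i.a % NUM_REGS] = r;
        break;
    case OP_SUB:                       /* regs[a] = regs[b] - regs[c] */
        x = c->regs[i.b % NUM_REGS];
        y = c->regs[i.c % NUM_REGS];
        r = x - y;
        set_flags(c, r, x < y);        /* carry doubles as borrow */
        c->regs[i.a % NUM_REGS] = r;
        break;
    case OP_JNZ:                       /* jump to address a unless zero flag is set */
        if (!(c->flags & FLAG_ZERO))
            next = i.a;
        break;
    default:                           /* OP_HALT or an unknown opcode */
        c->halted = 1;
        break;
    }
    c->pc = next;
}

/* Run until the CPU halts or the step budget is used up. */
void cpu_run(cpu *c, unsigned long max_steps)
{
    while (!c->halted && max_steps-- > 0)
        cpu_step(c);
}
```

Note how ADD and SUB differ by a single line plus the carry test, which matches the "most will be the same" remark. A real target would replace the toy encoding with the actual instruction formats and add the loader on top.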
I just read a book about compiler construction (can't remember the name, but it's yellow - I can find out if you want) and that was all I needed.
It's actually a lot easier than you might expect to write a compiler for a C-like language (I added some object-oriented features, but it's basically C). I don't even have a CS degree. Although you can use tools like yacc and lex to help with the parsing, I decided to just do it myself. The compiler construction book tells you how to do this and it's fairly straightforward.
If you've done assembly language programming before you will already have a fair idea of what instructions you need. Just look at the z80 or 6502 or other simple cpu. I just used the stack for all operations (no registers) to keep things simple. Also I don't support all C features to keep things simple (no debugging, no pointers-to-pointers, all declarations must come before any definitions, etc).
The virtual machine is very easy to write - you basically just read in the object, do stuff like load string tables, then just have a loop reading each instruction and dispatch it using a function table. You probably also want to have a system library for doing stuff like accessing files, network i/o or whatever.
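A minimal sketch of that loop, assuming a stack-only machine with a handful of made-up opcodes (the `SOP_*` names, the `svm` struct, and the int-per-cell code format are all invented here; loading an object file and string tables is not shown):

```c
#include <stddef.h>

/* Hypothetical opcodes for a stack-only machine (no registers). */
enum { SOP_HALT, SOP_PUSH, SOP_ADD, SOP_MUL, SOP_COUNT };

#define STACK_MAX 64

typedef struct svm {
    const int *code;    /* program: opcodes with inline operands */
    size_t     ncode;
    size_t     pc;
    int        stack[STACK_MAX];
    size_t     sp;      /* number of values currently on the stack */
    int        running;
} svm;

typedef void (*handler)(svm *);

static void op_halt(svm *m) { m->running = 0; }

/* PUSH takes its operand from the next code cell. */
static void op_push(svm *m)
{
    if (m->pc >= m->ncode || m->sp >= STACK_MAX) {
        m->running = 0;             /* malformed program or stack overflow */
        return;
    }
    m->stack[m->sp++] = m->code[m->pc++];
}

/* ADD and MUL pop two values and push the result. */
static void op_add(svm *m)
{
    if (m->sp < 2) { m->running = 0; return; }
    m->sp--;
    m->stack[m->sp - 1] += m->stack[m->sp];
}

static void op_mul(svm *m)
{
    if (m->sp < 2) { m->running = 0; return; }
    m->sp--;
    m->stack[m->sp - 1] *= m->stack[m->sp];
}

/* The function table: the opcode value is the index. */
static const handler dispatch[SOP_COUNT] = { op_halt, op_push, op_add, op_mul };

/* Run a program and return the top of the stack (0 if the stack is empty). */
int svm_run(const int *code, size_t ncode)
{
    svm m = { .code = code, .ncode = ncode, .running = 1 };

    while (m.running && m.pc < m.ncode) {
        int op = m.code[m.pc++];
        if (op < 0 || op >= SOP_COUNT)
            break;                  /* unknown opcode: stop */
        dispatch[op](&m);
    }
    return m.sp ? m.stack[m.sp - 1] : 0;
}
```

For example, the program `{ SOP_PUSH, 2, SOP_PUSH, 3, SOP_ADD, SOP_PUSH, 4, SOP_MUL, SOP_HALT }` leaves (2 + 3) * 4 on the stack. Adding an instruction means writing one handler and adding one table entry; a system library would just be more handlers that call out to the host.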
If you want to discuss further, let me know.
It has hash tables, binary trees, and linked lists/resizable arrays.
I also tried not to go over my existing code and recommend all the things I do personally, like creating a library-ized main() function and having an app-specific entrypoint, using arena allocators, &c.
Redis comes to mind because it started out as largely a single file of code that has since been split and organized into multiple files. The code is quite approachable; you'll likely understand how most of it works after a day of casual browsing. http://redis.io
Toybox comes to mind because it's insanely modular, and aggressive about code re-use. The logic can feel a bit dense at times, but he's going for size and speed. I'm a big fan of Rob's efforts. http://landley.net/code/toybox/
Thanks for this pointer. We also tend to be more organic in modularity. Start with a big chunk of code and split it when the time is right.
>> Toybox, "insanely modular"
I haven't read the toybox code (so maybe it's not that bad) but we've had some folks go to extremes in the modularity/code-reuse direction. Their code is NOT easy to read.