What has thus far remained a mystery to me is going from a bag of noisy pixels with a blurry photo of a tattoo on a hairy arm surrounded by random desk clutter to array of booleans. I meant “by hand” as in “without libraries”, not “using a human”, as in the latter case the human’s visual cortex does the interesting part! And the open-source Android apps that I’ve looked at just wrap ZXing, which is huge (so a sibling commenter’s suggestion of looking at a different, QR-code-specific library is helpful).
But in general, you can divide the problem more or less like this (not necessarily in this order) 1. find the rough spatial region of the barcode. Crop that out and only focus on this 2. Correct ("rectify") for any rotation or perspective skew of the barcode, turn it into a frontoparallel version of the barcode 3. Binarize the image from RGB or grayscale into pure black and white 4. Normalize the size so that each pixel is the smallest spatial unit of the barcode.
As for implementation, Zxing-cpp [1] is still maintained, and pretty good as far as open source options go. At this point I'm not sure how related it is to the original zxing, as it has gone substantial development. It has python bindings which may be easier to use.
On mobile, Google MLkit and Apple vision also have barcode reading APIs, not open source but otherwise "free" as in beer.
Checkpointing the model every N iterations/epochs/batches has a similar problem - you may end up saving very few checkpoints and risk losing work or waste a lot of time/space with lots of checkpoints.
So I've often found myself implementing some kind of monitoring and checkpointing callbacks based on time, e.g., reporting every half an hour, checkpointing every two hours, etc.
Many many python image-processing libraries have an `imread()` function. I didn't know about this when designing our own bespoke image-lib at work, and went with an esoteric `image_get()` that I never bothered to refactor.
When I ask ChatGPT for help writing one-off scripts using the internal library I often forget to give it more context than just `import mylib` at the top, and it almost always defaults to `mylib.imread()`.