Was getting good results with just SD and occasional masking, but it would take hours and hours to home in on and composite a complex scene with specific requirements & shapes (with most of the work spent curating the best outputs and then blending them into a scene with Gimp/Inkscape).
Masking is unintuitive compared to the Scribble model, which gets a similar effect: no need to paint masks (which is disruptive to the natural process of 'drawing', IMO); instead, just make a general black and white outline of your scene. Simply dial the conditioning strength up or down to have it follow that outline more tightly or more loosely.
You can also use Gimp's Threshold or Inkscape Trace Bitmap tool to get a decent black & white outline from an existing bitmap to expedite the scribble procedure.
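If you'd rather script the thresholding step than open Gimp, here is a minimal sketch of the same idea in plain Python. It assumes the image is already loaded as a grayscale grid of 0-255 values; the `threshold` function and its `cutoff`/`invert` parameters are hypothetical names for illustration, not part of Gimp or Inkscape.

```python
def threshold(gray, cutoff=128, invert=False):
    """Map each grayscale pixel (0-255) to pure black (0) or pure white (255).

    gray:   list of rows, each row a list of ints in 0..255 (hypothetical
            in-memory format; a real workflow would load this from a file).
    cutoff: pixels at or above this value become white, the rest black.
    invert: swap black and white in the result (hypothetical option, for
            when you want light lines on a dark background instead).
    """
    bright, dark = (0, 255) if invert else (255, 0)
    return [[bright if px >= cutoff else dark for px in row] for row in gray]


if __name__ == "__main__":
    image = [
        [10, 200, 30],
        [250, 128, 127],
    ]
    print(threshold(image))               # [[0, 255, 0], [255, 255, 0]]
    print(threshold(image, invert=True))  # [[255, 0, 255], [0, 0, 255]]
```

Raising `cutoff` pushes more of the image to black, much like dragging the slider in Gimp's Threshold dialog.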
In this post, we just seek to showcase the fastest way to do it, and how augmentation may help vary the position!
It depends on whether you value this kind of freedom in life.
You can also do things such as colorizing black and white images with the Recolor model.
https://huggingface.co/stabilityai/control-lora