Straight TDD with Spark is perfectly fine if you know what you're doing. I'm not saying it's easy, or that there's an easy guide somewhere, but it's possible.
If you're using PySpark through its API, testing is likely an incredibly important part of your process.
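For context, "straight TDD" here typically means spinning up a local SparkSession inside the test suite, along these lines (a minimal illustrative sketch, not our actual setup):

```python
import pytest
from pyspark.sql import SparkSession


@pytest.fixture(scope="session")
def spark():
    # One local session shared across the test run; startup is the slow part.
    session = (
        SparkSession.builder
        .master("local[1]")
        .appName("unit-tests")
        .getOrCreate()
    )
    yield session
    session.stop()


def test_dedup(spark):
    df = spark.createDataFrame([(1, "a"), (1, "a"), (2, "b")], ["id", "value"])
    assert df.dropDuplicates().count() == 2
```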
Our CI/CD platform and its owners get unhappy if we spawn an ad hoc Spark session for testing purposes.
There is also a general expectation that unit tests are self-contained and portable, so you can execute them on macOS, Linux, and ARM without much effort.
Another point was that we need to make this mocking and test setup easy, because data scientists and ML modellers are the most important personas, and ideally they are the ones writing these tests.
So mocking the data source with an abstraction layer and passing pandas DataFrames worked reasonably well for our use case.
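To make that concrete, here is a minimal sketch of the pattern; the SourceReader protocol, the function names, and the filtering rule are hypothetical, not our actual code:

```python
from typing import Protocol
import pandas as pd


class SourceReader(Protocol):
    """Abstraction over 'where the data comes from'."""
    def read(self, table: str) -> pd.DataFrame: ...


def keep_positive_orders(reader: SourceReader) -> pd.DataFrame:
    # Business logic only ever sees DataFrames, never a SparkSession or a live connection.
    orders = reader.read("orders")
    return orders[orders["amount"] > 0].reset_index(drop=True)


class FakeReader:
    """In-memory stand-in used by the unit tests."""
    def __init__(self, tables: dict[str, pd.DataFrame]):
        self._tables = tables

    def read(self, table: str) -> pd.DataFrame:
        return self._tables[table]


def test_keep_positive_orders():
    reader = FakeReader({"orders": pd.DataFrame({"amount": [10.0, -1.0]})})
    result = keep_positive_orders(reader)
    assert list(result["amount"]) == [10.0]
```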
When the tests pass, we switch the execution backend from DuckDB to Spark. This decouples testing Spark pipelines from the SparkSession and the underlying infrastructure, which saves a lot of compute during iteration.
This setup requires an abstraction layer that makes the SQL execution platform-agnostic and the data sources mockable. We use the open source Fugue layer to define the business logic once and have it run on both DuckDB and Spark.
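A rough sketch of that Fugue pattern is below; the function and column names are illustrative, and the exact transform() signature may differ across Fugue versions, so check the Fugue docs:

```python
import pandas as pd
from fugue import transform


def keep_positive(df: pd.DataFrame) -> pd.DataFrame:
    # Business logic written once, against plain pandas.
    return df[df["amount"] > 0]


input_df = pd.DataFrame({"amount": [10.0, -1.0]})

# Unit tests: execute on the lightweight DuckDB engine, no SparkSession needed.
local_result = transform(input_df, keep_positive, schema="*", engine="duckdb")

# Production: hand the same function to a Spark-backed engine instead, e.g.
#   spark = SparkSession.builder.getOrCreate()
#   spark_result = transform(spark_df, keep_positive, schema="*", engine=spark)
```

Because the logic is an ordinary Python function over pandas DataFrames, the mocked inputs from the previous sketch plug straight into it.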
It is also worth noting that FugueSQL's roadmap includes support for warehouses like BigQuery and Snowflake, so eventually you will be able to unit test SQL logic locally and then bring it to BigQuery/Snowflake when ready.
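For the SQL side, the FugueSQL flavour of the same idea looks roughly like this; the yield syntax and the fsql entry point are written from memory, so verify them against the Fugue documentation for your version:

```python
import pandas as pd
from fugue_sql import fsql

orders = pd.DataFrame({"amount": [10.0, -1.0]})

query = """
SELECT * FROM orders WHERE amount > 0
YIELD DATAFRAME AS positive
"""

# Unit test: run the SQL on DuckDB against an in-memory pandas DataFrame.
result = fsql(query, orders=orders).run("duckdb")
print(result["positive"].as_pandas())

# Later, the same query can be pointed at a Spark engine, e.g. .run(spark_session).
```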
For more information, there is this talk from PyData NYC (see the SQL testing part): https://www.youtube.com/watch?v=yQHksEh1GCs&t=1766s
Fugue project repo: https://github.com/fugue-project/fugue/
Being able to know the true health of a service is an absolute godsend.
So many times a service had been dead for hours before anyone noticed. Well, our customers noticed, but the report had to funnel up the pipeline from customer to support to engineering before we were made aware of a real issue.
Nothing says good PR like being dead in the water for half a day and having no idea.
This also holds for services that have internal clients. In other words, if your output is consumed only by other services within the same company, the same high monitoring standards must apply. Otherwise failure detection becomes very delayed and the productivity of many teams suffers. There is no worse buzzkill than having to explain to other service owners what is wrong with their application.
One other important lesson we have learned is that alerts require time to mature. The thresholds need to be tuned and the alert formulation revised. Our alerts usually give a couple of false positives in the first two weeks after their creation, and during those two weeks we frequently refine the alert conditions.
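As an illustration of what "maturing" an alert can mean in practice, here is a hypothetical thresholded check where both the threshold and the firing condition are cheap to revise; this is a sketch, not our actual alerting stack:

```python
from dataclasses import dataclass


@dataclass
class AlertRule:
    metric: str
    threshold: float       # typically revised several times in the first weeks
    min_breaches: int = 3  # require consecutive breaches to cut down false positives


def should_fire(rule: AlertRule, recent_values: list[float]) -> bool:
    # Fire only when the last `min_breaches` samples all exceed the threshold.
    window = recent_values[-rule.min_breaches:]
    return len(window) == rule.min_breaches and all(v > rule.threshold for v in window)


# Example: an error-rate alert that tolerates a single noisy sample.
rule = AlertRule(metric="error_rate", threshold=0.05)
print(should_fire(rule, [0.02, 0.07, 0.09, 0.12]))  # True
print(should_fire(rule, [0.02, 0.07, 0.01, 0.12]))  # False
```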