I have a 15 GB Parquet file in an S3 bucket, and I need to "unzip" it and extract every row to write into my database. The contents of the file are emails that I need to integrate into our search function.
Is this possible to do without an unreasonable amount of RAM? Are there any affordable services that can help here?
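For what it's worth, Parquet is designed to be read in row-group-sized batches, so peak memory stays at roughly one batch at a time rather than the full 15 GB. A minimal sketch with pyarrow, assuming a hypothetical bucket/key and leaving the actual database insert to whichever driver you use:

    import pyarrow.parquet as pq
    import pyarrow.fs as pafs

    # Hypothetical bucket, key, and region; swap in your own.
    fs = pafs.S3FileSystem(region="us-east-1")
    source = fs.open_input_file("my-bucket/emails.parquet")

    # iter_batches() streams record batches instead of
    # materializing the whole table in memory.
    pf = pq.ParquetFile(source)
    for batch in pf.iter_batches(batch_size=10_000):
        rows = batch.to_pylist()  # list of dicts, one per row
        # insert `rows` into your database here,
        # e.g. cursor.executemany(...) followed by a commit

Since each iteration only holds one batch, RAM usage is tied to batch_size and row width, not file size, so a small VM should be enough as long as the rows stream straight into inserts.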
Feel free to contact me (email in bio); at the very least, I'm happy to pay for a consult.
Maybe it was someone else who wrote that later.
On rare occasions, I still kind of do it.
Anything that tries to get them to understand the risks they are taking or the sensitivity of the data, much less to de-risk their workflows, is treated as an obstacle to be routed around. Often, the best I can hope for is a token effort at negotiation, where their goal is to avoid any and all changes on their part. After that, I have to monitor them carefully, because in my experience the odds of them backsliding within a week are uncomfortably high.
Nothing about this is conducive to a healthy environment. When people's idea of "easy" is that they can download the company's most sensitive data to their laptop to load into Jupyter, any amount of security controls will come as an imposition.
Given that many data engineers have a data science, data analytics, BI, or software engineering background, I'm curious if you've noticed any trends in their approach to data security?
I also asked Gemini about git and that didn't go well either.
https://www.researchgate.net/publication/355022506_Marlon_Br...