At Intent HQ we work closely with our clients to help them understand their data better, so they can provide better services to their own customers. We are ~80 people from all over the world (we speak 15 different languages!) based in London, Barcelona and NYC. As we are small, we love sharing ideas and like to work along the principles of valuing 'individuals and interactions', 'customer collaboration', 'responding to change' and 'working software'.
Tech stack: Scala, Typelevel stack (cats-effect, http4s, fs2...), Cassandra, Elasticsearch, PostgreSQL, Kafka, Docker, Nomad, Terraform, Consul, Vault, AWS, TypeScript, React, Redux
We have several positions open: https://intenthq.teamtailor.com/
Salary ranges depend on location.
If you want more information feel free to drop me an email: albert (at) intenthq.com
Not that this excuses GitHub/Microsoft in any way, this was an obvious outcome and they're morally and legally responsible.
So, I'd go for what others have already mentioned. Learn about the person: what they are like, how they learn best, what their interests are.
Find something they like and enjoy, so they are motivated in learning.
Be ready and available to answer questions and adapt to their rhythm and needs.
Prepare materials and different options so they can choose their path as they go.
Facilitate them changing their mind, going back and forth, making their own mistakes.
In my case, for example, I learn by doing, and pair programming with somebody helps me a lot, but other people might prefer having a theoretical background first and will want to read a book before diving into coding.
EDIT: adding paragraphs for clarity
This tool will not provide any significant amount of anonymity.
> rows to randomly sample ... hash (using ... 32 bits) the column ... mod the result by the [constant] value
This is not random. It deterministically selects the same very predictable fraction of rows.
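A minimal sketch of why hash-and-mod sampling is repeatable rather than random. The choice of SHA-1, taking its first 32 bits, and the names `keep_row` and `ids` are assumptions made for illustration; the tool's actual hash function is not stated here.

```python
import hashlib

def keep_row(value: str, mod: int) -> bool:
    """Keep a row when a 32-bit hash of `value` is divisible by `mod`."""
    digest = hashlib.sha1(value.encode("utf-8")).digest()
    # Assumption: use the first 4 bytes of SHA-1 as the 32-bit hash.
    h32 = int.from_bytes(digest[:4], "big")
    return h32 % mod == 0

# Hypothetical identifiers standing in for the sampled column.
ids = [f"user-{i}" for i in range(10_000)]

# Two independent "sampling" runs over the same input.
first = [i for i in ids if keep_row(i, 10)]
second = [i for i in ids if keep_row(i, 10)]

# Same rows every time: anyone who knows the hash and the constant
# can predict exactly which rows are kept.
print(first == second, len(first))
```

Because the selection depends only on the column value, the same individuals are selected in every export that uses the same constant.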
> UK format postcode (eg. W1W 8BE) and just keeps the outcode (eg. W1W)
> Given a date, just keep the year
Partial postal codes and dates quantized to the year are still very revealing. Combined with other data (such as a hashed name), the partial postal code may allow a lot of people to be uniquely identified.
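One rough way to check how revealing such a combination is: count how many records share each (outcode, year) pair. The records and the helper names `group_sizes` and `unique_rows` below are made up for illustration; this is a sketch, not part of the tool.

```python
from collections import Counter

# Hypothetical already-"anonymised" records.
records = [
    {"outcode": "W1W", "year": 1980},
    {"outcode": "W1W", "year": 1980},
    {"outcode": "W1W", "year": 1991},
    {"outcode": "EC1A", "year": 1980},
]

def group_sizes(rows, keys):
    """Count how many rows share each combination of the given columns."""
    return Counter(tuple(row[k] for k in keys) for row in rows)

def unique_rows(rows, keys):
    """Rows whose combination of columns appears exactly once."""
    sizes = group_sizes(rows, keys)
    return [row for row in rows if sizes[tuple(row[k] for k in keys)] == 1]

singled_out = unique_rows(records, ["outcode", "year"])
print(singled_out)
```

Any row that lands in `singled_out` is unique on those two columns alone, before any other column is even considered.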
> Hash (SHA1) the input
Hashing does not provide anonymity. Substituting a candidate key with the hash of the key is usually a 1-to-1 map that is often trivial to reverse. It isn't hard to iterate through e.g. all possible names, postal codes, license plates, or other short-ish strings to find a matching SHA1.
https://arstechnica.com/tech-policy/2014/06/poorly-anonymize...
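A toy illustration of the enumeration described above, assuming an unsalted SHA1 over a short lowercase string. The function names and the 3-letter example value are hypothetical; a real attack would use a GPU tool rather than a Python loop.

```python
import hashlib
from itertools import product
from string import ascii_lowercase

def sha1_hex(text: str) -> str:
    return hashlib.sha1(text.encode("utf-8")).hexdigest()

def reverse_by_enumeration(target_hex, alphabet, max_len):
    """Try every string up to max_len and return the one whose SHA1 matches."""
    for length in range(1, max_len + 1):
        for chars in product(alphabet, repeat=length):
            candidate = "".join(chars)
            if sha1_hex(candidate) == target_hex:
                return candidate
    return None

# Pretend this hash was published in an "anonymised" file.
leaked = sha1_hex("bob")
recovered = reverse_by_enumeration(leaked, ascii_lowercase, 3)
print(recovered)
```

The search space here is only 26 + 26^2 + 26^3 candidates, so even this naive loop finishes almost immediately.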
The salt *might* provide some resistance to pre-computed tables, but a GeForce GTX 1080 Ti running hashcat can search for matching SHA1 at over 11 GH/s (giga-hashes per second). That means that a single 1080 Ti running for ~3-4 hours would not only discover that SHA1("hasselhof") == ffe3294fad149c2dd3579cb864a1aebb2201f38d; it would exhaustively search all 10 character or smaller lowercase strings.
> range
This is the only feature that could provide anonymity, if it is used correctly to group large numbers of individuals into the same bucket. This is probably more difficult than it first appears.
>> rows to randomly sample ... hash (using ... 32 bits) the column ... mod the result by the [constant] value
> This is not random. It deterministically selects the same very predictable fraction of rows.
Yep, you are right. We didn't intend the sampling function to be part of the anonymisation but just something we tend to use and we thought it would be useful to have it.
Its objective is to pick a portion of the input data. No more.
>> UK format postcode (eg. W1W 8BE) and just keeps the outcode (eg. W1W)
>> Given a date, just keep the year
> Partial postal codes and dates quantized to the year are still very revealing. Combined with other data (such as a hashed name), the partial postal code may allow a lot of people to be uniquely identified.
You are absolutely right. Depending on the use case and your data, having the outcode, the city or the year might be very revealing. In some other cases even having decades or centuries might be revealing.
We don't pretend that each function provided applies to all use cases. But in certain use cases partial postcodes or years can be good enough.
>> Hash (SHA1) the input
> Hashing does not provide anonymity.
We are very aware of that. That's why we offer the option to add a salt (that the user of the tool can make as long as possible and throw away after the anonymisation process).
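A sketch of the salted variant, assuming the salt is a long random secret that is simply prepended to the value before hashing. How the tool actually combines salt and input is not stated here, and `salted_sha1` is a hypothetical name.

```python
import hashlib
import secrets

def salted_sha1(value: str, salt: bytes) -> str:
    # Assumption: salt is prepended to the UTF-8 bytes of the value.
    return hashlib.sha1(salt + value.encode("utf-8")).hexdigest()

# One long random salt per anonymisation run, never written to disk.
salt = secrets.token_bytes(32)

a = salted_sha1("W1W 8BE", salt)
b = salted_sha1("W1W 8BE", salt)
other_run = salted_sha1("W1W 8BE", secrets.token_bytes(32))
unsalted = hashlib.sha1("W1W 8BE".encode("utf-8")).hexdigest()

# Equal inputs still match within a run, but not across runs
# and not against a plain SHA1 of the same value.
print(a == b, a == other_run, a == unsalted)
```

Within one run the mapping stays consistent, so joins on the hashed column still work. The protection depends entirely on the salt being unpredictable and actually discarded afterwards.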
>> range
> This is the only feature that could provide anonymity, if it is used correctly to group large numbers of individuals into the same bucket. This is probably more difficult than it first appears.
We usually work with data sets of tens of millions of users. Choosing the right ranges and, especially, analysing the data and making sure you anonymise the outliers (by choosing your bottom and top ranges carefully) is crucial.
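As a rough illustration of ranges with open-ended bottom and top buckets, here is a sketch in Python. The function name `bucket`, the edge values and the label format are assumptions for the example, not the tool's configuration.

```python
from bisect import bisect_right
from typing import Sequence

def bucket(value: int, edges: Sequence[int]) -> str:
    """Map a value to a range label; `edges` must be sorted ascending.

    Values below the first edge and at or above the last edge fall into
    open-ended buckets, so extreme outliers are not left on their own.
    """
    i = bisect_right(edges, value)
    if i == 0:
        return f"<{edges[0]}"
    if i == len(edges):
        return f">={edges[-1]}"
    # Half-open interval: lower edge included, upper edge excluded.
    return f"{edges[i - 1]}-{edges[i]}"

# Hypothetical age ranges.
age_edges = [18, 30, 50, 70]
labels = [bucket(age, age_edges) for age in (17, 18, 30, 69, 70, 104)]
print(labels)
```

With these edges a 104-year-old ends up in the same `>=70` bucket as every other person aged 70 or more. Whether that bucket is large enough still has to be checked against the actual data.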
Again, this tool is a hammer. We expect a person who understands wood and nails to analyse their problem and use it.
I did a quick search and couldn't find it. Does anybody remember it? I'm wondering if my memory or my search skills are failing or if it has simply disappeared.