xomateix (u/xomateix)

xomateix commented on Six Degrees of Wikipedia sixdegreesofwikipedia.com... · Posted by u/sharkweek

xomateix · 4 years ago

This reminded me of a webpage that used IMDb to calculate the degrees of separation between Nicolas Cage and a person of your choice (actress, director...).

I did a quick search and couldn't find it. Does anybody remember it? I'm wondering if my memory or my search skills are failing or if it has simply disappeared.

xomateix commented on Ask HN: Who is hiring? (August 2021) · Posted by u/whoishiring

xomateix · 5 years ago

Intent HQ | Scala Engineer | London, NYC, Barcelona, Lisbon & Remote | Full or part time

At Intent HQ we are working in a close relationship with our clients to help them have a better understanding of their data so they are able to provide better services to their own customers. We are ~80 people from all over the world (we speak 15 different languages!) based in London, Barcelona and NYC. As we are small, we love sharing ideas and really like to work along the principles of valuing 'individuals and interactions', 'customer collaboration', 'responding to change' and 'working software'.

Tech stack: Scala, Typelevel stack (cats-effect, http4s, fs2...), Cassandra, Elasticsearch, PostgreSQL, Kafka, Docker, Nomad, Terraform, Consul, Vault, AWS, TypeScript, React, Redux

We have several positions open: https://intenthq.teamtailor.com/

Salary ranges depend on location.

If you want more information feel free to drop me an email: albert (at) intenthq.com

xomateix commented on Is GitHub Copilot a blessing, or a curse? fast.ai/2021/07/19/copilo... · Posted by u/jph00

icebraining · 5 years ago

Which repo marked as GPLv2 has been used on Copilot? I think the trouble is that some repos marked as MIT/BSD actually contain GPL code.

Not that this excuses GitHub/Microsoft in any way, this was an obvious outcome and they're morally and legally responsible.

xomateix · 5 years ago

According to GitHub support, they didn't exclude any repo based on the license: https://news.ycombinator.com/item?id=27769440

xomateix commented on Pronunciations for hexadecimal numbers (1968) twitter.com/lizhenry/stat... · Posted by u/henrik_w

Zenst · 7 years ago

It's amazing what naming conventions are used over time. In the 70's/early 80's "~" was (at least in programming circles in the UK) called a swan-hyphen, today it is often called a tilde. Though I'm sure it has many other names that have come and gone throughout the fashion of time.

xomateix · 7 years ago

In Spanish most people call it "tilde" as well, but you can also call it "Virgulilla" [1]. I always call it that just because of how it sounds; I love that word.

[1] https://es.wikipedia.org/wiki/Virgulilla

xomateix commented on Ask HN: How to Teach Coding? · Posted by u/moveax

xomateix · 7 years ago

I am of the opinion that you don't teach anything, it's the person that learns something.

So, I'd go for what others have already mentioned. Learn about the person: how they are, how they learn best, what their interests are.

Find something they like and enjoy, so they are motivated in learning.

Be ready and available to answer questions and adapt to their rhythm and needs.

Prepare materials and different options so they can choose their path as they go.

Facilitate them changing their mind, going back and forth, making their own mistakes.

In my case, for example, I learn by doing, and pair programming with somebody helps me a lot, but other people might prefer having a theoretical background first and will want to read a book before diving into coding.

EDIT: adding paragraphs for clarity

xomateix commented on Show HN: Anon – A Unix Command to Anonymise Data github.com/intenthq/anon... · Posted by u/xomateix

pdkl95 · 8 years ago

> anonymising ... columns until the output is useful for applications where sensitive information cannot be exposed

This tool will not provide any significant amount of anonymity.

> rows to randomly sample ... hash (using ... 32 bits) the column ... mod the result by the [constant] value

This is not random. It deterministically selects the same very predictable fraction of rows.

> UK format postcode (eg. W1W 8BE) and just keeps the outcode (eg. W1W)

> Given a date, just keep the year

Partial postal codes and dates quantized to the year are still very revealing. Combined with other data (such as a hashed name), the partial postal code may allow a lot of people to be uniquely identified.

> Hash (SHA1) the input

Hashing does not provide anonymity. Substituting a candidate key with the hash of the key is usually a 1-to-1 map that is often trivial to reverse. It isn't hard to iterate through e.g. all possible names, postal codes, license plates, or other short-ish strings to find a matching SHA1.

https://arstechnica.com/tech-policy/2014/06/poorly-anonymize...
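To make the reversal concrete, here is a minimal sketch of a dictionary attack on an unsalted SHA1. The candidate list and the "leaked" value are made up for illustration; a real attack would simply iterate over a much larger candidate space.

```python
import hashlib
from typing import Iterable, Optional


def sha1_hex(value: str) -> str:
    """Unsalted SHA1 of a string, as a hex digest."""
    return hashlib.sha1(value.encode("utf-8")).hexdigest()


def reverse_by_dictionary(target_hash: str, candidates: Iterable[str]) -> Optional[str]:
    """Return the candidate whose SHA1 matches target_hash, or None."""
    for candidate in candidates:
        if sha1_hex(candidate) == target_hash:
            return candidate
    return None


# Hypothetical candidate space: a handful of UK outcodes.
candidates = ["W1W", "SW1A", "EC1A", "M1", "B33"]

# Pretend this hash was found in an "anonymised" column.
leaked = sha1_hex("EC1A")

recovered = reverse_by_dictionary(leaked, candidates)
print(recovered)
```

The smaller the space of possible inputs, the cheaper this loop gets, which is why short identifiers are the worst case.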

The salt *might* provide some resistance to pre-computed tables, but a GeForce GTX 1080 Ti running hashcat can search for matching SHA1 at over 11 GH/s (giga-hashes per second). That means that a single 1080 Ti running for ~3-4 hours would not only discover that SHA1("hasselhof") == ffe3294fad149c2dd3579cb864a1aebb2201f38d; it would exhaustively search all lowercase strings of 10 characters or fewer.
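The time estimate can be checked with a few lines of arithmetic. This sketch assumes a 26-letter lowercase alphabet, lengths 1 to 10, and the 11 GH/s rate quoted above; it ignores any overhead in the cracking tool.

```python
ALPHABET_SIZE = 26          # assumption: a-z only
MAX_LENGTH = 10             # strings of 1..10 characters
HASHES_PER_SECOND = 11e9    # rate quoted above for one GTX 1080 Ti

# Total number of lowercase strings with length between 1 and MAX_LENGTH.
candidates = sum(ALPHABET_SIZE ** n for n in range(1, MAX_LENGTH + 1))

seconds = candidates / HASHES_PER_SECOND
hours = seconds / 3600

print(f"{candidates:,} candidates, about {hours:.1f} hours")
```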

> range

This is the only feature that could provide anonymity, if it is used correctly to group large numbers of individuals into the same bucket. This is probably more difficult than it first appears.

xomateix · 8 years ago

Hey, one of the co-maintainers here. Thanks for your comments.

>> rows to randomly sample ... hash (using ... 32 bits) the column ... mod the result by the [constant] value

> This is not random. It deterministically selects the same very predictable fraction of rows.

Yep, you are right. We didn't intend the sampling function to be part of the anonymisation; it's just something we tend to use and thought would be useful to have.

Its objective is to pick a portion of the input data. No more.
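As a rough illustration of that kind of sampling (not anon's actual code), here is a sketch that keeps a row when a 32-bit hash of its key is divisible by a constant. CRC32 is used here only as a stand-in 32-bit hash, and `keep_row` and the `user-N` keys are made-up names.

```python
import zlib


def keep_row(key: str, mod: int) -> bool:
    """Keep a row when the 32-bit hash of its key is divisible by mod."""
    return zlib.crc32(key.encode("utf-8")) % mod == 0


# Hypothetical input: one key per row.
rows = [f"user-{i}" for i in range(10_000)]

# Roughly one row in ten is kept; the same keys are kept on every run.
sample = [row for row in rows if keep_row(row, 10)]
print(len(sample), "of", len(rows), "rows kept")
```

Because the decision depends only on the key, the same rows are selected every time, which is handy for reproducible subsets and is exactly why it should not be treated as random.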

>> UK format postcode (eg. W1W 8BE) and just keeps the outcode (eg. W1W)

>> Given a date, just keep the year

> Partial postal codes and dates quantized to the year are still very revealing. Combined with other data (such as a hashed name), the partial postal code may allow a lot of people to be uniquely identified.

You are absolutely right. Depending on the use case and your data, having the outcode, the city or the year might be very revealing. In some other cases even having decades or centuries might be revealing.

We don't pretend that each function provided applies to all use cases. But in certain use cases partial postcodes or years can be good enough.
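For reference, the two reductions under discussion amount to something like the sketch below. It is an illustration and not the tool's implementation: it assumes a well-formed UK postcode (the inward code is always the last three characters) and an ISO-formatted date.

```python
from datetime import date


def outcode(postcode: str) -> str:
    """Keep only the outcode of a UK postcode, e.g. 'W1W 8BE' -> 'W1W'."""
    compact = postcode.replace(" ", "").upper()
    return compact[:-3]  # drop the three-character inward code


def year_only(iso_date: str) -> int:
    """Keep only the year of an ISO date string (YYYY-MM-DD)."""
    return date.fromisoformat(iso_date).year


print(outcode("W1W 8BE"))       # W1W
print(year_only("1987-06-23"))  # 1987
```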

>> Hash (SHA1) the input

> Hashing does not provide anonymity.

We are very aware of that. That's why we offer the option to add a salt (that the user of the tool can make as long as possible and throw away after the anonymisation process).
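A minimal sketch of the idea, assuming the salt is simply prepended to the value before hashing (how anon combines them is not stated here) and generated with Python's `secrets` module:

```python
import hashlib
import secrets


def salted_sha1(value: str, salt: bytes) -> str:
    """SHA1 of salt + value, as a hex digest."""
    return hashlib.sha1(salt + value.encode("utf-8")).hexdigest()


# A long random salt, used for one anonymisation run and then discarded.
salt = secrets.token_bytes(32)

first = salted_sha1("hasselhof", salt)
second = salted_sha1("hasselhof", salt)

# Same salt: same output, so joins across columns and files still work.
print(first == second)

# Different salt: the mapping changes, so a table built for one run is useless for another.
other = salted_sha1("hasselhof", secrets.token_bytes(32))
print(first == other)
```

The protection depends entirely on the salt staying secret and being thrown away afterwards; anyone who holds it can run the same brute-force search as before.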

>> range

> This is the only feature that could provide anonymity, if it is used correctly to group large numbers of individuals into the same bucket. This is probably more difficult than it first appears.

We usually work with data sets of tens of millions of users. Choosing the right ranges and, especially, analysing the data and making sure you anonymise the outliers (by choosing your bottom and top ranges carefully) is crucial.
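As an illustration of what "bottom and top ranges" means, here is a sketch with open-ended first and last buckets so that outliers are absorbed instead of standing out. The edges and the `bucket` helper are made up for the example; real edges should come from analysing your own data.

```python
from bisect import bisect_right

# Hypothetical age edges; values below the first or at/above the last
# fall into open-ended buckets.
EDGES = [18, 25, 35, 50, 65]


def bucket(value: int, edges: list[int] = EDGES) -> str:
    """Map a value to a range label, with open-ended bottom and top buckets."""
    i = bisect_right(edges, value)
    if i == 0:
        return f"<{edges[0]}"
    if i == len(edges):
        return f"{edges[-1]}+"
    return f"{edges[i - 1]}-{edges[i] - 1}"


for age in (7, 18, 34, 64, 102):
    print(age, "->", bucket(age))
```

A 102-year-old ends up in the same "65+" bucket as everyone else over 65, which only helps if that bucket is itself large enough.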

Again, this tool is a hammer. We expect a person who understands wood and nails to analyse their problem and use it.