zackelan (u/zackelan)

zackelan commented on Command-line Tools can be 235x Faster than a Hadoop Cluster (2014) adamdrake.com/command-lin... · Posted by u/hd4

mmt · 7 years ago

It has frustrated me for the better part of a decade that the misconception persists that "big data" begins beyond a single 2U server. It's as if we're all still living during the dot-com boom and the only way to scale is buying more "pizza boxes".

Single-server setups larger than 2U but (usually) smaller than 1 rack can give tremendous bang for the buck, no matter if your "bang" is peak throughput or total storage. (And no, I don't mean spending inordinate amounts on brand-name "SAN" gear.)

There's even another category of servers, arguably non-commodity given their 2x price premium (for the server itself only, not the storage), that can quadruple the CPU and RAM capacity, if not the I/O throughput, of the cheaper version.

I think ignorance of the hardware capabilities actually available ended up driving well-intentioned (usually software) engineers to choose distributed systems solutions, with all their ensuing complexity.

Today, part of the driver is how few underlying hardware choices one has from "cloud" providers and how anemic the I/O performance is.

It's sad, really, since SSDs have so greatly reduced the penalty for data not fitting in RAM, as long as those SSDs are local. The penalty for being at the far end of an Ethernet link, however, can be far greater than that of a spinning disk.

zackelan · 7 years ago

That's a good point, I suppose it'd be better to frame it as what you can run on a $1k workstation vs. a $10k rackmount server, or something along those lines.

As a software engineer who builds their own desktops (and has for the last 10 years) but mostly works with AWS instances at $dayjob, are there any resources you'd recommend for learning about what's available in the land of that higher-end rackmount equipment? Short of going full homelab, tripling my power bill, and heating my apartment up to 30C, I mean...

zackelan commented on Command-line Tools can be 235x Faster than a Hadoop Cluster (2014) adamdrake.com/command-lin... · Posted by u/hd4

mseebach · 7 years ago

I heard a variation on this: it's not big data until it can't fit in RAM in a single rack.

zackelan · 7 years ago

The version I've heard is that small data fits on an average developer workstation, medium data fits on a commodity 2U server, and "big data" needs a bigger footprint than that single commodity server offers.

I like that better than bringing racks into it, because once you have multiple machines in a rack you've got distributed systems problems, and there's a significant overlap between "big data" and the problems that a distributed system introduces.

zackelan commented on Python 3.7: Introducing Data Classes blog.jetbrains.com/pychar... · Posted by u/ingve

reaperhulk · 7 years ago

As noted in the PEP, data classes are a less fully featured stdlib implementation of what attrs already provides. Unless you’re constrained to the stdlib (as those who write CPython itself of course are), you should consider taking a look at attrs first.

http://www.attrs.org/en/stable/

zackelan · 7 years ago

attrs also has a feature that dataclasses don't currently [0]: an easy way to use __slots__ [1].

It cuts down on per-instance memory overhead, for cases where you're creating a ton of these objects. It can be useful even when you're not memory-constrained, because assigning to a misspelled attribute will raise AttributeError rather than silently succeeding.

0: https://www.python.org/dev/peps/pep-0557/#support-for-automa...

1: http://www.attrs.org/en/stable/examples.html#slots
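For comparison with attrs's `@attr.s(slots=True)` [1], here is a minimal stdlib-only sketch of the typo-catching behavior described above. It hand-writes `__slots__` on a dataclass, which works only as long as the fields have no default values; the class names and the `typo_assignment_succeeds` helper are hypothetical. (Python 3.10 later added `@dataclass(slots=True)`, which generates the slots for you.)

```python
from dataclasses import dataclass


@dataclass
class PointDict:
    # Ordinary dataclass: each instance carries its own __dict__.
    x: int
    y: int


@dataclass
class PointSlots:
    # Hand-written __slots__ on a stdlib dataclass. This only works because
    # neither field has a default value (a default would clash with the
    # slot descriptor of the same name).
    __slots__ = ("x", "y")
    x: int
    y: int


def typo_assignment_succeeds(obj):
    """Assign to a misspelled attribute; return True if it silently worked."""
    try:
        obj.yy = 5  # intended: obj.y = 5
    except AttributeError:
        return False
    return True


if __name__ == "__main__":
    print(typo_assignment_succeeds(PointDict(1, 2)))   # True: typo lands in __dict__
    print(typo_assignment_succeeds(PointSlots(1, 2)))  # False: AttributeError raised
    print(hasattr(PointSlots(1, 2), "__dict__"))       # False: no per-instance dict
```

The missing per-instance `__dict__` is also where the memory savings come from when you create a large number of these objects.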

zackelan commented on Curated List of Privacy Respecting Services and Software github.com/nikitavoloboev... · Posted by u/nikivi

zackelan · 7 years ago

I've been very happy with Wallabag as a Pocket / Instapaper replacement. Self-hostable, with an option to pay 9 EUR/year for a hosted version.

zackelan commented on Curated List of Privacy Respecting Services and Software github.com/nikitavoloboev... · Posted by u/nikivi

mobitar · 7 years ago

Standard Notes for private, encrypted notes :) (Alternative to Evernote I work on) https://standardnotes.org

zackelan · 7 years ago

Just discovered that earlier this week, very happy with it so far. Thanks for your work!

zackelan commented on Parallel tasks in Python: concurrent.futures vinta.ws/code/parallel-ta... · Posted by u/gibuloto

miracle2k · 7 years ago

I found this article extremely persuasive, and it matches my own experiences with asyncio: The performance gain might be there if most of what you do is waiting for a network response, but even a small amount of data processing will make your program CPU bound pretty quickly.

http://techspot.zzzeek.org/2015/02/15/asynchronous-python-an...

zackelan · 7 years ago

Note that it's not either/or: you can dispatch work from an event loop to a thread pool (or a process pool) with loop.run_in_executor [0], while worker threads can use loop.call_soon_threadsafe [1] to add callbacks to the event loop.

This means that the "frontend" of a service can be asyncio, which makes it possible to support features like WebSockets that are non-trivial without aiohttp or a similar asyncio-native HTTP server [2], while the "backend" of the service can be multi-threaded or multi-process for CPU-bound work.

0: https://docs.python.org/3/library/asyncio-eventloop.html#exe...

1: https://docs.python.org/3/library/asyncio-eventloop.html#asy...

2: Flask-SocketIO, for example, requires that you use eventlet or gevent, which are the "legacy" ways of doing asynchronous IO: https://flask-socketio.readthedocs.io/en/latest/
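A minimal sketch of that frontend/backend split, assuming Python 3.7+ (for `asyncio.run` and `asyncio.get_running_loop`). `cpu_bound` and `worker_with_progress` are hypothetical stand-ins for real work. A thread pool is used here so the worker can be handed the loop object; for pure-Python CPU-bound work, a `ProcessPoolExecutor` sidesteps the GIL, but then the function and its arguments must be picklable, so the loop itself can't be passed in.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor


def cpu_bound(n):
    # Hypothetical stand-in for CPU-heavy work: sum of squares below n.
    return sum(i * i for i in range(n))


def worker_with_progress(loop, progress, n):
    # Runs in a worker thread. call_soon_threadsafe is the safe way for a
    # thread other than the loop's own to schedule a callback on the loop.
    loop.call_soon_threadsafe(progress.append, "started")
    result = cpu_bound(n)
    loop.call_soon_threadsafe(progress.append, "finished")
    return result


async def main():
    loop = asyncio.get_running_loop()
    progress = []  # only ever mutated on the event loop thread
    with ThreadPoolExecutor(max_workers=2) as pool:
        # run_in_executor returns an awaitable, so the loop stays free to
        # serve other coroutines (e.g. WebSocket handlers) meanwhile.
        results = await asyncio.gather(
            loop.run_in_executor(pool, cpu_bound, 10),
            loop.run_in_executor(pool, worker_with_progress, loop, progress, 10),
        )
    # Yield once so any already-scheduled thread-safe callbacks get to run.
    await asyncio.sleep(0)
    return results, progress


if __name__ == "__main__":
    print(asyncio.run(main()))  # ([285, 285], ['started', 'finished'])
```

The design point is that the event loop never blocks on the heavy computation; it only awaits futures, and results come back to it through the executor machinery or through explicit `call_soon_threadsafe` callbacks.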

zackelan commented on Kubernetes 1.10 released blog.kubernetes.io/2018/0... · Posted by u/el_duderino

wolfgang42 · 7 years ago

Is there an easy way to get a single-node production-ready Kubernetes instance? I'd like to start using the Auto DevOps features that GitLab is adding, but all the tutorials I can find either have you installing minikube on your laptop or setting up a high-availability cluster with at least 3 hosts. Right now I'm using CoreOS and managing Docker containers with systemd and shell scripts, which works all right but is tedious and kind of hard to keep track of. I don't have anything that needs to autoscale or fail over or do high availability, I just want something that integrates nicely and makes it easy to deploy containers.

EDIT: I should have clarified: I want to self-host this on our internal VMware cluster rather than run it on GKE.

zackelan · 7 years ago

If you want a middle ground between hand-written shell scripts and full-blown Kubernetes, we use HashiCorp's Nomad [0] on top of CoreOS at $dayjob and are quite happy with it.

Similar use case - self-hosted VMs, for low-traffic, internal tools, and no need for autoscaling.

I can't speak to how well it integrates with GitLab's Auto DevOps, but Nomad integrates very well with Terraform [1], and I'd be surprised if there weren't a way to plug Terraform into GitLab's process.

0: https://www.nomadproject.io/

1: https://www.terraform.io/

zackelan commented on Amazon Aurora Postgres: First Thoughts linkedin.com/pulse/amazon... · Posted by u/brandur

rpedela · 8 years ago

Is Postgres RDS any different? Isn't it also run on EBS?

zackelan · 8 years ago

The key difference between "classic" RDS and Aurora is that classic RDS really only automated the control plane. That is, RDS spins up an EC2 instance (or two, for multi-AZ) on your behalf, attaches an EBS volume of the appropriate specs, installs Postgres, sets up security and backups and replication etc.

Under classic RDS, when your application makes a SQL connection (the data plane) it's talking to a more or less stock Postgres instance, the same as you would have if you ran it locally.

Aurora, on the other hand, is involved in both the control plane and data plane. Your SQL connection is to a Postgres instance that's been forked/modified to work within Aurora.

zackelan commented on California lawmakers have tried for 50 years to fix the state’s housing crisis latimes.com/projects/la-p... · Posted by u/jseliger

sliverstorm · 8 years ago

Redeveloping a low density property for higher density is fine, except part of what makes that new property attractive is the neighborhood it was built in. Which in our hypothetical, is low density and maybe nice.

Then, as more and more properties are redeveloped, the old character is lost to everybody: the people who lived there as well as the new people who moved there for that character!

It's not a new conundrum: what do you do about people moving to a community for its desirable character, but killing that character in the process? It hurts everyone involved.

HN frequently likes to take the position that the parcel you bought is yours, but if your neighbor wants to bootstrap a red-light district, you just have to deal. But all over the country we have HOAs and zoning and so forth. Turns out, people want to come together and live in a community. Nobody particularly likes living in a free-for-all, so mutual agreements were set up to ensure the neighborhood you bought into doesn't turn into something totally different overnight.

zackelan · 8 years ago

> what do you do about people moving to a community for its desirable character but killing that character in the process?

Here's what I think is the central (and flawed) assumption in this line of reasoning: people move to an area because of its "character", and that "character" is an intangible, immeasurable quality that is nonetheless somehow diminished if more people move to the area.

I grew up in Seattle. Both of my grandparents, when I was a kid, lived in Seattle's Fremont neighborhood. I live in Fremont today. From one perspective, the Fremont of my childhood is completely changed. On the other hand, it's still Fremont, with the Center of the Universe sign and the statue of Lenin and many other things I remember from childhood. Does it have the same "character"? Does it have a newer, different, but just as good, "character"?

Those are impossible questions and it boils down to a Ship of Theseus style argument. Either way, I can't bring myself to assert that the housing supply of Fremont should be artificially constrained by zoning policies, in order to preserve my ideal of what Fremont "should be" or "used to be".