Readit News
nvivo commented on Azure Storage: How fast are disks?   grsplus.com/blog/2018/07/... · Posted by u/ed_elliott_asc
teddyuk · 7 years ago
nvivo · 7 years ago
I know what I see in my apps. Yes, Azure is worse than AWS. AWS has its flaws, but Azure is simply very slow in comparison.
nvivo commented on XARs: An efficient system for self-contained executables   code.fb.com/data-infrastr... · Posted by u/terrelln
fenesiistvan · 7 years ago
I am a Windows developer, and the single thing that stops me from porting my apps to Linux is the lack of an easy-to-use deploy method. Is there a good way to handle this without spending months learning Linux administration: shell scripts, finding the best place for configs and logs on different distros, daemon setup, etc.? Something simple and distro-independent would be fine...
nvivo · 7 years ago
I must say, as a long-time Windows developer, that .NET Core on Linux is much simpler than on Windows. I'm moving everything I have from Windows to Linux, and the experience, simplicity, speed, stability, etc. are all much better on Linux.

I guess if you're working with a GUI that's a different story. But for .NET websites and background services, simply use Docker on Linux and never look back. It's also much simpler than Docker for Windows.
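As a rough sketch of what that can look like, a minimal two-stage Dockerfile for a .NET app (image tags and the app name are illustrative, not from the thread):

```dockerfile
# Hypothetical two-stage build: compile with the SDK image, run on the slim runtime image
FROM mcr.microsoft.com/dotnet/sdk:8.0 AS build
WORKDIR /src
COPY . .
RUN dotnet publish -c Release -o /app

FROM mcr.microsoft.com/dotnet/aspnet:8.0
WORKDIR /app
COPY --from=build /app .
ENTRYPOINT ["dotnet", "MyApp.dll"]
```

The same image then runs identically on any Linux distro, which sidesteps the configs/logs/daemons questions from the parent comment.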

nvivo commented on Azure Storage: How fast are disks?   grsplus.com/blog/2018/07/... · Posted by u/ed_elliott_asc
thejosh · 7 years ago
Azure disk speed for VMs is absolutely horrendous for the amount you have to pay. Our workloads suffer badly because of this.
nvivo · 7 years ago
I completely agree. For some reason they're not 100% SSD, and that changes everything. That's the main reason I avoid Azure. Last time I tested (about 8 months ago), I spun up a new Windows Server instance with 4 GB of RAM and ran Windows Update. On my laptop it takes 20 minutes, on AWS 40 minutes or so, but on Azure it took almost an entire day! Partly this is because MS doesn't keep their images as up to date as AWS does, but mostly it's because they default to HDD instead of SSD. Not advocating anything, just my experience.
nvivo commented on Apple Engineers Its Own Downfall with the Macbook Pro Keyboard   ifixit.org/blog/10229/mac... · Posted by u/andrewke
girvo · 7 years ago
I thought the RAM thing would decrease battery life because it requires a different Intel chipset, as shown by the contemporary laptops from other manufacturers with the same CPUs and chipset topping out at 16 GB as well?

I could be misremembering, and it frustrates me either way, but I was certain that Intel also held some blame for this?

nvivo · 7 years ago
I find it hard to believe that any decent chipset made in the last decade can't support 32 GB of RAM. I had a reasonably cheap Samsung laptop from 2012 that already did. If Apple is not doing it, it's not for lack of options in the market.
nvivo commented on Showdown: MySQL 8 vs. PostgreSQL 10   blog.dumper.io/showdown-m... · Posted by u/kenn
th3sly · 7 years ago
but it supports conversion to binary (+shuffle to keep order) using "uuid_to_bin"
nvivo · 7 years ago
Yes, but this is new in MySQL 8, which is pretty recent. Also, I always use UUID v4, specifically to prevent any ordering.
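For reference, a minimal sketch of what th3sly describes in MySQL 8:

```sql
-- UUID() returns a v1 (time-based) UUID as text; UUID_TO_BIN packs it into 16 bytes.
-- The second argument (1) swaps the time-low and time-high parts so newly
-- generated values sort roughly by creation time.
SELECT UUID_TO_BIN(UUID(), 1) AS id_bin;

-- BIN_TO_UUID with the same flag round-trips back to the text form.
SELECT BIN_TO_UUID(UUID_TO_BIN(UUID(), 1), 1) AS id_text;
```

With v4 UUIDs, as nvivo prefers, the bits are random, so the swap flag buys no ordering.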
nvivo commented on Showdown: MySQL 8 vs. PostgreSQL 10   blog.dumper.io/showdown-m... · Posted by u/kenn
da_chicken · 7 years ago
> One advantage of using an incrementing integer is that rows will be ordered on disk based on when they were created.

Well, kind of. A lot of people think the auto-incrementing integer feature in many RDBMSs will always increase, or will never have gaps. It's likely but not guaranteed that n+k was created after n. If you really need to know the creation date, then you should store it in a datetime/timestamp column.

> If a query asks for 25 consecutive rows, there is a good chance they will all be on the same page. If you use UUIDs, then they could be on 25 different pages and you will have to do 25x the disk IO to handle the query.

This is true, but it also means that if you need to write 25 different rows, it will be in 25 different pages. That sounds bad because non-sequential writes are slower, but you have to remember that it could be 25 different connections trying to write! In other words, you create a hot spot with sequential inserts. If that's the end of the table, you'll have threads constantly waiting for other processes to do inserts since inserts lock the page being inserted.

So, yes, clustering on a UUID can cause problems (fragmented indexes, inefficient reads), but clustering on an autoincrement can also cause issues depending on your workload.

In reality, what you need to do (in the general case) is cluster on your business key even if it's not the primary key for your table.
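A sketch of that last point in SQL Server syntax, where the clustered index can be chosen independently of the primary key (the schema is hypothetical; note that MySQL's InnoDB always clusters on the primary key, so it doesn't offer this choice):

```sql
-- Keep the surrogate key as a nonclustered primary key,
-- but physically order the table by the business key.
CREATE TABLE orders (
    order_id   INT IDENTITY PRIMARY KEY NONCLUSTERED,
    order_no   CHAR(10) NOT NULL,       -- business key
    created_at DATETIME2 NOT NULL
);

CREATE UNIQUE CLUSTERED INDEX cx_orders_order_no
    ON orders (order_no);
```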

nvivo · 7 years ago
> It's likely but not guaranteed that n+k was created after n.

This is true in MySQL if you roll back a transaction, or use an INSERT INTO ... ON DUPLICATE KEY UPDATE.

In the first case, the rollback doesn't revert the sequence; in the second, the "insert part" always increments the counter, even when there's a duplicate to update.
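A minimal demonstration of both the duplicate-key behavior and the resulting gap, with InnoDB's default settings (hypothetical table):

```sql
CREATE TABLE t (
    id INT AUTO_INCREMENT PRIMARY KEY,
    k  INT UNIQUE,
    v  INT
);

INSERT INTO t (k, v) VALUES (1, 10);       -- gets id = 1

-- The duplicate key means this only updates row 1,
-- but an auto_increment value is still consumed.
INSERT INTO t (k, v) VALUES (1, 20)
    ON DUPLICATE KEY UPDATE v = VALUES(v);

INSERT INTO t (k, v) VALUES (2, 30);       -- gets id = 3, leaving a gap at 2
```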

nvivo commented on Showdown: MySQL 8 vs. PostgreSQL 10   blog.dumper.io/showdown-m... · Posted by u/kenn
stevenwoo · 7 years ago
What did you do to address the mysql connection problem?
nvivo · 7 years ago
I ended up reducing the number of clients. In my case I had a thousand servers, and I was able to change the application structure and merge them into a dozen big application servers. Now with connection pooling each server has less than 100 connections.

As a more permanent solution for scaling, I'm moving off MySQL to something more distributed.

nvivo commented on Showdown: MySQL 8 vs. PostgreSQL 10   blog.dumper.io/showdown-m... · Posted by u/kenn
da_chicken · 7 years ago
> I use char(36), as it's easier to query manually when needed, but I'm looking into binary(16) for those billion row tables.

I would use whatever data type your RDBMS's UUID generator returns, or whatever the programming language your application is written in uses. If your RDBMS supports a native UUID or GUID data type, however, I would 100% use that, because you'll invariably have functions that help you deal with it.

Remember, however, that many (most?) RDBMSs store records in pages (or blocks) of a fixed size, typically between 4 KB and 8 KB, and they won't allow a record to span a page (usually, when a record is too long for one page, non-key data is moved to off-page storage, which is slower). In other words, if you reduce your record size by 20 bytes, you might not see as big a change as you'd expect: you'd be storing less data per record, but perhaps not changing the number of records per page, in which case you're not increasing the efficiency of your data store at all, because of how the data is physically stored. It also means the answer might be different for each table, since each table has a different row size.

Bottom line, however, is that I would favor storing UUIDs the way your particular RDBMS vendor tells you they should be stored. If your application has particular problems with storing UUIDs that way I would look at alternatives, but generally the RDBMS vendors have thought about this a little bit at least.
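One way to sanity-check the rows-per-page arithmetic in MySQL, where InnoDB's default page size is actually 16 KB (the schema name is a placeholder):

```sql
-- Approximate rows per page from the optimizer's average row length estimate.
SELECT table_name,
       avg_row_length,
       FLOOR(16384 / avg_row_length) AS approx_rows_per_page
FROM information_schema.tables
WHERE table_schema = 'your_db'
  AND avg_row_length > 0;
```

If shaving 20 bytes off the row doesn't change approx_rows_per_page, the saving is unlikely to show up in table I/O.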

nvivo · 7 years ago
MySQL doesn't have a UUID data type; the UUID() function returns a string. The way you store it is mostly preference and driver defaults. The C# driver used to handle BINARY(16) as Guids, then deprecated that in favor of CHAR(36). But when dealing with a billion rows, each byte counts, so I favor BINARY(16): it's smaller, and that helps with index sizes and memory usage.
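A sketch of that layout on MySQL 8 (names and the UUID literal are hypothetical; no swap flag, since these are v4 UUIDs):

```sql
CREATE TABLE events (
    id      BINARY(16) PRIMARY KEY,
    payload JSON
);

INSERT INTO events (id, payload)
VALUES (UUID_TO_BIN('9f8b2c1a-4e5d-4f6a-8b7c-1d2e3f4a5b6c'), '{}');

-- BIN_TO_UUID keeps manual querying reasonably painless.
SELECT BIN_TO_UUID(id) AS id_text, payload FROM events;
```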
nvivo commented on Showdown: MySQL 8 vs. PostgreSQL 10   blog.dumper.io/showdown-m... · Posted by u/kenn
mbell · 7 years ago
The connection comments are a bit dubious. MySQL will use less memory for 1000 connections but performance will still drop due to contention and context switching. In both systems you want a small number of connections to the actual database, something on the order of 1-2x cpu cores usually, and something on top pooling client connections if you need a lot of them, pgbouncer or the equivalent for MySQL.
nvivo · 7 years ago
I had some real issues with MySQL handling more than 2000 connections. Past that limit, adding even a few more connections makes CPU usage climb steeply due to context switching. At 3000 connections, 80% of the CPU was spent on context switching and the server became unusable. That was a 64-core/256 GB RAM machine.
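A couple of stock MySQL checks that make this kind of problem visible:

```sql
-- Connections that are open vs. actually doing work; a big gap plus high CPU
-- points at contention and context switching rather than useful load.
SHOW GLOBAL STATUS LIKE 'Threads_connected';
SHOW GLOBAL STATUS LIKE 'Threads_running';

-- The server-side cap; lowering it and pooling in front of the database
-- (as mbell suggests) is the usual fix.
SHOW VARIABLES LIKE 'max_connections';
```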
nvivo commented on Showdown: MySQL 8 vs. PostgreSQL 10   blog.dumper.io/showdown-m... · Posted by u/kenn
patrickg_zill · 7 years ago
I am pretty sure that PG has had clustered indexes for a decade or more... ? e.g. https://www.postgresql.org/docs/9.1/static/sql-cluster.html

Or is this term referring to a different feature/method than this?

One thing not mentioned: PL/pgSQL vs. whatever the MySQL equivalent is.

nvivo · 7 years ago
I think it's different. From the docs, that's a one-time reorder of the table. The table itself is not kept clustered; it's just rewritten in the order of the chosen index, so selects should come back in that order by default. But once new inserts happen, it's out of order again.

Also, an ORDER BY on the clustered column most likely still invokes an index scan, whereas a truly clustered table doesn't need one.
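For reference, the PostgreSQL side of this (table and index names are hypothetical):

```sql
-- CLUSTER physically rewrites the table in index order, once.
CREATE INDEX orders_created_idx ON orders (created_at);
CLUSTER orders USING orders_created_idx;

-- Later inserts land wherever there is free space, so the ordering decays;
-- a bare CLUSTER re-sorts using the previously configured index.
CLUSTER orders;
```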
