Without updates, many sites will likely stop working with it soon.
Kiwi had some great features, like disabling AMP mode, rearranging the Chrome Web Store for mobile, and customizable tab layouts. These might interest others as well.
I did recently see a warning that this browser is unsafe when trying to open Gmail in it, so any up-to-date Chromium-based alternative would be amazing!
The last time I used it, one of the common hacks was adding 50 makers to a single app launch. PH also openly condoned mass email blasts and tweets to drive votes, which just rewarded whoever could push the hardest on promotion.
In contrast, Hacker News discourages asking people for upvotes and even treats it as a negative if you do. That long-term focus on signal over hype is probably why HN still feels useful today, while PH lost its way.
My simple explanation of how batching works: since the bottleneck in LLM inference is loading the model's weights to where the GPU can compute on them, instead of computing each request separately you can compute multiple requests at the same time against the same loaded weights, ergo batching.
Let's make a visual example: say you have a model with 3 sets of weights, each of which fits inside the GPU's cache (A, B, C), and you need to serve 2 requests (1, 2). A naive approach would be to serve them one at a time.
(Legend: LA = Load weight set A, CA1 = Compute weight set A for request 1)
LA->CA1->LB->CB1->LC->CC1->LA->CA2->LB->CB2->LC->CC2
But you could instead batch the compute parts together.
LA->CA1->CA2->LB->CB1->CB2->LC->CC1->CC2
Now, if you consider that loading is hundreds if not thousands of times slower than computing on the same data, you'll see the big difference. Here's a "chart" visualizing the two approaches if loading were just 10 times slower. (Consider 1 letter a unit of time.)
Time spent using approach 1 (1 request at a time):
LLLLLLLLLLCLLLLLLLLLLCLLLLLLLLLLCLLLLLLLLLLCLLLLLLLLLLCLLLLLLLLLLC
Time spent using approach 2 (batching):
LLLLLLLLLLCCLLLLLLLLLLCCLLLLLLLLLLCC
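To make the arithmetic concrete, here's a tiny Python sketch that tallies the two schedules. All numbers are made up to match the chart above: loading is assumed to be 10x slower than computing.

    # Toy cost model of the two schedules above (illustrative numbers only).
    LOAD_TIME = 10    # time units to load one weight set into cache
    COMPUTE_TIME = 1  # time units to compute one request against it

    def sequential(weight_sets: int, requests: int) -> int:
        """Approach 1: reload every weight set for every request."""
        return requests * weight_sets * (LOAD_TIME + COMPUTE_TIME)

    def batched(weight_sets: int, requests: int) -> int:
        """Approach 2: load each weight set once, compute all requests."""
        return weight_sets * (LOAD_TIME + requests * COMPUTE_TIME)

    # 3 weight sets (A, B, C), 2 requests: 66 vs 36 units, matching the chart.
    print(sequential(3, 2), batched(3, 2))

With 100 requests the same formulas give 3,300 vs 330 units, which is the whole point.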
The difference is even more dramatic in the real world because, as I said, loading is many times slower than computing; you'd have to serve many users before you saw a serious drop in speed. I believe the real-world restriction is actually that serving more users requires more memory to store the activations, so you'll eventually run out of memory and have to balance how many people you want to serve at the same time per GPU cluster.
TL;DR: It's pretty expensive to get enough hardware to serve an LLM, but once you do, you can serve hundreds of users at the same time with minimal performance loss.
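In tensor terms, "batching the compute" just means stacking the requests' inputs as rows of one matrix, so each loaded weight matrix gets multiplied against all of them at once. A minimal NumPy sketch (the shapes are made up):

    import numpy as np

    hidden = 1024
    W = np.random.randn(hidden, hidden).astype(np.float32)  # one "weight set"

    # Approach 1: one request at a time (W is re-read from memory each time).
    x1 = np.random.randn(1, hidden).astype(np.float32)
    x2 = np.random.randn(1, hidden).astype(np.float32)
    y1, y2 = x1 @ W, x2 @ W

    # Approach 2: stack the requests into a batch and touch W only once.
    batch = np.vstack([x1, x2])  # shape (2, hidden)
    y = batch @ W                # one matmul serves both requests
    assert np.allclose(y, np.vstack([y1, y2]))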
- Big models like GPT-4 are split across many GPUs (sharding).
- Each GPU holds some layers in VRAM.
- To process a request, weights for a layer must be loaded from VRAM into the GPU's tiny on-chip cache before doing the math.
- Loading into cache is slow; the ops themselves are fast.
- Without batching: load layer > compute user1 > load again > compute user2.
- With batching: load layer once > compute for all users > send to GPU 2, etc.
- This makes cost per user drop massively if you have enough simultaneous users.
- But bigger batches need more GPU memory for activations, so there's a max batch size (rough numbers sketched below).
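On that last point, the dominant per-user activation cost when serving transformers is usually the KV cache. A back-of-envelope sketch in Python; every model parameter here is an assumption, chosen to be roughly 70B-model-shaped:

    # Rough KV-cache memory per user; all numbers are illustrative assumptions.
    layers = 80
    kv_heads = 8        # grouped-query attention
    head_dim = 128
    context_len = 4096
    bytes_per_elem = 2  # fp16

    # 2x for keys and values, per token, per layer.
    kv_bytes_per_token = 2 * layers * kv_heads * head_dim * bytes_per_elem
    per_user = kv_bytes_per_token * context_len
    print(f"{per_user / 2**30:.2f} GiB of KV cache per full-context user")

Every extra user in the batch needs that much extra GPU memory on top of the weights, which is what caps the batch size.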
This does make sense to me, but does this sound accurate to you?
Would love to know if I'm still missing something important.
Imagine just one link in a tweet, support ticket, or email: https://discord.com/_mintlify/static/evil/exploit.svg. If you click it, JavaScript runs on the discord.com origin.
Here's what could happen:
- An attacker could steal your Discord session cookies and token, leading to a complete account takeover.
- They could read/write your developer applications and webhooks, allowing them to add or modify bots, reset secrets, and push malicious updates to millions.
- They could access any Discord API endpoint as you, meaning they could join or delete servers, DM your friends, or even buy Nitro with your saved payment info.
- They could maybe even harvest OAuth tokens from sites that use "Login with Discord."
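To see why a script-bearing SVG on a first-party origin is so bad: an SVG is a full XML document, and when the browser navigates to it directly (rather than embedding it via an img tag), any script in it runs with the hosting origin's cookies and APIs in scope. A classic, purely illustrative proof-of-concept payload looks like this (hypothetical file contents, not the actual exploit):

    <!-- exploit.svg: when opened as a document, the handler runs
         with the hosting origin (here, discord.com) as its origin. -->
    <svg xmlns="http://www.w3.org/2000/svg"
         onload="alert(document.domain)"/>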
Given the potential damage, the $4,000 bounty feels like a slap in the face.
edit: just noticed that HN turned this into a clickable link, which makes it even scarier!