E.g. there is **__contact__** in the page, bold and underlined, but you cannot click on it to do anything.
The results were kind of fascinating: the model appeared to conflate my system prompt (telling it to summarize the conversation) with the various questions asked in the post itself, which it then tried to answer.
I don't think it did a great job of the task, but it's still interesting to see its "thinking" process here: https://gist.github.com/simonw/313cec720dc4690b1520e5be3c944...
Generally unimpressed with Qwen3 based on my own personal set of problems.
Sounds like a polite way to say he was eaten alive
(1) > 6DT19 had been decapitated with a single cut between the second and third cervical vertebrae, delivered from behind.
(2) > Additional [to the decapitation] peri-mortem trauma was present in the form of a series of small depressions on both sides of the pelvis [...]
> Taphonomic damage alone is also unlikely due to the appearance and margins of the lesions, which are the same colour as the surrounding bone (this differs if the break is post-mortem; [56]), and the adherence of bony fragments at the injury site (which occurs when soft tissue is present).
[1]: https://journals.plos.org/plosone/article?id=10.1371/journal...
I remember meeting someone on Discord 1-2 years ago (?) who was working on a GoDaddy effort to offer customer-generated icons using bespoke foundation image-gen models. I suppose that kind of bespoke model at that scale is ripe for replacement by gpt-image-1, given the instruction-following ability / steerability?
> When we apply CFG to Parakeet sampling, quality is significantly improved. However, on inspecting generations, there tends to be a dramatic speed-up over the duration of the sample (i.e. the rate of speaking increases significantly over time). Our intuition for this problem is as follows: Say that our model is (at some level) predicting phonemes and the ground truth distribution for the next phoneme occurring is 25% at a given timestep. Our conditional model may predict 20%, but because our unconditional model cannot see the text transcription, its prediction for the correct next phoneme will be much lower, say 5%. With a reasonable level of CFG, because [the logit delta] will be large for the correct next phoneme, we’ll obtain a much higher final probability, say 50%, which biases our generation towards faster speech. [emphasis mine]
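To make the mechanics concrete, here's a toy sketch of CFG applied over logits. The guidance weight w = 2 and the two-token distribution are my own assumed numbers, chosen to roughly reproduce the 20% / 5% / ~50% figures in the quote, not values from the post:

```python
import numpy as np

# Toy version of the quote's scenario: the conditional model gives the
# "correct" next phoneme 20%, the unconditional model (no transcript) only 5%.
cond = np.log(np.array([0.20, 0.80]))    # conditional logits
uncond = np.log(np.array([0.05, 0.95]))  # unconditional logits

# Standard CFG over logits: guided = uncond + w * (cond - uncond).
# w = 2.0 is an assumed guidance weight, not a value from the post.
w = 2.0
guided = uncond + w * (cond - uncond)

probs = np.exp(guided - guided.max())
probs /= probs.sum()
print(probs[0])  # ~0.54: the 20% phoneme gets pushed past 50%,
                 # which is exactly the bias toward faster speech
```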
Parakeet details a solution to this, though it has not been adopted (yet?) by Dia:
> To address this, we introduce CFG-filter, a modification to CFG that mitigates the speed drift. The idea is to first apply the CFG calculation to obtain a new set of logits as before, but rather than use these logits to sample, we use these logits to obtain a top-k mask to apply to our original conditional logits. Intuitively, this serves to constrict the space of possible “phonemes” to text-aligned phonemes without heavily biasing the relative probabilities of these phonemes (or for example, start next word vs pause more). [emphasis mine]
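A rough sketch of what that might look like over raw logits. The function shape and the w/k defaults are my guesses from the description above, not Parakeet's or Dia's actual implementation:

```python
import numpy as np

def cfg_filter_sample(cond, uncond, w=2.0, k=10, rng=None):
    """Sample one token using the CFG-filter idea described above.

    Sketch only: the signature and the w/k values are assumptions.
    """
    rng = rng if rng is not None else np.random.default_rng()
    # 1) Standard CFG combination, as before.
    guided = uncond + w * (cond - uncond)
    # 2) Use the guided logits only to select a top-k mask...
    topk = np.argsort(guided)[-k:]
    masked = np.full_like(cond, -np.inf)
    # 3) ...then sample from the *original conditional* logits inside
    # that mask, so the relative probabilities within the allowed set
    # (e.g. start next word vs. pause) are not biased by the CFG delta.
    masked[topk] = cond[topk]
    probs = np.exp(masked - masked.max())
    probs /= probs.sum()
    return rng.choice(len(cond), p=probs)
```

The design point being that the guided logits only gate *which* tokens are allowed; the conditional distribution still supplies the actual sampling probabilities, which is what avoids the speed drift.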
The paper contains audio samples with ablations you can listen to.
[1]: https://jordandarefsky.com/blog/2024/parakeet/#classifier-fr...
lol