dragonsh commented on Qwen2.5-Coder-32B is an LLM that can code well that runs on my Mac   simonwillison.net/2024/No... · Posted by u/pabs3
Vetch · a year ago
The Claude Sonnet 3.5s are a bar too high to clear. No other model comes close, with the occasional exception of o1-preview. But o1-preview is always a gamble: your rolls are limited, and it will either give the best answer possible from an LLM or return after a wild goose chase, having talked itself into a tangled mess of confusion.

I'd personally rank the Qwen2.5 32B model only a little behind GPT-4o at worst, and preferable to Gemini 1.5 Pro 002 (at code only; Gemini is surprisingly bad at code considering its top-class STEM reasoning).

This makes Qwen2.5-Coder-32B astounding, all things considered. It's really quite capable and is finally an accessible model that's useful for real work. I tested it on some linear algebra, discussed the pros and cons of a belief-propagation-based approach to SAT solving, had it implement a fast, simple approximate nearest neighbor based on the near orthogonality of random vectors in high dimensions (in OCaml; not perfect, but close enough to be useful/easily correctable), had it simulate execution of a very simple recursive program (also OCaml), and had it write a basic post-processing shader for Unity. It did really well on each of those tasks.
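
For reference, one simple way to use that near-orthogonality fact is random-hyperplane hashing. This is a minimal Python/NumPy sketch of the idea, not the OCaml code the model produced; the dimensions, hyperplane count, and data here are illustrative assumptions:

    import numpy as np
    from collections import defaultdict

    rng = np.random.default_rng(0)

    def build_index(points, n_planes=16):
        """Bucket points by the sign pattern of their dot products with random vectors.

        In high dimensions random directions are nearly orthogonal, so the sign
        pattern acts as a cheap locality-sensitive hash."""
        planes = rng.normal(size=(n_planes, points.shape[1]))
        buckets = defaultdict(list)
        signs = points @ planes.T > 0
        for i, row in enumerate(signs):
            buckets[tuple(row.tolist())].append(i)
        return planes, buckets

    def query(q, points, planes, buckets):
        """Return the index of the approximate nearest neighbor of q."""
        key = tuple((q @ planes.T > 0).tolist())
        candidates = list(buckets.get(key) or range(len(points)))  # fall back to brute force
        dists = np.linalg.norm(points[candidates] - q, axis=1)
        return candidates[int(np.argmin(dists))]

    points = rng.normal(size=(10_000, 256))
    planes, buckets = build_index(points)
    q = points[42] + 0.01 * rng.normal(size=256)
    print(query(q, points, planes, buckets))  # almost certainly 42

Only points whose sign pattern matches the query's are compared exactly, which is where the speedup over brute force comes from.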

dragonsh · a year ago
Not really. I tried Claude 3.5, later o1-preview on GitHub Models, and recently Qwen2.5 32B with a prompt to generate a litestar [0] app to manage WYSIWYG content using GrapesJS [1] and use Pelican [2] to generate a static site. The generated code was very bad and invented many libraries in the imports that didn't exist; Claude was one of the worst code generators here.

Later I tried the sieve of Atkin to generate primes up to N and then the Miller-Rabin test to check each generated prime, both using all the CPU cores available. Claude completely failed and could never produce correct code without one error or another, especially when using multiprocessing; o1-preview got it right on the first attempt, and Qwen2.5 32B got it right after the third error fix. In general, for very simple code Claude is correct, but with something new it completely fails, and o1-preview performs much better. Try generating a Manim Community Edition visualization with Claude: it produces something that doesn't work correctly or has errors, while o1-preview does a much better job.
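
For context, the prime-number task amounts to roughly the following. This is a minimal Python sketch of what was asked for, not any model's actual output; N, the chunk size, and the fixed Miller-Rabin witness bases are illustrative choices:

    import math
    from multiprocessing import Pool

    def sieve_of_atkin(limit):
        """Return all primes <= limit using the sieve of Atkin."""
        is_candidate = bytearray(limit + 1)
        root = math.isqrt(limit)
        for x in range(1, root + 1):
            for y in range(1, root + 1):
                n = 4 * x * x + y * y
                if n <= limit and n % 12 in (1, 5):
                    is_candidate[n] ^= 1
                n = 3 * x * x + y * y
                if n <= limit and n % 12 == 7:
                    is_candidate[n] ^= 1
                n = 3 * x * x - y * y
                if x > y and n <= limit and n % 12 == 11:
                    is_candidate[n] ^= 1
        for n in range(5, root + 1):          # eliminate multiples of prime squares
            if is_candidate[n]:
                for k in range(n * n, limit + 1, n * n):
                    is_candidate[k] = 0
        return [p for p in (2, 3) if p <= limit] + \
               [n for n in range(5, limit + 1) if is_candidate[n]]

    def miller_rabin(n):
        """Miller-Rabin test; these fixed witness bases are deterministic for 64-bit n."""
        if n < 2:
            return False
        small = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)
        for p in small:
            if n % p == 0:
                return n == p
        d, s = n - 1, 0
        while d % 2 == 0:
            d //= 2
            s += 1
        for a in small:
            x = pow(a, d, n)
            if x in (1, n - 1):
                continue
            for _ in range(s - 1):
                x = x * x % n
                if x == n - 1:
                    break
            else:
                return False
        return True

    if __name__ == "__main__":
        N = 1_000_000
        primes = sieve_of_atkin(N)
        with Pool() as pool:                   # one worker per available CPU core
            results = pool.map(miller_rabin, primes, chunksize=1000)
        print(f"{len(primes)} primes up to {N}; all pass Miller-Rabin: {all(results)}")

The multiprocessing part is what the models reportedly stumbled on; here it is just a Pool.map over the sieve output.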

In most of my tests, o1-preview performed way better than Claude, and Qwen was not bad either.

[0] https://github.com/litestar-org/litestar

[1] https://grapesjs.com/

[2] https://getpelican.com/

dragonsh commented on Rust-Written Linux Scheduler Showing Promising Results for Gaming Performance   phoronix.com/news/Rust-Li... · Posted by u/electricant
antirez · 2 years ago
I believe that some hype / over-enthusiasm is acceptable, but I'm a bit more worried when a given community uses it as some kind of organized propaganda, arriving to excesses like saying that writing code in non-safe languages should be considered immoral, and also never admitting that Rust is a compromise in the design space like anything else. So ok for the hype, but sometimes I see some excesses. Anyway this is just my very limited opinion, I don't claim to be right, I just believe my feeling represents some part of people here.
dragonsh · 2 years ago
Like you said, it's fine to promote a language on its merits, but using false propaganda to promote it borders on fraud, and the Rust community needs to act together to stop such practices.

Overhype within the Rust community is a sign of fear that some simpler language (maybe Zig, Nim, or something new) might come along and pull the rug out from under their feet, given that Rust compromises on simplicity in favour of quashing memory-safety bugs.

One reason "rewrite it in Rust" is popular is that it's easier to copy something already done than to do something original (one recent example that comes to mind is rewriting GNU coreutils in Rust).


dragonsh commented on OpenAI's board has fired Sam Altman   openai.com/blog/openai-an... · Posted by u/davidbarker
dragonsh · 2 years ago
GPT-4 prompt: “Write an essay in 1000 words why Sam Altman should be fired by OpenAI board”

Response:

Sam Altman, the CEO of OpenAI, has been a controversial figure in the AI industry. His leadership style, lack of transparency, and decision-making processes have raised significant concerns among OpenAI's employees and the public. This essay will delve into these issues, arguing that Altman's actions warrant his removal from his position.

Firstly, Altman's lack of transparency is a major concern. He has been known to make decisions without adequately consulting with his team or the public. This has led to a lack of trust and dissatisfaction among OpenAI's employees. For instance, when Altman announced that OpenAI would be focusing on a single project, he did not provide sufficient reasoning or context. This lack of communication has left employees feeling disenfranchised and uninformed.

Secondly, Altman's decision-making processes are often questionable. His decisions have not always been in the best interest of OpenAI or its employees. For example, when OpenAI decided to pivot from developing AI systems to developing AI safety research, many employees felt that this was a strategic mistake. Altman's decision to focus on this area without considering the potential negative impacts on the company's reputation and financial stability was a clear example of poor decision-making.

Thirdly, Altman's leadership style has been described as autocratic. He has been known to make decisions without considering the input of his team. This has led to a lack of buy-in from employees and has negatively impacted morale. For instance, when Altman decided to shift OpenAI's focus to AI safety research, many employees felt that their ideas and contributions were being overlooked.

Finally, Altman's actions have also raised concerns about his commitment to AI safety. His decision to focus on AI safety research, rather than on developing AI systems, has raised questions about his commitment to the field. This decision has also raised concerns about the potential misuse of AI technology and has led to a loss of trust among the public.

In conclusion, Sam Altman's lack of transparency, questionable decision-making, autocratic leadership style, and concerns about his commitment to AI safety are all reasons why he should be removed from his position at OpenAI. It is clear that his actions have led to a lack of trust and dissatisfaction among OpenAI's employees and the public. It is crucial that OpenAI takes these concerns seriously and makes changes to ensure the success and safety of its AI technology.

dragonsh commented on OpenChat: Advancing open-source language models with imperfect data   github.com/imoneoi/opench... · Posted by u/BafS
hmottestad · 2 years ago
I would say that they are still a ways off.

Question: Susan has 7 brothers, each of which has one sister. How many sisters does Mary have?

Response: If Susan has 7 brothers, and each brother has one sister, then Susan has 7 sisters. Therefore, Mary, who is one of Susan's sisters, has 7 sisters. The answer is: 7.

I tried it in ChatGPT and the answer was perfect.

dragonsh · 2 years ago
Maybe give Anthropic a try; its answer is more correct than ChatGPT's. Indeed, in every subsequent attempt the quality of ChatGPT's answers, for both GPT-3.5 and GPT-4, deteriorated.

Here are the answers:

    ChatGPT:
      First response:
        Mary is the sister of Susan and Susan has 7 brothers. Each of these 7 brothers has one sister, and that sister is Mary. So, Mary has 7 brothers, and she is also the sister of Susan. Therefore, Mary has a total of 8 sisters, including Susan.

      Second response:
        Mary is the sister of Susan, so she has one sister. There is no need to consider the 7 brothers in this context as the question is asking specifically about how many sisters Mary has.

    GPT-4 response:
      From the information provided, we can infer that Mary has one sister, Susan. The question specifically asks about how many sisters Mary has, and it doesn't mention any other siblings. Therefore, based on the given information, Mary has only one sister, Susan.

    Anthropic:
      Okay, let's break this down step-by-step:

        Susan has 7 brothers
        Each of Susan's 7 brothers has 1 sister
        So there are 7 sisters total (1 sister for each of the 7 brothers)
        The question asks how many sisters Mary has
        Mary is not mentioned anywhere else in the problem
        So there is no information provided to determine how many sisters Mary has

      Therefore, the question does not provide enough information to determine how many sisters Mary has. The number of sisters Susan's brothers have does not tell us anything about how many sisters someone unrelated named Mary has.

u/dragonsh

Karma: 5607 · Cake day: February 17, 2015