rjakob commented on Show HN: AI Peer Reviewer – Multiagent system for scientific manuscript analysis   github.com/robertjakob/ri... · Posted by u/rjakob
anticensor · 3 months ago
That eight-month delay is caused by backlog; a human reviewing an individual paper takes about a day.
rjakob · 3 months ago
If you know the trick to getting reviewed in a day, do tell. Asking for an entire field.
extasia · 3 months ago
The professor-vs-PhD mode in your UI made me chuckle.

Looking forward to seeing your agent’s critique of my paper!

rjakob · 3 months ago
Inspired by friends at Browser-Use.
anticensor · 3 months ago
Is this aimed for pre-submission or post-submission peer reviews, or both?
rjakob · 3 months ago
Whenever you would like comprehensive feedback on your manuscript (most likely pre-submission, or after publishing a preprint).
NoahZuniga · 3 months ago
It makes sense to have the AI do the boring stuff, but don't frame it as a peer reviewer, because that's not what it is.
rjakob · 3 months ago
noted.
SubiculumCode · 3 months ago
My view, and how I conduct my peer reviews, is that I do not care to make decisions of whether a question is important/interesting or not. I feel like my job is to judge the paper on rigor, and whether it fails (purposely or from ignorance) to address/acknowledge the relevant literature.
rjakob · 3 months ago
We also provide feedback on rigor across 7 different categories: https://github.com/robertjakob/rigorous/tree/main/Agent1_Pee...
davidcbc · 3 months ago
This seems to just be a wrapper around a bunch of LLM prompts. What value is being added in the (eventual) pay version?

As a free GitHub project it seems... I don't know. It's not peer review and shouldn't be advertised as such, but as a basic review I guess it's fine. Why would someone pay you for a handful of LLM prompts?

If your business can be completely replicated by leaked system prompts, I think you're going to have issues.

rjakob · 3 months ago
System prompts / review criteria cannot be "leaked" because they are open-source (full transparency). Focusing heavily on monetization at this stage seems shortsighted... this tool is a small (but long-term important) step in a larger plan.
atrettel · 3 months ago
I agree. I've worked at a national lab before, and I immediately thought this service is a massive security risk. It will definitely be hard for some scientists to use these kinds of cloud services, especially if their research truly is cutting-edge and sensitive. I think many people will just ignore things like this because they want to keep their jobs, etc.
rjakob · 3 months ago
As mentioned above, there is an open-source version for those who want full control. The free cloud version is mainly for convenience and faster iteration. We don’t store manuscript files longer than necessary to generate feedback (https://www.rigorous.company/privacy), and we have no intention of using manuscripts for anything beyond testing the AI reviewer.
yusina · 3 months ago
The rule set is simple: "Don't be biased." What does that mean? And that is the problem. It's hard (read: impossible) to define in technical, formal terms. That's because bias is at the root a social problem, not a technical one. Therefore you won't be able to solve it with technology. Just like poverty, world peace, racism.

The best you can hope for is to provide technical means to point out indicators of bias. But anything beyond that could, at worst, do more harm than good. ("The tool said this result is unbiased now! Keep your skepticism to yourself and let me publish!")

rjakob · 3 months ago
Then let's try to be the least biased and fully transparent (which should also help with bias).
howon92 · 3 months ago
This is a great idea. Can you share more about what "24 specialized agents" means in this context? I assume each agent is not simply an LLM with a specific prompt (e.g. "You're the world's best biologist. Review this biology research paper.") but is a lot more sophisticated. I'm trying to learn how sophisticated it is.
rjakob · 3 months ago
mattmanser · 3 months ago
I had a quick look at the repo as I wondered what you meant by multiple specialized agents.

Fundamentally, each of those 24 agents seems to be just:

"load from pdf > put text into this prompt > Call OpenAI API"

So is it actually just posting 24 different prompts to a generalist AI?
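For illustration, the pattern described above might look like this (a hypothetical sketch; `build_messages` and `METHODS_TEMPLATE` are illustrative names, not from the repo):

```python
# Hypothetical sketch of the per-agent pattern: one generalist model,
# 24 prompts. Each "agent" differs only in its template.

def build_messages(prompt_template: str, paper_text: str) -> list[dict]:
    """Fill the extracted paper text into one specialist's prompt."""
    return [{"role": "user",
             "content": prompt_template.format(paper=paper_text)}]

# Example template (illustrative, not from the repo):
METHODS_TEMPLATE = (
    "You are reviewing the methodological soundness of a paper.\n"
    "Identify specific weaknesses.\n\nPAPER:\n{paper}"
)

# The actual call is then a single API request per agent, e.g.
# client.chat.completions.create(model=..., messages=build_messages(...))
```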

I'm also wondering about the prompts. One I read said "find 3-4 problems per section... find 10-15 problems per paper". What happens when you put up a good paper? Does this force it to find meaningless, nit-picky problems? Have you tried papers which are acknowledged to be well written?

From a programming perspective, the code has a lot of room for improvement.

The big one: if you'd used the same interface for each "agent", you could have had them all self-register and be called in a loop, rather than having to do what you've done in this file:

https://github.com/robertjakob/rigorous/blob/main/Agent1_Pee...

TBH, that's a bit of a WTF file. The `_determine_research_type` method looks like a placeholder you've forgotten about too, as it uses a pretty wonky way to determine the paper type.
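The self-registration idea could look something like this (all names hypothetical; the LLM call is stubbed out to keep the sketch self-contained):

```python
# Sketch of self-registering agents behind one interface, so the
# orchestrator is a loop instead of a hand-written dispatch file.
AGENT_REGISTRY: dict[str, type] = {}

def register(name: str):
    """Class decorator that adds an agent class to the registry."""
    def wrap(cls):
        AGENT_REGISTRY[name] = cls
        return cls
    return wrap

class ReviewAgent:
    prompt_template = ""

    def review(self, paper_text: str) -> str:
        # The real system would send the filled prompt to an LLM;
        # the sketch just returns it.
        return self.prompt_template.format(paper=paper_text)

@register("novelty")
class NoveltyAgent(ReviewAgent):
    prompt_template = "Assess the novelty of this paper:\n{paper}"

@register("rigor")
class RigorAgent(ReviewAgent):
    prompt_template = "Assess the methodological rigor of this paper:\n{paper}"

def run_all(paper_text: str) -> dict[str, str]:
    """Call every registered agent in a loop."""
    return {name: cls().review(paper_text)
            for name, cls in AGENT_REGISTRY.items()}
```

Adding a 25th agent is then one decorated class, with no orchestrator changes.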

Also, you really didn't need specialized classes for each prompt; you could have just had the prompts as text files that a single class loads as templates and substitutes text into. As it stands, you're going to have a lot of work whenever you need to update the way your prompting works: changing 24 files each time, probably by cut-and-paste, which is error-prone.

I've done it before where you have the templates in a folder and the program dynamically loads them, so you can add more really easily. The next stage is to add pre-processor directives to your loader that let you put some config at the top of each text file.
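A minimal version of that loader might look like this (hypothetical sketch; the `#:` header syntax and file layout are assumptions, not from the repo):

```python
# Sketch: prompts as text files in a folder, each optionally starting
# with "#:" config lines, e.g.  #: model = gpt-4.1-nano
import pathlib

def parse_prompt(text: str) -> tuple[dict, str]:
    """Split a '#:' config header from the prompt body."""
    config, body = {}, []
    for line in text.splitlines():
        if line.startswith("#:"):
            key, _, value = line[2:].partition("=")
            config[key.strip()] = value.strip()
        else:
            body.append(line)
    return config, "\n".join(body).strip()

def load_all(folder: str) -> dict[str, tuple[dict, str]]:
    # Adding an agent is then just dropping a new .txt file in the folder.
    return {p.stem: parse_prompt(p.read_text())
            for p in pathlib.Path(folder).glob("*.txt")}
```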

I'm also not looking that hard at the code, but it seems you dump the entire paper into each prompt rather than just the section it needs to review. That seems like an easy money saver: ask an AI to chop up the paper, then inject only the section needed, reducing your token costs. Although you then run the risk of it chopping it up badly.
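Even a cheap heuristic split could cut tokens before reaching for an AI chopper. A deliberately naive sketch (heading detection is the weak point, as noted):

```python
# Sketch: split a paper once on common section headings, then feed each
# agent only the section it reviews. Naive on purpose; bad splits are
# exactly the risk mentioned above.
import re

SECTION_RE = re.compile(
    r"^(abstract|introduction|methods|results|discussion|references)\s*$",
    re.IGNORECASE | re.MULTILINE,
)

def split_sections(paper_text: str) -> dict[str, str]:
    matches = list(SECTION_RE.finditer(paper_text))
    sections = {}
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(paper_text)
        sections[m.group(1).lower()] = paper_text[m.end():end].strip()
    return sections
```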

Finally, and this is a real nitpick, but it's twitch-inducing when reading the prompts: comments in JavaScript are two forward slashes, not a hash.

rjakob · 3 months ago
Best feedback so far!

You're right: in the current version each "agent" essentially loads the whole paper, applies a specialized prompt, and calls the OpenAI API. The specialization lies in how each prompt targets a specific dimension of peer review (e.g., methodological soundness, novelty, citation quality). While it's not specialization via architecture yet (i.e., different models), it's prompt-driven specialization, essentially simulating a review committee where each member focuses on a distinct concern. We're currently using a long-context, cost-efficient model (GPT-4.1-nano style) for these specialized agents to keep it viable for now. Think of it as an army of reviewers flagging areas for potential improvement.

To synthesize and refine feedback, we also run Quality Control agents (acting like an associate editor), which review all prior outputs from the individual agents to reduce redundancy, surface the most constructive insights, and filter out less relevant feedback.
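One piece of that quality-control pass, the redundancy reduction, can be sketched without an LLM at all (hypothetical illustration; the repo may do this differently, e.g. via another prompt):

```python
# Sketch: merge all agents' comments and drop near-duplicates before
# the final report, using simple string similarity.
import difflib

def dedupe_comments(comments: list[str], threshold: float = 0.8) -> list[str]:
    kept: list[str] = []
    for c in comments:
        if not any(
            difflib.SequenceMatcher(None, c.lower(), k.lower()).ratio() >= threshold
            for k in kept
        ):
            kept.append(c)
    return kept
```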

On your point about nitpicking: we've tested the system on several well-regarded, peer-reviewed papers. While the output is generally reasonable and we haven't discovered "made up" issues yet, there are occasional instances where feedback is misaligned. We're convinced, however, that we can almost fully eliminate such noise in future iterations (community feedback is super important for achieving this).

On the code side: 100% agree. This is very much an MVP focused on testing potential value to researchers, and the repeated agent classes were helpful for fast iteration. However, your suggestion of switching to template-based prompt loading and dynamic agent registration is great and would improve maintainability and scalability. We'll 100% consider it in the next version.

The _determine_research_type method is indeed a stub. Good catch. Also, lol @ the JS comment hashes, touché.

If you're open to contributing or reviewing, we’d love to collaborate!
