Main differences:
* *Cost-efficiency:* USEARCH / FAISS / HNSW keep most of the index in RAM; at the billion scale that often means hundreds of GB. In CHEESE, both build and search stream from disk. For the 5.5 B-compound Enamine set the footprint is ~1.7 TB of NVMe plus ~4 GB of RAM (only the centroids), so it can run on a laptop and still scale to tens of billions of vectors. This is also a huge difference versus commercial vector-DB providers (Pinecone, Milvus, ...), who would bill you many thousands of USD per month for such a workload because of the RAM-heavy instances.
* *Vector type:* The USEARCH demo uses binary fingerprints with Tanimoto distance. I use 256-D float embeddings trained to approximate 3-D shape and electrostatic overlap, searched with Euclidean distance.
* *Latency vs. accuracy:* BigANN-style work optimises for QPS and millisecond latency. Chemists usually submit queries one by one, so they don’t mind 1–6 s if the top hits are chemically meaningful. I pull entire clusters from disk and scan them exactly to keep recall high.
So the trade-off is a few seconds slower, but far cheaper hardware and results optimized for accuracy.
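To make the "centroids in RAM, clusters streamed from disk, exact scan" idea concrete, here is a toy sketch of that search pattern (IVF-style). This is not CHEESE's actual code: the k-means, file layout, cluster count, and `nprobe` parameter are all my own stand-ins for illustration.

```python
import os
import tempfile

import numpy as np

rng = np.random.default_rng(0)
dim, n_clusters, n_vecs = 8, 4, 1000
vectors = rng.normal(size=(n_vecs, dim)).astype(np.float32)

# Tiny k-means to partition the vectors (a real system trains this offline).
centroids = vectors[rng.choice(n_vecs, n_clusters, replace=False)]
for _ in range(5):
    assign = np.linalg.norm(vectors[:, None] - centroids[None], axis=2).argmin(1)
    for c in range(n_clusters):
        if np.any(assign == c):
            centroids[c] = vectors[assign == c].mean(0)
assign = np.linalg.norm(vectors[:, None] - centroids[None], axis=2).argmin(1)

# "Disk" layout: one file per cluster; only the centroids stay in RAM.
outdir = tempfile.mkdtemp()
ids_by_cluster = [np.flatnonzero(assign == c) for c in range(n_clusters)]
for c, ids in enumerate(ids_by_cluster):
    np.save(os.path.join(outdir, f"cluster{c}.npy"), vectors[ids])

def search(query, k=5, nprobe=2):
    # Step 1 (RAM only): rank centroids by Euclidean distance to the query.
    nearest = np.linalg.norm(centroids - query, axis=1).argsort()[:nprobe]
    all_ids, all_d = [], []
    for c in nearest:
        # Step 2: stream that cluster back from disk and scan it exactly.
        block = np.load(os.path.join(outdir, f"cluster{c}.npy"), mmap_mode="r")
        all_ids.append(ids_by_cluster[c])
        all_d.append(np.linalg.norm(block - query, axis=1))
    ids, d = np.concatenate(all_ids), np.concatenate(all_d)
    top = d.argsort()[:k]
    return ids[top], d[top]
```

The recall/latency knob is `nprobe`: scanning more clusters costs more disk reads but recovers neighbours that fell just across a cluster boundary.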
I wonder how OpenCitations populates their data? One example I tried showed 9 references where the paper had 30+.
Generally, an About page is always appreciated for such web tools with minimal UX, particularly when it's rather automagical.
APIs used:
- *OpenCitations API (v2)*
  - Endpoint: https://opencitations.net/index/api/v2/references/
  - Purpose: retrieves a list of all references from a paper by its DOI
  - Data format: JSON containing cited DOIs and metadata
- *DOI content negotiation*
  - Endpoint: https://doi.org/{DOI}
  - Purpose: fetches metadata and formatted citations for DOIs
  - Formats: BibTeX, RIS, CSL JSON, RDF XML, etc.; implements CSL (Citation Style Language) for text-based citations
- *Local citation style files*
  - Purpose: provides access to thousands of citation styles
  - Storage: pre-generated JSON files with style information
Where it helps
- Deep-dive reading – fetch the bulk RIS file, dump a seminal paper’s entire bibliography into Zotero/Mendeley, and follow the threads.
- Bulk citing – grab BibTeX entries for a cluster of related papers without hunting them down one by one.
- LLM grounding – feed language models a clean reference list so they stop hallucinating citations.