Cool, looks like text highlighting is a new addition in 2.10. There aren't any examples in the demo site of this, but can it capture the highlighted text snippets and show them in the link details page? That would help me recall quickly why I saved the link, without opening the original link and re-reading the page. I haven't really seen this in other tools (or maybe I just haven't looked hard enough), except Memex.
Great product! Does it handle special metadata like https://mymind.com/ does, eg. showing prices directly in the UI if the saved link is a product in a shop? If not, things like that would be a great addition!
Site note: When a website advertising a product does a bad job at optimising the loading of the page, that's usually a red flag for me; yes that website has noticeable jitter when scrolling up and down even though it _only_ load around ~70Mb worth of assets initially.
I'd be interested to hear your thoughts on having a PWA vs regular mobile apps since it looks like you started with a PWA, but are moving to regular apps. Is that just a demand / eyeballs thing or were there technical reasons?
So there are different ways it archives a webpage.
It currently stores the full webpages as a single html file, a screenshot, a pdf, a read-it-later view.
Aside from that, you can also send the webpages to the Wayback Machine to take a snapshot.
To archive pages behind a login or paywall, you can use the browser extension, which captures an image of the webpage in the browser and sends it to the server.
How difficult would it be to import an existing list of links/tags? Also, if I were using a hosted version, would I be able to eg insert/retrieve files via an API call?
I ask because currently I use Readwise but have a local script that syncs the reader files to a local DB, which then feeds into some custom agent flows I have going on on the side.
- Does the web front end support themes? It’s a trivial thing but based on the screenshots, various things about the default theme bug me and it would be nice to be able to change those without a user style extension.
- Does it have an API that would allow development of a native desktop front end?
a question arose for me though: if the AI tagging is self hostable as well, how taxing is it for the hardware, what would the minimum viable hardware be?
I took a look at this... and you use the Ollama API behind the scenes?? Why not use an OpenAI compatible endpoint like the rest of the industry?
Locking it to Ollama is stupid. Ollama is just a wrapper for llama.cpp anyways. Literally everyone else running LLMs locally- llama.cpp, vllm (which is what the inference providers use, also I know Deepseek API servers use this behind the scenes), LM Studio (for the causal people), etc all use an OpenAI compatible api endpoint. Not to mention OpenAI, Google, Anthropic, Deepseek, Openrouter, etc all mainly use (or at least fully supports, in the case of Google) an OpenAI compatible endpoint.
If you don’t like this free and open source software that was shared it’s luckily possible to change it yourself…or if it’s not supporting your favorite option you can also just ignore it. No need to call someone’s work or choices stupid.
I've been using Karakeep (formerly known as Hoarder) and it's been a great experience so far. One thing they're working on now is a Safari browser extension. I noticed Linkwarden lacks a Safari browser extension - is one on the roadmap?
Lately I've been using MacOS and I've noticed Chromium-based browsers use more resources than the native Safari. This is especially true with Microsoft Edge, which sometimes consumes tens of gigabytes of RAM (possibly a memory leak?). In an attempt to preserve battery life and SSD longevity, Safari is now my go-to browser on MacOS.
I'm also using Karakeep. It also has LLM-powered tagging, which, in my experience, works excellently. It's easy to self-host, fast on a relatively underpowered NAS, and I love the UX. Highly recommended.
Linkwarden looks nice, too, but when picking an option, I wanted one with a native Android app.
Is there any software that can provide verified, trusted archives of websites?
For example, we can go to the Wayback Machine at archive.org to not only see what a website looked like in the past, but prove it to someone (because we implicitly trust The Internet Archive). But the Wayback Machine has deleted sites when a site later changes its robots.txt to exclude it, meaning that old site REALLY disappears from the web forever.
The difficulty for a trusted archive solution is in proving that the archived pages weren't altered, and that the timestamp of the capture was not altered.
It seems like blockchain would be a big help, and would prevent back-dating future snapshots, but there seem to be a lot of missing pieces still.
Webrecorder's WACZ signing spec (https://specs.webrecorder.net/wacz-auth/latest) does some of this — authenticating the identity of who archived it and at what time — but the rest of what you're asking for (legitimacy of the content itself) is an unsolved problem as web content isn't all signed by its issuing server.
In some of the case studies Starling (https://www.starlinglab.org/) has published, they've published timestamps of authenticated WACZs to blockchains to prove that they were around at a specific time... More _layers_ of data integrity but not 100% trustless.
There's been attempts to standardize a way for a HTTPS server to say "Yes, this response really did come from me", but nothing has been really adopted.
Without the server participating, best you can do is a LetsEncrypt-style "we made this request from many places and got the same response" statement by a trusted party.
Inspiration: roughtime can be used to piggyback a "proof of known hash at time" mechanism, without blockchain waste. That lets you say "I've had this file since this time".
Take a look at singleFile - a project that lets you save the entire webpage. It has an integration for saving the hash if the page on a Blockchain. You can choose to set it up between parties who're interested in the provenance of the authenticity.
As a paid product, has anyone used Raindrop as well and have opinions/comparisons? And on the self hosted side, vs Hoarder?
I’ve been considering switching from Raindrop to a self hosted option, but while I like self hosting I’m also leaning towards just paying someone to handle this particular service for me.
I used to use raindrop however found it a bit bloated with features I never use, I've switched to selfhosting linkding: https://linkding.link and enjoy the much more minimal experience
I've never heard of raindrop and it looks cool but I see the .ru in one of their screenshots -- are they based in Russia? Any concerns with doing business with a Russian company, in the context of sanctions etc.?
Rustem Mussabekov on 24 Oct 2023 wrote:
"I'm founder of Raindrop.io. I'd like to clarify information about the origin of myself and the project. While I did live in Russia for a long time and initially started Raindrop there, I relocated to my motherland, Kazakhstan, shortly after the war began. I also moved all financial and business matters there.
I am no longer associated with Russia in any way. It would be great if this information could be added to the article."
I love these sorts of apps, but I still am not really sure why I need the webpages. At any time I do research for a topic I find more things than I can read in that session, so what are the old links for?
I would love to hear how people use this product once they have stored the links!
I've used https://historio.us since 2011 and still pay for it to keep access to all the pages I've archived over the years. The price has been kept low enough that I can't bring myself to cancel it even though I've been using self-hosted https://archivebox.io/ for the last few years.
I always include an archived link whenever I reference something in documentation. That's my main use at the moment.
However, I also feel like I've gotten a lot of really good value when trying to learn a new development topic. Whenever I find something that looks like it might be useful, I archive it and, because everything is searchable, I end up with a searchable index of really high quality content once I actually know what I'm doing.
I find it hard to rediscover content via web search these days and there's so much churn that having a personal archive of useful content is going to increase in value, at least in my opinion.
How much space is the self-hosted solution taking? I've been meaning to try and find a better way to look through my bookmarks since no browser is capable of doing that properly it seems.
I haven't tried Linkwarden (still doing the `wget --mirror` thing myself), but one of the reasons I like archiving pages is so I can have a collection of pages that work in older browsers on vintage computers. I pop open View Source on any site I find that looks even vaguely old, and if I see a DOCTYPE up to and including XHTML 1.1 I archive that shit immediately even if it's not a site about any of my biggest interests lol
Some key features of the app (at the moment):
- Text highlighting
- Full page archival
- Full content search
- Optional local AI tagging
- Sync with browser (using Floccus)
- Collaborative
Also, for anyone wondering, all features from the cloud plan are available to self-hosted users :)
This is because we haven't updated the demo to the latest version.
> but can it capture the highlighted text snippets and show them in the link details page?
That's a good idea that we might implement later, but at the moment you can only highlight the links[1].
[1]: https://blog.linkwarden.app/releases/2.10#%EF%B8%8F-text-hig...
Essentially a quote with attribution.
What I'd really love is a super compact "short-name only" view of links. Just words, not lines or galleries. For super-high content views.
https://blog.linkwarden.app/releases/2.8#%EF%B8%8F-customiza...
Does it grab the DOM from my browser as it sees it? Or is it a separate request? If so, how does it deal with authentication?
It currently stores the full webpages as a single html file, a screenshot, a pdf, a read-it-later view.
Aside from that, you can also send the webpages to the Wayback Machine to take a snapshot.
To archive pages behind a login or paywall, you can use the browser extension, which captures an image of the webpage in the browser and sends it to the server.
I ask because currently I use Readwise but have a local script that syncs the reader files to a local DB, which then feeds into some custom agent flows I have going on on the side.
Pretty easy if you have it in a bookmark html file format.
> Also, if I were using a hosted version, would I be able to eg insert/retrieve files via an API call?
Yup, check out the api documentation:
https://docs.linkwarden.app/api/api-introduction
- Does the web front end support themes? It’s a trivial thing but based on the screenshots, various things about the default theme bug me and it would be nice to be able to change those without a user style extension.
- Does it have an API that would allow development of a native desktop front end?
Yes[1].
> Does it have an API that would allow development of a native desktop front end?
Also yes[2].
[1]: https://blog.linkwarden.app/releases/2.9#-customizable-theme
[2]: https://docs.linkwarden.app/api/api-introduction
a question arose for me though: if the AI tagging is self hostable as well, how taxing is it for the hardware, what would the minimum viable hardware be?
It’s worth mentioning that you can also use external providers like OpenAI and Anthropic to tag the links for you.
[1]: https://docs.linkwarden.app/self-hosting/ai-worker
https://docs.linkwarden.app/self-hosting/ai-worker
I took a look at this... and you use the Ollama API behind the scenes?? Why not use an OpenAI compatible endpoint like the rest of the industry?
Locking it to Ollama is stupid. Ollama is just a wrapper for llama.cpp anyways. Literally everyone else running LLMs locally- llama.cpp, vllm (which is what the inference providers use, also I know Deepseek API servers use this behind the scenes), LM Studio (for the causal people), etc all use an OpenAI compatible api endpoint. Not to mention OpenAI, Google, Anthropic, Deepseek, Openrouter, etc all mainly use (or at least fully supports, in the case of Google) an OpenAI compatible endpoint.
If you don’t like this free and open source software that was shared it’s luckily possible to change it yourself…or if it’s not supporting your favorite option you can also just ignore it. No need to call someone’s work or choices stupid.
Lately I've been using MacOS and I've noticed Chromium-based browsers use more resources than the native Safari. This is especially true with Microsoft Edge, which sometimes consumes tens of gigabytes of RAM (possibly a memory leak?). In an attempt to preserve battery life and SSD longevity, Safari is now my go-to browser on MacOS.
Linkwarden looks nice, too, but when picking an option, I wanted one with a native Android app.
For example, we can go to the Wayback Machine at archive.org to not only see what a website looked like in the past, but prove it to someone (because we implicitly trust The Internet Archive). But the Wayback Machine has deleted sites when a site later changes its robots.txt to exclude it, meaning that old site REALLY disappears from the web forever.
The difficulty for a trusted archive solution is in proving that the archived pages weren't altered, and that the timestamp of the capture was not altered.
It seems like blockchain would be a big help, and would prevent back-dating future snapshots, but there seem to be a lot of missing pieces still.
Thoughts?
In some of the case studies Starling (https://www.starlinglab.org/) has published, they've published timestamps of authenticated WACZs to blockchains to prove that they were around at a specific time... More _layers_ of data integrity but not 100% trustless.
https://www.rfc-editor.org/rfc/rfc9421.html
https://httpsig.org/
Without the server participating, best you can do is a LetsEncrypt-style "we made this request from many places and got the same response" statement by a trusted party.
Inspiration: roughtime can be used to piggyback a "proof of known hash at time" mechanism, without blockchain waste. That lets you say "I've had this file since this time".
https://www.imperialviolet.org/2016/09/19/roughtime.html
https://int08h.com/post/to-catch-a-lying-timeserver/
https://blog.cloudflare.com/roughtime/
https://news.ycombinator.com/item?id=12599705
I’ve been considering switching from Raindrop to a self hosted option, but while I like self hosting I’m also leaning towards just paying someone to handle this particular service for me.
I am no longer associated with Russia in any way. It would be great if this information could be added to the article."
Source: https://numericcitizen.me/when-war-in-ukraine-influences-my-...
It doesn't work yet.
I use singlefile to archive pages I'm viewing Linkding.
Then I have a BeautifulScript4 script to strip the assets.
Then I use Jina's ReaderLM v2 to render the HTML to proper Markdown: https://huggingface.co/jinaai/ReaderLM-v2
Except, of course, for longer table oriented text documents like HN that doesn't work.
I want a plaintext archive of web pages in a github repo or similar. Not a fancy UI/UX
- SingleFile: https://github.com/gildas-lormeau/SingleFile
- Linkding: https://github.com/sissbruecker/linkding
- BeautifulScript4: https://beautiful-soup-4.readthedocs.io/en/latest/ (assumed that was the python library Beautiful Soup 4 and not "Script")
https://www.linkace.org/ (my fave)
https://github.com/sissbruecker/linkding
https://github.com/jonschoning/espial
https://motd.co/2023/09/postmarks-launch/
https://betula.mycorrhiza.wiki/
https://linkhut.org/
https://grimoire.pro/
https://hoarder.app/
https://mymind.com/
https://github.com/omnivore-app/omnivore
https://github.com/wallabag/wallabag
https://betula.mycorrhiza.wiki/
https://zotero.org/
This one seems to be directly related to the webrecorder project which seems like a pretty full featured warc recorder.
https://readeck.org/en/
I would love to hear how people use this product once they have stored the links!
I always include an archived link whenever I reference something in documentation. That's my main use at the moment.
However, I also feel like I've gotten a lot of really good value when trying to learn a new development topic. Whenever I find something that looks like it might be useful, I archive it and, because everything is searchable, I end up with a searchable index of really high quality content once I actually know what I'm doing.
I find it hard to rediscover content via web search these days and there's so much churn that having a personal archive of useful content is going to increase in value, at least in my opinion.