Readit News
masswerk · a year ago
Nowadays, on macOS, "dict://Internet" will open the Dictionary app with the query "Internet". (Probably behind a security prompt.) Not sure if there's similar functionality on other operating systems.
dmd · a year ago
What do you mean by "behind a security prompt"?
bobbylarrybobby · a year ago
Browsers generally throw up a prompt before opening an external app
promiseofbeans · a year ago
I tried it and Firefox came up with a prompt confirming I wanted to open 'internet' in 'Dictionary'
im3w1l · a year ago
I admire these old protocols that are intentionally built to be usable both by machines and humans. Like the combination of a response code and a human-readable explanation. A help command right in the protocol.

Makes me think it's a shame that making a json based protocol is so much easier to whip up than a textual ebnf-specified protocol.

Like imagine if in python there was some library that you gave an ebnf spec maybe with some extra features similar to regex (named groups?) and you could compile it to a state machine and use it to parse documents, getting a Dict out.
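
Nothing in Python's stdlib compiles an EBNF spec into a parser today, but the named-group half of the idea already exists in `re`: a pattern with named groups hands you a dict directly. A toy sketch against a DICT-style status line (the format is modelled on the protocol's "150 n definitions retrieved" responses, not on any real library):

```python
import re

# A DICT-style status line: three-digit code, then human-readable text.
# Named groups give the "parse it, get a dict out" behaviour for free.
STATUS = re.compile(r"(?P<code>\d{3}) (?P<text>.*)")

m = STATUS.fullmatch("150 1 definitions retrieved")
fields = m.groupdict()
# fields == {'code': '150', 'text': '1 definitions retrieved'}
```

An EBNF front-end would essentially be this, scaled up to recursive rules.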

dspillett · a year ago
> Makes me think it's a shame that making a json based protocol is so much…

Maybe I'm not the human you are thinking of, being a techie, but I find a well structured JSON response, as long as it isn't overly verbose and is presented in open form rather than minified, to be a good compromise of human readable and easy to digest programmatically.

marcosdumay · a year ago
The legibility is probably one of the main reasons JSON got adopted. XML can be made to not look too bad, but SOAP made it unreadable, so everybody was looking for a way to fix that.
keepamovin · a year ago
Yeah, but JSON has the vulnerability that it will tend to become like an overgrown garden over time, because it can. The type of TCP <code> <response> <text> protocols the GP talks about have the benefit of their restrictions and are definitely a better balance (human / machine), and more stable.
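
Those restricted protocols (FTP, SMTP, NNTP, and DICT all share the shape) let a client act on the first digit of the code alone, while the rest of the line stays free text for humans. A tiny illustrative sketch (the response lines are invented examples in that style, not captured traffic):

```python
def classify(line: str) -> str:
    """Map a status line's leading digit to the action a client should take.
    This first-digit convention is shared by FTP/SMTP/NNTP/DICT-style protocols."""
    kinds = {
        "1": "informational",    # more data follows
        "2": "success",          # command completed
        "3": "more input",       # server expects a continuation
        "4": "transient error",  # worth retrying later
        "5": "permanent error",  # don't retry
    }
    return kinds.get(line[:1], "unknown")

print(classify("220 dict.org DICT server ready"))  # success
print(classify("552 no match"))                    # permanent error
```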
orf · a year ago
Unfortunately, in practice they are a nightmare. Look at the WHOIS protocol for an example.

Humans don’t look at responses very much, so you should optimise for machines. If you want a human-readable view, then turn the JSON response into something readable.

poincaredisk · a year ago
The WHOIS protocol has no grammar to speak of (the response body is not defined at all, it's just a text blob), which makes it a nightmare to parse. Having a proper response format would solve this.

Though I agree, I prefer my responses in JSON.

astrobe_ · a year ago
The next logical step is to use a machine-friendly format instead; that is a binary protocol.

Even HTML and XML, which were designed for readability and manual writing, eventually became "not usable enough" ("became" because I think part of it is that their success exposed them to less technical populations), and now we have markdown everywhere, which most of the time is converted to HTML.

So if you are going to use a tool more sophisticated than Ed/Edlin to read and write (rich) text in a certain format, it could be more efficient to focus on making the job of the machine, and of the programmer, easier.

If you look at a binary protocol such as NTP, the binary format leaves very little room for Postel's principle [1], so it is straightforward to make a program that queries a server and displays the result.

[1] https://en.wikipedia.org/wiki/Robustness_principle
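
As an illustration of how little wiggle room such a format leaves: the NTP header is a fixed 48-byte layout that Python's `struct` module can describe in one format string. A sketch of the decoding only, not a full client:

```python
import struct

# NTPv3/v4 header, big-endian: LI/VN/Mode packed into the first byte,
# then stratum, poll, precision, three 32-bit words (root delay, root
# dispersion, reference id), and four 64-bit timestamps.
NTP_FORMAT = "!BBBb3I4Q"
assert struct.calcsize(NTP_FORMAT) == 48

def decode(packet: bytes) -> dict:
    li_vn_mode, stratum, poll, precision, *rest = struct.unpack(NTP_FORMAT, packet)
    return {
        "leap":    li_vn_mode >> 6,
        "version": (li_vn_mode >> 3) & 0b111,
        "mode":    li_vn_mode & 0b111,
        "stratum": stratum,
        "transmit_timestamp": rest[6],  # last of the four 64-bit timestamps
    }

# A client request is just the first byte set: LI=0, VN=3, mode=3 (client).
request = bytes([0b00011011]) + bytes(47)
print(decode(request)["mode"])  # 3
```

There is exactly one way to emit and one way to parse it; no liberal-acceptance needed.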

kragen · a year ago
maybe we could have a format that was more human-readable than json (or especially xml) but still reliably emittable and parseable? yaml, maybe, or toml, although i'm not that enthusiastic about them. another proposal for such a thing was ogdl (https://ogdl.org/), a notation for what i think are called rose trees

> OGDL: Ordered Graph Data Language

> A simple and readable data format, for humans and machines alike.

> OGDL is a structured textual format that represents information in the form of graphs, where the nodes are strings and the arcs or edges are spaces or indentation.

their example:

    network
      eth0
        ip   192.168.0.10
        mask 255.255.255.0
        gw   192.168.0.1

    hostname crispin
another possibility is jevko; https://jevko.org/ describes it and http://canonical.org/~kragen/rose/ are some of my notes about the possibilities of similar rose-tree data formats
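
a quick sketch of how little code such an indentation-based rose-tree format needs (a toy parser i'm improvising here, not a conformant ogdl implementation; leaf lines with two or more tokens become key/value pairs):

```python
def parse_indented(text):
    """toy parser: indentation = nesting, first token = key,
    remaining tokens (if any) = a string value."""
    root = {}
    stack = [(-1, root)]  # (indent level, container dict)
    for raw in text.splitlines():
        if not raw.strip():
            continue
        indent = len(raw) - len(raw.lstrip(" "))
        tokens = raw.split()
        while indent <= stack[-1][0]:   # close deeper/equal levels
            stack.pop()
        parent = stack[-1][1]
        if len(tokens) == 1:            # bare key opens a new subtree
            parent[tokens[0]] = {}
            stack.append((indent, parent[tokens[0]]))
        else:                           # key + value leaf
            parent[tokens[0]] = " ".join(tokens[1:])
    return root

example = """\
network
  eth0
    ip   192.168.0.10
    mask 255.255.255.0
    gw   192.168.0.1

hostname crispin
"""
config = parse_indented(example)
```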

hnlmorg · a year ago
Formats like TOML are horrible for heavily nested data (even XML does a better job here) and the last time I checked, TOML didn’t support arrays at the top level.

YAML is nicer than JSON to write, but I wouldn’t say it’s any nicer to read.

If you want something that’s less punctuation heavy, then I’d prefer we go full Wirth and have something more akin to Pascal.

donatj · a year ago
In my department's internal framework (we were formerly our own company), throwing .html on the end of any JSON response outputs it as nested HTML tables. I personally find it very helpful.
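
The trick is a small recursive walk. A guessed-at sketch (our framework's actual implementation isn't shown here, so this is just the general shape):

```python
import html
import json

def json_to_html(value):
    """Render a decoded JSON value as nested HTML tables (sketch)."""
    if isinstance(value, dict):
        rows = "".join(
            f"<tr><th>{html.escape(str(k))}</th><td>{json_to_html(v)}</td></tr>"
            for k, v in value.items())
        return f"<table>{rows}</table>"
    if isinstance(value, list):
        rows = "".join(f"<tr><td>{json_to_html(v)}</td></tr>" for v in value)
        return f"<table>{rows}</table>"
    return html.escape(json.dumps(value))  # scalar: render as escaped JSON

rendered = json_to_html({"user": {"name": "ada", "tags": ["a", "b"]}})
```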
hnlmorg · a year ago
At that point you might as well drop JSON altogether and use an XHTML subset so your rendered output is also valid XML (instead of having two different and incompatible markups merged together)
zzo38computer · a year ago
> I admire these old protocols ...

Protocols that have a response code with an explanation are helpful. A help command is also helpful. So I wrote an NNTP server that does that, and the IRC and NNTP client software I use can display them.

> Makes me think it's a shame that making a json based protocol is so much easier to whip up ...

I personally don't; I find I can easily work with text-based protocols if the format is easy enough.

I think there are problems with JSON. Some of the problems are: it requires parsing escapes and keys/values, does not properly support character sets other than Unicode, cannot work with binary data unless it is encoded using base64 or hex or something else (which makes it inefficient), etc. There are other problems too.
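
The base64 overhead is easy to quantify: every 3 bytes of binary become 4 bytes of text, a fixed ~33% penalty before JSON's string quoting is even counted:

```python
import base64

payload = bytes(range(256)) * 4    # 1024 bytes of arbitrary binary data
encoded = base64.b64encode(payload)

print(len(payload), len(encoded))  # 1024 -> 1368 bytes (+33.6%)
```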

> Like imagine if in python there was some library that you gave an ebnf spec ...

Maybe it is possible to add such a library to Python, if such a thing does not already exist.

somat · a year ago
REST (REpresentational State Transfer) as a concept is very human oriented. The idea was a sort of academic abstraction of HTML, but it can be boiled down to: when you send a response, also send the entire application needed to handle that response. It is unfortunate that collectively we had a sort of brain fart and said "ok, REST == HTTP, got it" and lost the rest of the interesting discussion about what it means to send the representational state of the process.
sixdimensional · a year ago
May I humbly submit “parsing expression grammars”[1] for your consideration?

Fairly simple and somewhat fun. Python has PEG parsing built in, and the pyparsing and parsimonious modules are available as well.

I have built EDI X12 parsers and toy languages with this.

[1] https://en.wikipedia.org/wiki/Parsing_expression_grammar
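
For a flavour of the approach without any third-party modules, here's a hand-rolled PEG-style recursive descent parser for arithmetic (pyparsing and parsimonious let you write the same shape declaratively; this toy grammar is my own, not from either library):

```python
# Grammar (PEG):
#   expr   <- term   (("+" / "-") term)*
#   term   <- factor (("*" / "/") factor)*
#   factor <- number / "(" expr ")"
class Parser:
    def __init__(self, text):
        self.text = text.replace(" ", "")
        self.pos = 0

    def peek(self):
        return self.text[self.pos] if self.pos < len(self.text) else ""

    def expr(self):
        value = self.term()
        while self.peek() in ("+", "-"):
            op = self.text[self.pos]; self.pos += 1
            rhs = self.term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term(self):
        value = self.factor()
        while self.peek() in ("*", "/"):
            op = self.text[self.pos]; self.pos += 1
            rhs = self.factor()
            value = value * rhs if op == "*" else value / rhs
        return value

    def factor(self):
        if self.peek() == "(":
            self.pos += 1          # consume "("
            value = self.expr()
            self.pos += 1          # consume ")"
            return value
        start = self.pos
        while self.peek().isdigit():
            self.pos += 1
        return int(self.text[start:self.pos])

print(Parser("2+3*(4-1)").expr())  # 11
```

Each grammar rule maps to one method; the "ordered choice" of PEG is just the if/else in factor.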

sixdimensional · a year ago
There's also lark in Python.
fouc · a year ago
textual ebnf-specified protocol > json
praveen9920 · a year ago
> in an age of low-size disk drives and expensive software, looking up data over a dedicated protocol seems like a nifty idea. Then disk size exploded, databases became cheap, and search engines made it easy to look up words.

I love this particular part of history: how protocols and applications got built around restrictions, and then evolved as those restrictions lifted. Similar examples exist everywhere in computer history. Projecting the same onto LLMs, we will have AIs running locally on mobile devices, or perhaps AIs replacing the OS of mobile devices, router protocols, and servers.

In the future, HN people will look at the code and feel nostalgic about writing code.

bruce511 · a year ago
I've been working with other programmers for over 30 years, and this scarcity mindset has very real implications for behavior today.

In much the same way that someone who grew up with food insecurity views food now, even if the food is now plentiful.

For example, memory and disk space were expensive. So every database field, every variable, was scrutinized for size. Do you need 30 chars for a name? Would 25 do?

In C especially, all strings are malloced at run time; predefined strings with a max length are supported but not common.

Arguments (today) about the size of the primary key field and so on. Endless angst about "bloat".

I understand that there are cases where size matters. But we've gone from a world where it mattered to everything to a world where it matters in a few edge cases.

Given that all these optimizations come with their own problems it can be hard to break old habits.

pjc50 · a year ago
It is, until it isn't. This stuff very much matters for games.

There's also a general argument about resource usage, but I think the AI and crypto people have largely won the argument that it's OK to use as much electricity as you want as long as you're making money somehow.

OkGoDoIt · a year ago
I wish some of those old timers who care about bloat could work their magic on the web development ecosystem
praveen9920 · a year ago
I think that because of the levels of abstraction in software/platforms, the top layers are bound to have "bloat". Unless someone (or something) changes things radically across the stack, I think bloat will just increase at each layer.
Cthulhu_ · a year ago
But on the other hand, for some applications, disk requirements exploded as well and require dedicated protocols and servers; for example Google's monorepo, or the latest Flight Simulator: the 2024 version takes up about 23 GB as the base install and streams everything else - including the world, planes, landmarks, live real-world ship and plane locations, etc - off the internet. Because the whole world just won't fit on any regular hard drive.
mrguyorama · a year ago
>Because the whole world just won't fit on any regular hard drive

Except it did. FS2020 had a base level of terrain quality that was installed such that you could play it even if you never connected to the internet. It wasn't Bing Earth quality, sure, but it was way better than what you got in FSX thirteen years earlier. It was 200 GB.

Which is conveniently about the same size as a single copy of whatever Call of Duty game is currently in vogue.

latexr · a year ago
> Projecting the same with LLMs, we will have AIs running locally on mobile devices

That’s not much of a projection. That’s been announced for months as coming to iPhones. Sure, they’re not the biggest models, but no one doubts more will be possible.

> or perhaps AIs replacing OS of mobile devices and router protocols and servers.

Holy shit, please no. There’s no sane reason for that to happen. Why would you replace stable nimble systems which depend on being predictable and low power with a system that’s statistical and consumes tons of resources? That’s unfettered hype-chasing. Let’s please not turn our brains off just yet.

AStonesThrow · a year ago
What is an AI? Just an LLM? What do you call Siri, Cortana, and Google Assistant? I've already received a Gemini app on Android, and they're promoting the premium version too. Runs locally, yes?

30 years ago, my supervisor wrote, from scratch, an "AI" running on the company web server (HP/UX PA-RISC; 32-64MB RAM), that would heuristically detect and block suspected credit-card fraud. Remember that "AI" is a perennial buzzword with fluid definitions, both an achievable goal right now, and a holy grail. ¡Viva Eliza!

praveen9920 · a year ago
You are going with the assumption that LLMs will remain the same forever. There are attempts everywhere to make them smaller, more efficient, and more focused.

Would you have called "transferring mail online" hype-chasing in the 90s because our postal system was working great? Probably not a great analogy, but you get the point.

mycall · a year ago
Imagine if dict://internet was renamed to agent://source, then agentic calls to model sources could interconnect with ease. With HTTP/3, one stream could be the streaming browser and the other streams could be multi-agent sessions.
praveen9920 · a year ago
I was thinking along the same lines, but improvements in running local models are right around the corner. Unless someone has a specific use case or proprietary models, this may not make sense. But in the current situation, it would be a great thing to standardise communication between various LLMs and clients.
38 · a year ago
Given that most current AI generated code is dogshit, I would say we are well off from that.
praveen9920 · a year ago
Not right now, but we are seeing accelerated trends in generated code. How long would it take for regular use cases to be completely AI-generated on the go? Edge cases exist everywhere.
hebocon · a year ago
I recently began testing my own public `dictd` server. The main goal was to make the OED (the full and proper one) available outside of a university proxy. I figured I would add the Webster's 1913 one too.

Unfortunately the vast majority of dictionary files are in "stardict" format and the conversion to "dict" has yielded mixed results. I was hoping to host _every_ dictionary, good and bad, but will walk that back now. A free VPS could at least run the OED.
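
For anyone curious what the wire protocol looks like: RFC 2229's command set is tiny. A minimal sketch of the client side (dict.org and port 2628 are the protocol's real public server and registered port; the helper itself is my own improvisation, and nothing connects at import time):

```python
import socket

def parse_status(line: str):
    """Split a DICT status line into (numeric code, human-readable text)."""
    code, _, text = line.partition(" ")
    return int(code), text.strip()

def define(word: str, host: str = "dict.org", port: int = 2628):
    """Run a DEFINE against a DICT server and return the raw response lines."""
    with socket.create_connection((host, port), timeout=10) as sock:
        f = sock.makefile("rw", encoding="utf-8", newline="")
        f.readline()                        # 220 greeting banner
        f.write(f"DEFINE * {word}\r\n")     # "*" = search all databases
        f.flush()
        lines = []
        for line in f:
            lines.append(line.rstrip("\r\n"))
            if line[:3] in ("250", "552"):  # 250 = done, 552 = no match
                break
        f.write("QUIT\r\n")
        f.flush()
        return lines
```

Calling `define("hello")` needs network access; `parse_status` is pure and works offline.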

tomsmeding · a year ago
> to make the OED (the full and proper one) available outside of a university proxy.

Was the plan to do this in a legal fashion? If so, how?

hebocon · a year ago
No, I don't think it's possible to do in a legal fashion. The existing methods require a browser-based portal that requires you to be logged in with your university proxy.

As an alumnus I could do this by showing up in person to my university and accessing that way. But I'm not going to.

kragen · a year ago
what's the stardict format? which edition of the oed are you hosting? i scanned the first edition decades ago but i don't think there's a reasonable plain-text version of it yet
cormorant · a year ago
StarDict (a program/file format) is easily googlable. A bit of a rabbit hole is that it's been chased around hosting providers because its site (used to) offer downloads of copyrighted dictionaries, including the OED 2nd edition. I don't know how these files were originally obtained or produced. See: https://web.archive.org/web/20230718140437/http://download.h...

Edit to add: Also, "i scanned the first edition decades ago" sounds like quite a story. 13 volumes? What project were you doing?

hebocon · a year ago
It will be the 2nd edition which is freely available on the internet with all of the usual copyright concerns.

And it's already in 'dict' format so I didn't need to convert.

wormius · a year ago
Wow, either I've forgotten this existed or I had no clue. I was around for this era, and I remember Veronica, Archie, WAIS, Gopher, etc., but I never recall reading about a Dict protocol. Nice find!
hkt · a year ago
I've been aware of dict for a while since I wrapped up an esperanto to english dictionary for KOReader in a format KOReader could understand. What I'd really have liked is a format like this:

dict://<server>/<origin language>/<definition language>/<word>

Still, it is pretty cool that dict servers exist at all, so no complaints here.
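
Parsing that scheme would be trivial with the stdlib; a sketch (the URL layout is just my proposal above, not an existing standard, and the 'epo'/'eng' language codes are only illustrative):

```python
from urllib.parse import urlsplit

def parse_dict_url(url: str) -> dict:
    """Parse the proposed dict://<server>/<origin>/<definition>/<word> form."""
    parts = urlsplit(url)  # urlsplit handles any scheme with a //netloc
    origin, definition, word = parts.path.strip("/").split("/")
    return {"server": parts.hostname, "origin": origin,
            "definition": definition, "word": word}

req = parse_dict_url("dict://example.org/epo/eng/saluton")
```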

cratermoon · a year ago
Oh yes, I remember dictionary servers. Also many other protocols.

What happened to all of those other protocols? Everything got squished onto http(s) for various reasons. As mentioned in this thread, corporate firewalls blocking every other port except 80 and 443. Around the time of the invention of http, protocols were proliferating for all kinds of new ideas. Today "innovation" happens on top of http, which devolves into some new kind of format to push back and forth.

giantrobot · a year ago
I wouldn't place all the blame on corporate IT for low level protocols dying out. A lot of corporate IT filtering was a reaction to malicious traffic originating from inside their networks.

I think filtering on university networks killed more protocols than corporate filtering. Corporate networks were rarely the place where someone stuck a server in the corner with a public IP hosting a bunch of random services. That however was very common in university networks.

When university networks (early 00s or so) started putting NAT on ResNets and filtering faculty networks is when a lot of random Internet servers started drying up. Universities had huge IPv4 blocks and would hand out their addresses to every machine on their networks. More than a few Web 1.0 companies started life on a random Sun machine in dorm rooms or the corner of a university computer lab.

When publicly routed IPs dried up so did random FTPs and small IRC servers. At the same time residential broadband was taking off but so were the sales of home routers with NAT. Hosting random raw socket protocols stopped being practical for a lot of people. By the time low cost VPSes became available a lot of old protocols had already died out.

nunobrito · a year ago
Nice find, didn't know the protocol either. The site lists all available dictionaries here: https://dict.org/bin/Dict?Form=Dict4

I'll be writing a Java server for DICT, and will likely add more recent types of dictionaries and acronyms to help keep it alive.