A Sticky String Quandary

I read this and agree with everything he says, but I also think it makes Haskell look worse than it is to a passer-by.

My impression is that the Haskell community is very self-critical (which is great), but someone just peeking in from the outside might think that Haskell is still in the random hobby project stage, and that it still hasn't figured out strings.

That's totally not the case though! Strings are solved, just a bit annoying to use sometimes. We're running Haskell in production and it's amazingly stable and hard to break.

I wish the community were bigger, though; that's why I'm posting this, to encourage everyone to try it.

StefanKarpinski · 10 years ago

Stability seems like an orthogonal issue to the standard string representation being "quite possibly the least efficient (non-contrived) representation of text data possible". As a production Haskell user, what do you do when you have to load a large amount of text data?

tome · 10 years ago

> As a production Haskell user, what do you do when you have to load a large amount of text data?

Use Text https://hackage.haskell.org/package/text

"The Text type represents Unicode character strings, in a time and space-efficient manner. This package provides text processing capabilities that are optimized for performance critical use, both in terms of large data quantities and high speed."
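As a rough illustration of the Text suggestion, here is a minimal sketch that loads a file with Data.Text.IO instead of Prelude's readFile. It assumes the text package (which ships with GHC) and uses a hypothetical file name. Note that Data.Text.IO decodes using the system locale, so if you need guaranteed UTF-8 you'd read bytes and decode explicitly.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.Text as T
import qualified Data.Text.IO as TIO

-- Count words and lines of a Text value. T.words and T.lines work on
-- the packed representation directly, with no list of Char involved.
wordAndLineCount :: T.Text -> (Int, Int)
wordAndLineCount txt = (length (T.words txt), length (T.lines txt))

main :: IO ()
main = do
  -- "sample.txt" is a hypothetical file, written here so the example is self-contained.
  let path = "sample.txt" :: FilePath
  TIO.writeFile path "hello world\nsecond line here\n"
  contents <- TIO.readFile path   -- reads the whole file into strict Text
  print (wordAndLineCount contents)
```

Running it should print (5,2).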

Buttons840 · 10 years ago

Use an appropriate string type, like Text or ByteString. You have to be aware of the issue, but you usually don't have to do any extra work, because there are already functions that, for instance, load a large amount of text into an efficient representation. You just have to know to use them.
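For the ByteString route, a minimal sketch (assuming the bytestring and text packages, both bundled with GHC) of reading raw bytes and decoding them explicitly as UTF-8. The file name and the helper loadUtf8 are hypothetical; the non-ASCII characters are written as escapes so the source file's encoding doesn't matter.

```haskell
import qualified Data.ByteString as BS
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- Read a file as raw bytes, then decode it explicitly as UTF-8.
-- TE.decodeUtf8 throws an exception on invalid UTF-8 input.
loadUtf8 :: FilePath -> IO T.Text
loadUtf8 path = TE.decodeUtf8 <$> BS.readFile path

main :: IO ()
main = do
  -- Hypothetical file name; we write UTF-8 bytes first so the example runs standalone.
  let path = "utf8-sample.txt"
      original = T.pack "na\239ve caf\233"   -- "naïve café" (U+00EF, U+00E9)
  BS.writeFile path (TE.encodeUtf8 original)
  bytes <- BS.readFile path
  txt <- loadUtf8 path
  print (BS.length bytes)   -- byte count: 12
  print (T.length txt)      -- character count: 10
  print (txt == original)   -- True
```

The byte and character counts differ because each accented letter takes two bytes in UTF-8, which is exactly the distinction between ByteString and Text.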

nine_k · 10 years ago

C++, a card-carrying "mature production language", lived without a good commonly accepted string class for decades :) A few good implementations arose independently, though; same thing with Haskell.

Haskell has well-known deficiencies in its std lib (Prelude).

I'd conjecture that the same is true for Lisps (the quality of the std lib is often poor) and other powerful languages.

On the other hand, simpler languages like Java or Python have more adequate std libs.

That's because for a really powerful language, the std lib has to be opinionated. People understand that, but they can't agree on anything, so they live with the lowest common denominator and lose a lot of traction there.

A counterexample is Clojure, whose std lib is very nice, if heavily skewed towards FP, which reinforces my point.

wyager · 10 years ago

Complaints about Haskell's prelude usually fall strictly under the definition of "first-world problems".

"Ugh, this length function isn't parameterized over the integrals?"

Or, alternatively:

"Ugh, this length function is parameterized over the foldables?"

People will never agree on what's best, but it doesn't really matter. One nice thing about Haskell is that all the "default" functions, types, and data structures are just imported from the Prelude module. You can import your own version instead if you want, and people do that.
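To make the two complaints concrete, here is a minimal sketch: the standard length returns Int and (since GHC 7.10) works over any Foldable, and a module can hide it and substitute its own. The replacement below, built on Data.List.genericLength, is purely illustrative; projects that swap out the whole Prelude usually turn on the NoImplicitPrelude extension rather than hiding individual names.

```haskell
import Prelude hiding (length)
import qualified Prelude as P
import Data.List (genericLength)

-- A hypothetical local replacement for the Prelude's 'length' whose result
-- can be any numeric type, instead of always Int.
length :: Num n => [a] -> n
length = genericLength

main :: IO ()
main = do
  print (length "abc" :: Integer)            -- our version, result as Integer: 3
  print (length [1, 2, 3 :: Int] :: Double)  -- same function, result as Double: 3.0
  -- The standard length is generalized over Foldable, so it also accepts a Maybe.
  print (P.length (Just 'x'))                -- 1
```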

seagreen · 10 years ago

I wish that were the case, but it's not. A bad string type and lots of partial functions are legitimate red flags, not "first world problems".

hyperpape · 10 years ago

I tend to disagree.

On the one hand, parts of the Haskell Prelude are quite opinionated. String as a linked list? That's an impressive sort of FP purity: you can write tail-recursive functions over it, and there are people out there who find that thrilling. It's just horribly inefficient in both space and time.
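A small sketch of what "String as a linked list" means in practice: because String is a synonym for [Char], the tail-recursive style mentioned above is just list pattern matching. The Text version next to it is for comparison; both function names are made up for this example.

```haskell
{-# LANGUAGE BangPatterns #-}
import qualified Data.Text as T

-- In the Prelude, String is only a synonym: type String = [Char].
-- So a String can be consumed with ordinary list pattern matching;
-- each Char sits in its own cons cell, which is where the overhead comes from.
countVowels :: String -> Int
countVowels = go 0
  where
    -- Tail-recursive loop with a strict accumulator.
    go !acc []       = acc
    go !acc (c : cs)
      | c `elem` "aeiouAEIOU" = go (acc + 1) cs
      | otherwise             = go acc cs

-- The same count over packed Text, using library functions instead of
-- pattern matching on cons cells.
countVowelsText :: T.Text -> Int
countVowelsText = T.length . T.filter (`elem` "aeiouAEIOU")

main :: IO ()
main = do
  print (countVowels "functional programming")               -- 7
  print (countVowelsText (T.pack "functional programming"))  -- 7
```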

Other parts are just old. I've read that the reason head throws an exception when applied to an empty list, rather than returning a Maybe, is that working with Maybes was much more painful when Haskell was first written. That's just ordinary backwards-compatibility pain.
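For contrast with head, here is a minimal sketch of the Maybe-returning version. The name safeHead is our own (it isn't in the Prelude), though base does provide an equivalent, listToMaybe, in Data.Maybe.

```haskell
import Data.Maybe (listToMaybe)

-- A total alternative to the Prelude's partial 'head':
-- the empty case is reported in the type instead of as a runtime error.
safeHead :: [a] -> Maybe a
safeHead []      = Nothing
safeHead (x : _) = Just x

main :: IO ()
main = do
  print (safeHead ([] :: [Int]))       -- Nothing
  print (safeHead "abc")               -- Just 'a'
  -- base already ships an equivalent: Data.Maybe.listToMaybe
  print (listToMaybe [10, 20 :: Int])  -- Just 10
```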

Other parts just seem painful for no reason whatsoever. The lazy and strict versions of ByteString and Text share the same type names; they just live in different modules, so when you see code that reads "Text -> a", the only way to tell which one is meant is to check the imports. If the designers couldn't decide which was preferred, the types should have been named "StrictText"/"LazyText" (and programmers would be free to import them as Text if they so desired).
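The usual workaround is to import each module qualified, so the prefix (T. vs TL.) tells you which type you're holding, and optionally to define local aliases. A minimal sketch, assuming the text package; StrictText and LazyText here are hypothetical aliases along the lines suggested above, not names the library exports.

```haskell
import qualified Data.Text as T        -- strict Text
import qualified Data.Text.Lazy as TL  -- lazy Text

-- Hypothetical local aliases that put the strictness in the signature.
type StrictText = T.Text
type LazyText   = TL.Text

shout :: StrictText -> StrictText
shout = T.toUpper

shoutLazy :: LazyText -> LazyText
shoutLazy = TL.toUpper

main :: IO ()
main = do
  let lazy   = TL.pack "hello"
      strict = TL.toStrict lazy                  -- lazy -> strict (forces the whole value)
  print (shout strict)                           -- "HELLO"
  print (shoutLazy (TL.fromStrict strict))       -- strict -> lazy, then "HELLO"
```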

(Never mind that Text, while essential, isn't actually part of the Prelude...)

Chris_Newton · 10 years ago

> Other parts are just old.

This must be the curse of writing a language that becomes successful enough to grow a significant user base. You will inevitably discover that some of your original decisions about language features or standard library designs aren’t ideal. However, even if you know a much better way now, you can’t just replace the existing version without breaking a lot of existing code. Worse, over time people will start writing their own libraries, and some of them will adopt the same flawed conventions as your standard library by default. C++ has long struggled with strings and text processing for similar reasons.

bos · 10 years ago

There have been numerous well-known problems with the Java and Python standard libraries over the years. For instance, the date/time classes in Java were a disaster for a long time, and Python has taken decades to converge on a nearly-good-enough treatment of strings.

hyperpape · 10 years ago

Java's stdlib is mind-blowing in places.

My favorite is the treatment of iteration. Immutable collections still hand out an Iterator whose interface allows modification (remove()), and calling it just throws a runtime exception, UnsupportedOperationException (so much for a type system...).

There is an interface, Enumeration, that lets you iterate over the elements of something without allowing removal, but its use is discouraged, and it doesn't work with the enhanced for loop that came with Java 1.5: https://docs.oracle.com/javase/7/docs/api/java/util/Enumerat....

It's as if someone said "how wrong can we get this?"