Readit News
johnfn · 5 years ago
Loving the progression here. Tomorrow, someone’s going to reduce the boot times of macOS by 90% by the same principle. A week from now, someone will prove P=NP because all the problems we thought were NP were just running strlen() on the whole input.
CGamesPlay · 5 years ago
And maybe, in a decade or so, the man page for these functions will list their algorithmic complexity! That was the most interesting takeaway from this article, for me at least. I have only seen one or two libraries that actually list this in their documentation.
BenFrantzDale · 5 years ago
All of the C++ algorithms list complexity guarantees, I believe. I was stunned to learn from this saga that C doesn’t seem to do this.
est31 · 5 years ago
The cppreference page linked by the blog post has been changed since: https://en.cppreference.com/w/cpp/io/c/fscanf#Notes

> Note that some implementations of sscanf involve a call to strlen, which makes their runtime linear on the length of the entire string. This means that if sscanf is called in a loop to repeatedly parse values from the front of a string, your code might run in quadratic time
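
For anyone who hasn't run into it, the pattern that note warns about looks roughly like this (a minimal sketch; the quadratic cost comes from the hidden strlen inside each call, not from the loop itself):

    #include <stdio.h>

    /* Minimal sketch of the quadratic pattern: each sscanf call consumes
       only a few characters, but an implementation that calls strlen()
       internally re-walks the whole remaining buffer on every iteration,
       so parsing n values from an n-character string is O(n^2) overall. */
    int main(void) {
        const char data[] = "1.5 2.5 3.5 4.5";   /* imagine megabytes of this */
        const char *p = data;
        float f;
        int consumed;
        while (sscanf(p, "%f%n", &f, &consumed) == 1) {
            printf("parsed %f\n", f);
            p += consumed;                        /* step past the parsed value */
        }
        return 0;
    }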

formerly_proven · 5 years ago
I don't know if it is required to, but there doesn't really seem to be an upper bound on what glibc's scanf will eat for a %f (e.g. a gigabyte of zeroes followed by "1.5" will still be parsed as 1.5), so for that implementation there certainly isn't a trivial upper bound on the amount of input read and processed for a %f, as you might otherwise expect.

Yet another reason to not stringify floats. Just use hexfloats (but beware of C++ standard bugs involving >>) or binary.
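
For anyone who hasn't used them, the round trip looks something like this (a small sketch; %a in printf and hex-float parsing in strtod have been standard since C99):

    #include <stdio.h>
    #include <stdlib.h>

    /* "%a" prints the exact bits of a double as a hexadecimal float, and
       strtod() parses that form back with no decimal rounding involved. */
    int main(void) {
        double x = 0.1;
        char buf[64];
        snprintf(buf, sizeof buf, "%a", x);   /* e.g. "0x1.999999999999ap-4" */
        double y = strtod(buf, NULL);
        printf("%s -> %.17g (%s)\n", buf, y, x == y ? "exact round trip" : "lossy");
        return 0;
    }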

Unfortunately "gigabytes of numerical data, but formatted as a text file" is commonplace. For some reason HDF5 is far less popular than it ought to be.

fastball · 5 years ago
Listing time complexity up top is my favorite thing about the Redis docs.

https://redis.io/commands/

yread · 5 years ago
How would it help you to know that it's O(n)? It needs to read all the characters of the float. The problem is that it needlessly reads characters even after the float.
TeMPOraL · 5 years ago
You're joking, but now I'm thinking about the XML we parse at work and the library we're using to do it. We parse a lot of it, but I've always had this vague feeling that it takes a bit too long (given the codebase is C++).

The XML library we use is rather well-known, so if someone found a bug like this there, I'd expect a general improvement of performance across the board in the entire industry. The Efficient Market Hypothesis tells me it's unlikely the library has this problem, but then again, that's what I thought about AAA video games, and then the GTA Online thing came out.

toyg · 5 years ago
> it's unlikely the library has this problem

Any sufficiently-complex library code likely has plenty of problems, often unavoidably so (e.g. trade-offs between best performance and edge cases). Whether they have been found or not is a function of many, many factors.

> Efficient Market Hypothesis

I've lived long enough to be very sceptical about that sort of thing. Markets tend to be efficient in aggregate, maybe, but on the single case they can fail quite brutally. Look at how "dramatic" bugs are overlooked even in critical pieces of infrastructure like openssl, for years and years; maybe it happens less for openssl than most equivalent libraries, but it still happens.

Also, once the "market" for standards moves on, network effects make it very hard to have any meaningful competition. I mean, who writes XML parsers nowadays? Whichever XML lib was winning when JSON "happened" is now likely to stay in control of that particular segment; and the likelihood that top developers will keep reviewing it, falls off a cliff. Sprinkle a bit of cargo-cultism on top, and "efficient markets" become almost a cruel joke.

acqq · 5 years ago
It's possible. I've personally reduced the time spent reading a huge XML file on startup by at least 10x, in an application I was in charge of, by avoiding the library dependency and writing custom code. Having a lot of experience with that kind of code and with performance issues, it was quite a fast change with no negative effects.

The backstory was simple: up to a certain point the amount of data stored was reasonably small. Then the amount of data grew significantly (a few orders of magnitude), and the startup times became very unpleasant.

There's a lot going on when loading huge XML files. As an example, don't forget all the possible Unicode conversions, and all the allocations of elements in the handling code that exist just to be discarded, etc.

I don't suggest everybody do this "just because," but if some specific use is known to have very specific assumptions, is in the "hot path" and really dominates (profile first!), and it is known that only a small subset of all XML possibilities will ever be used, it can be justified to avoid the heavy libraries. For example, in that specific case, I knew that the XML was practically always only read and written by the application, or by somebody who knew what they were doing, and not something that somebody random would regularly provide from the outside in some arbitrary form. I also knew that my change surely wouldn't break anything for years to come, as I knew for sure that that part of the application was not the "hot spot" of expected future changes.

So it was a win-win. Immensely faster application startup, which is something that improved everybody's work, while preserving the "readability" of that file for the infrequent manual editing or inspection (and easy diffs).

drdec · 5 years ago
If you have a lot of nesting in the XML, and it is formatted for human reading (i.e. indented), you may want to consider not doing that. We had a project where we were creating human-readable versions of the XML (mostly for developer convenience) and then parsing it. When we stopped adding all the extra white space, the parsing speed increased by a couple of orders of magnitude. (The downside was we no longer had built-in coffee breaks in our development process.)
marton78 · 5 years ago
What about parsing that XML upfront, serialising to some binary format (e.g. CBOR, maybe with nlohmann's JSON library, or Cap'n Proto) and shipping the binary file?
rightbyte · 5 years ago
I wonder if scanf on the PlayStation wasn't using strlen in that way. GTA was written for PS, right?
ineedasername · 5 years ago
That's actually a very simple one. Just run a regex on "P != NP" to remove the "!" and you're good to go.
ctoth · 5 years ago
Seriously the most I have laughed in like 6 months. Which probably says a lot more about me than this joke. I know that jokes aren't really welcome on HN, and I generally really like this policy. But just had to mention this was just ... what I needed to read right now.
fanf2 · 5 years ago
Well, https://twitter.com/stephentyrone/status/1366573121365565444

>> So many years ago when I first took over the iOS string functions, I found that like 30% of boot time in debug builds was in strstr. <<

>> Needless to say, we did not fix that issue by writing a more efficient strstr. Removed the parser and then removed strstr from the environment where it had been in use =) <<

specialist · 5 years ago
You kid. But truer things are said in jest.

> ...Tomorrow, someone’s going to reduce the boot times of macOS by 90% by the same principle.

My 2019 MacBook often pauses when I connect the charging cable. Sometimes it just seizes, requiring a hard bounce.

Clearly there's a contended lock buried deep. Something non-obvious.

I'm certain everything these days has dozens of hidden quadratics and contended locks.

Which is one of the reasons I'm excited for stuff like structured concurrency (Java's Project Loom) and retoolings like systemd becoming the norm.

Ages ago I worked on a kitchen sink app that had a very finicky startup. Any breeze would break it. Much consternation by mgmt. Apparently if we only clapped louder, Tinkerbell would fly. I couldn't take it any more. LARPing as a bulldozer, I replumbed the innards, changing from something like initd to be more like systemd with some lazy loading for good measure. Voila!

Back to GTA. The failure here is the product owner didn't specify a max load time, and then hold the team to it. Devs will mostly do the work that's expected of them. If load time wasn't measured (and managed), no one is going to bother with expunging sscanfs.

dcolkitt · 5 years ago
> My 2019 MacBook often pauses when I connect the charging cable. Sometimes it just seizes, requiring a hard bounce.

Yesterday my MBP kernel panicked because my keyboard was low on battery and the Bluetooth connection kept dropping. There's something weird with macOS where peripherals really don't seem to be well isolated from the core OS runtime.

nightmunnas · 5 years ago
I don't think I've laughed this much since the pandemic started, well done.
coolgeek · 5 years ago
I just got an infinite loop down to 8.6 seconds! And I'm not done yet!
eru · 5 years ago
You joke, but there's actually a lot of work going into what techniques will definitely NOT be enough to settle P=NP.

(I find it pretty exciting, that this kind of negative result is possible. Ain't mathematics wonderful?)

bsldld · 5 years ago
> A week from now, someone will prove P=NP because all the problems we thought were NP were just running strlen() on the whole input.

You owe me a keyboard!

xconverge · 5 years ago
Agreed, also hey bud, hope you are doing well
johnfn · 5 years ago
Hey, there’s a username I recognize! Long time no see! Same to you :)
stainforth · 5 years ago
Next Hacktober should offer a free t-shirt to anyone who goes out and issues a PR to a repo containing sscanf
randomswede · 5 years ago
P = NP for "P == 0" and/or "N == 1".
mfbx9da4 · 5 years ago
actually lolled
iab · 5 years ago
Amazing
mkeeter · 5 years ago
Blog author here! Thanks to HN for warning me about sscanf at exactly the right time – within a day of me trying to load some ASCII STLs and noticing it was slow...

Linked deep in the Twitter replies [1], there's an open glibc issue about this, dating back to 2014:

https://sourceware.org/bugzilla/show_bug.cgi?id=17577

C doesn't have any requirements on the complexity of sscanf, so it might not be a bug per se, but it's certainly... a pitfall.

[1] https://twitter.com/impraxical/status/1367194430835425283

Waterluvian · 5 years ago
I love the format of this blog post. Perfect length. Perfect detail. You don't waste words. A good number of images.
ignoramous · 5 years ago
I like this blog post but I also happen to quite like long forms like these, too: https://mango.pdf.zone/finding-former-australian-prime-minis... / https://apenwarr.ca/log/20201227
lambda_obrien · 5 years ago
And it works without javascript!
rkagerer · 5 years ago
Hey, Matt, neat to see you here and congrats on making the front page! Recognize you from the Formlabs forums & conferences.

Love that notion of professional empathy underscoring your message in the blog post.

mkeeter · 5 years ago
Hey RK! Thanks for the kind words – hopefully I'll see you again, once we're back to holding in-person events!
koalahedron · 5 years ago
I know Matt too, primarily from him rejecting my Libfive PRs for being "too 1337".

But seriously, Matt, I might have a project for an in-person event regarding printing robots. Stay tuned--to what channel I don't know.

BenFrantzDale · 5 years ago
IMO the lack of a complexity requirement is a bug in the C standard. And really it’s a bug in the implementation(s?) too. If it can be done in O(1), shame on library authors for doing it in O(n). If you want programmers to trust library authors, don’t do this to us. Maybe std::from_chars FTW?
quelsolaar · 5 years ago
This is not a complexity issue with the function. The function is linear in its input, as it should be. The problem is that the implementation does more work than it needs to (it doesn't need the length of the string): it should be linear in the amount parsed, not in the length of the string. The quadratic complexity in this case comes from the loops calling it.

Deleted Comment

ironmagma · 5 years ago
Shouldn’t we just come clean and admit to ourselves that there is no such thing as the C standard? There is a collection of loosely related languages that look similar and that collectively we call C, but really they’re all completely different and share almost no interoperability or common characteristics. And those standards that do exist provide almost no ability to reason about your code including things like ordering of statements.
gnyman · 5 years ago
Thank you for introducing me to the concept/term Ascetic programming. Not sure how widely used it is, but I find it more fitting for what I try to do than minimalistic or KISS.

Also, it is great to see someone write

  > I noticed that ASCII STL loading was really quite slow.
  > From startup to showing the window, it took over 1.8 seconds!

I always find pleasure in seeing projects that highlight just how fast modern computers really are.

gnyman · 5 years ago
Re-read my comment and to be clear, the quote and the last paragraph are not related. The last sentence was meant to refer to the Erizo project as a nice single purpose high performing tool, not as a comment to the bug that made it slow.
chromanoid · 5 years ago
When looking at loading performance of STLs, you may want to look at this: https://papas-best.com/stlviewer_en
froh · 5 years ago
printf and scanf have complementary format specifiers, so serialization and deserialization can be maintained nicely in lockstep.

To avoid the quadratic strlen overhead you can simply use fmemopen(3), which makes the temporary sscanf FILE object explicit and persistent for the whole parse, and needs just one strlen call.
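
Roughly like this (a sketch with error handling omitted; parse_floats is just an illustrative name, and fmemopen is POSIX, so a feature-test macro may be needed):

    #define _POSIX_C_SOURCE 200809L   /* fmemopen() is POSIX.1-2008 */
    #include <stdio.h>

    /* Sketch: one fmemopen() up front means at most one pass to set up the
       stream, and fscanf() then advances through the FILE object instead of
       rescanning the string from the start on every call. */
    void parse_floats(const char *data, size_t len) {
        FILE *f = fmemopen((void *)data, len, "r");   /* "r": buffer is never written */
        if (!f)
            return;
        float v;
        while (fscanf(f, "%f", &v) == 1) {
            /* ...store v somewhere... */
        }
        fclose(f);
    }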

https://news.ycombinator.com/item?id=26343149

echlebek · 5 years ago
Thanks for the great blog post, and congrats on finding that long dormant bug!
fermienrico · 5 years ago
Love the post, except for the title. It is too click-baity.
simias · 5 years ago
I think the really embarrassing part for Rockstar is that they didn't bother to investigate what took 5+ minutes to load in their star product; simple profiling would've made the issue obvious. So either they knew and they didn't care, or they didn't know and they didn't care.

That being said both for GTA and for TFA the issue is a very similar sscanf call:

     sscanf(data, "%f", &f);
I already posted a similar comment in the GTA story but I really want to emphasize it: scanf is almost never the right tool for the job, and it's definitely not the right tool in this situation. Just use strtof. That's literally what it's for. String to float. There. Done.

Scanf is crappy, and if it were up to me it would've been deprecated a while ago. I can sort of see using it for a quick one-off "script", for instance to parse user input, but seeing it in the middle of a program will always raise a huge red flag for me.

Use strtok_r if you need to split a string, then parse every entry individually. It's more robust, more flexible (you can parse custom types and formats that way) and allows for much better error handling and diagnostics. And of course it's also usually vastly faster.

Scanf is an antipattern in my opinion. I literally never use it and I'm better off for it. The last time I interviewed for a C coder position I managed to answer the full C test quiz except for the one question regarding scanf. That's how much I don't use it.

I think it's even worse for developers who come from higher level languages and (reasonably) expect to be able to deserialize data easily. You simply can't do that in C, the type system and general philosophy of the language won't let you, but scanf may convey the illusion that it's sort of possible. Don't believe its lies.
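
To make the strtof suggestion concrete, the inner loop looks roughly like this (a sketch; read_floats is just an illustrative name and error handling is minimal):

    #include <stdlib.h>

    /* Sketch: strtof()'s endptr reports exactly where parsing stopped, so
       the caller advances by what was consumed and never rescans the rest
       of the buffer.  Returns how many floats were parsed into out[]. */
    size_t read_floats(const char *p, float *out, size_t max)
    {
        size_t n = 0;
        while (n < max) {
            char *end;
            float v = strtof(p, &end);
            if (end == p)          /* no conversion performed: stop */
                break;
            out[n++] = v;
            p = end;               /* continue right after the parsed value */
        }
        return n;
    }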

tgtweak · 5 years ago
I was thinking about it the other day when reading the original article, and this was the only plausible (and defensible) cause for it not being addressed:

When GTA online was released 7 years ago in 2013, the list of DLC items was probably much shorter, and grew over time. The performance issue gets quadratically worse with list length. The list growth was probably bell-curve shaped over the lifetime of the game.

This has an interesting dynamic when it comes to perceived performance:

In the beginning, on consoles and PCs, it was already a pretty long load time, but it would have been 90s or so on an average gaming PC (I remember this from the early days playing it, on a modest gaming PC with an FX-8150 CPU). This is long, but tolerable for a game of this size. I'm certain that early complaints that it was sluggish to load were profiled and looked at, and at the time it wasn't a 4-minute ordeal to load the JSON; it probably represented a fraction of the CPU time it takes today, not standing out as obviously as in OP's guerrilla profiling. Devs put a pin in it and said "this is netcode related, it is what it is".

Over time, the list gets longer and the loading time takes more cycles, BUT PCs are getting progressively faster year over year as well, with many of those improvements happening at the instruction level - optimizing for things like, surprise, string scanning. Two console generations have been released since, masking the problem on that side. For comparison's sake, I just checked and I can load GTA online in about 75s on my Ryzen 3900x. This CPU is probably 4-6x faster in single-core performance than the 8150 for most workloads. Again, it's slow but tolerable and by this time it's "yeah GTA online is just a big game and takes a while to load, it's always been that way". Complacency is the enemy of improvement, and things that regress slowly over time are hard for us to notice in general.

Don't take this as a "this is fine" comment, but instead the only reasonable justification I can think of as to why it might have flown under the radar all these years.

lostcolony · 5 years ago
I think 'embarrassing' is too strong a word. AAA game development is rushed; the pressure is to ship. Something has to give. This is a user facing issue, but one that doesn't actually affect the gameplay. Assuming they had -time- to profile the load process, given that low a priority, seems extremely optimistic.
Hamuko · 5 years ago
>AAA game development is rushed; the pressure is to ship.

I'd be more understanding if GTA Online hadn't already shipped its first version in October of 2013. Surely there would've been some time after shipping the first version to profile the game.

wpietri · 5 years ago
Well, it doesn't affect the gameplay if the player starts the game once and never closes it. But for anybody who wants to hop on for a quick bit of fun, it's a notable barrier. There are definitely games I've stopped playing because it takes too much time to launch the thing.
simias · 5 years ago
I wouldn't have said anything if the game had been released one month ago, but GTA V is almost 8 years old now and it's been ported to several generations of hardware (IIRC they've even announced "next gen" ports to release this year). The online function is still maintained and makes them a lot of money. I also do think that it affects the gameplay, because these loading times are genuinely terrible. A 30-second loading screen is a nuisance; a 5+ minute loading screen just makes me not want to play the game.

I think that Rockstar deserves some blame here, especially since this problem might well be a consequence of their notoriously bad development practices.

acomjean · 5 years ago
I tend to agree. When you are rushing, things get missed. Also, if it was a problem from the beginning you just might not think it's an issue (it's just how long it takes).

One philosophy I heard in my days of programming (not sure how I remembered this, but it's still out there):

Make it work, make it right, make it fast. -- Kent Beck

undefined1 · 5 years ago
embarrassing is too light a word.

Rockstar has virtually endless resources and the game has been out for many years. For years, they didn't reduce the extremely long load times? That's not only embarrassing, it shows deep incompetence and a lack of respect for the craft and for end users.

froh · 5 years ago
scanf and printf have complementary format specifiers, which can make maintaining serialization and parsing of regular data a breeze...

the proper remedy is to simply wrap the string to parse with fmemopen(3), which makes the temporary FILE object explicit and persistent for the whole parse, and needs just one strlen call.

https://news.ycombinator.com/item?id=26343149

abetlen · 5 years ago
Cool trick, thanks for sharing. I don't get why there isn't a suitable snscanf function that takes the buffer length as an argument and returns the number of bytes parsed?
IanNorris · 5 years ago
Not excusing this but there are likely a few mitigating factors here.

* Tight deadlines result in shipping code that's barely tested and may have resulted in minimal code reviews on it.

* The original article mentioned how the ids were always unique. It may have been intended to load content from multiple sources or to allow patching of content on disk (or repurposed entirely from a different game). Or it could well be an oversight/over-engineering.

* It may even be a general purpose json parser from another project that had never been tested with data of this size until after launch.

* It probably wasn't always this bad. Likely when the game launched the loading times were much more reasonable as the amount of in-app-purchases was an order of magnitude smaller.

Typically most of the IAPs will be added much later, so much of the profiling work would have been done with this code having a much smaller json block.

When the game shipped, the dev team will likely have shrunk significantly as the bulk of the team moved to a new project, leaving a smaller team focused more on the content itself, and the engine team that would likely deal with and spot stuff like this will probably have their attention elsewhere.

I don't work for R*, but I have shipped many high-budget titles, including live services.

SomeCallMeTim · 5 years ago
Agreed. My first instinct was the same: *scanf is never the right tool for pretty much any job.

I learned this 20+ years ago. As far as I'm concerned it should have been considered deprecated along with gets; it was considered dangerous in the early 90s and probably before. Not sure why people are still using it in the 2000s+.

randyrand · 5 years ago
why?
earthboundkid · 5 years ago
The Go standard library is pretty good, but unfortunately, it includes a scanf clone, so every once in a while you see a poor new developer posting to help forums trying to get it to work properly and you have to break it to them that they're using the wrong tool for basically any job.
_kst_ · 5 years ago
There's a bigger potential problem with the *scanf() functions than performance. They are inherently unsafe for reading numeric input.

For example, if you do something like this:

    int n;
    sscanf("9999999999999999999999999", "%d", &n);
the behavior is undefined. As the C standard says:

> ... the result of the conversion is placed in the object pointed to by the first argument following the format argument that has not already received a conversion result. If this object does not have an appropriate type, or if the result of the conversion cannot be represented in the object, the behavior is undefined.

You can control the appropriate type by writing the call directly, but you can't guarantee that the result can be represented unless you have control over the input.

Remember that in C "undefined behavior" doesn't mean that your program will fail, or will crash, or will tell you there was a problem. It means that a conforming implementation can do literally anything. In the worst case, it will do what you expect until it fails at the most inconvenient possible moment.

Now most implementations will probably do something sane, like setting a floating-point object to infinity or an integer object to some arbitrary value, but the language doesn't guarantee anything.

If you want to write safe code, you can extract a substring that represents a number and pass it to one of the strto*() functions, which do have well defined behavior on overflow. (But I couldn't tell you exactly what that behavior is without looking it up.)
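
For reference, the strto*() version of that looks roughly like this (a sketch; parse_int is an illustrative name, and the errno/ERANGE check is the part sscanf can't give you):

    #include <errno.h>
    #include <limits.h>
    #include <stdbool.h>
    #include <stdlib.h>

    /* Sketch of overflow-checked parsing: on overflow strtol() returns
       LONG_MAX or LONG_MIN and sets errno to ERANGE, which is exactly the
       well-defined behavior that sscanf("%d") does not guarantee. */
    bool parse_int(const char *s, int *out)
    {
        char *end;
        errno = 0;
        long v = strtol(s, &end, 10);
        if (end == s)
            return false;                        /* no digits at all */
        if (errno == ERANGE || v < INT_MIN || v > INT_MAX)
            return false;                        /* out of range for int */
        *out = (int)v;
        return true;
    }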

sicariusnoctis · 5 years ago
> Remember that in C "undefined behavior" [...] means that a conforming implementation can do literally anything. In the worst case, it will do what what you expect until it fails at the most inconvenient possible moment.

Actually, the worst case possibility is that your program will become Skynet, enslave humanity for 10000 years, collapse all stars in the universe into black holes, and significantly accelerate processes such as the heat death of the universe.

Xorlev · 5 years ago
There's probably an npm package for that. :)
saagarjha · 5 years ago
Programs don't need undefined behavior to do those things, you just need to wait a bit :)
ncmncm · 5 years ago
It's even allowed for the program to change all extant texts of the Standard to indicate that its behavior is well-defined.
vlovich123 · 5 years ago
You’re misreading the standard a bit, I think. It’s saying undefined behavior comes from the format string (which you should control; there’s a common compiler warning if it’s not a literal) not matching the types of the variables you pass it. This is kind of obvious when you think about it: variadic C functions lose type information, so the format string is the source of that.

The “out-of-range” issue just means that the standard isn’t going to mandate that every implementation of this function provide the same overflow behavior (some might saturate, others might stop at the end of the digit input and overflow, others might detect the overflow and saturate).

The Linux man page is clearer here IMO:

> If the number of conversion specifications in format exceeds the number of pointer arguments, the results are undefined. If the number of pointer arguments exceeds the number of conversion specifications, then the excess pointer arguments are evaluated, but are otherwise ignored.

That’s the only spot the word “undefined” appears and doesn’t discuss overflow. My general impression is that the “undefined” problem largely only applies to language operations or user input causing a library to perform such an undefined behavior. Older C functions with older documentation may be using “undefined” with a less strict meaning to also cover “implementation-defined”. The “undefined behavior” brouhaha came up in the past 5-10 years only when compilers actually started leveraging it breaking a lot of assumptions.

imtringued · 5 years ago
Wow, the problems are so deep that no human should ever use sscanf again. It's just as bad as gets() because there is no way for the programmer to deal with the error case.
ajkjk · 5 years ago
Obligatory: it is flabbergasting that this is a quality of language that is still in active use.
Gibbon1 · 5 years ago
It's even more flabbergasting that all these problems haven't been fixed.
hctaw · 5 years ago
Most of these things can't be fixed at the language level without serious breakage or preventing users from doing necessarily dangerous things.
codetrotter · 5 years ago
I am writing an app for iOS in Swift, and I have an array of structs with some 70,000 elements or thereabouts. For some bizarre reason the compiler uses so much memory if I define it directly in the source that I run out of memory. So instead, as a workaround for now, I am storing the data as a JSON string that I parse at runtime. It’s very sad, but it’s the only option I had, because I have a ton of other code to write for this app and cannot afford the time to even make a binary format for this data.

But I don’t understand why the Swift compiler decides to use so much RAM when compiling it in the first place. The string representation of the data itself is only ~3 MB. But when I tried to declare the data as an array of structs in Swift directly, it used gigabytes of memory at compile time, which caused the system to start swapping until the disk space ran out (I only have about ~20 GB free on the disk), at which point the system couldn’t swap any more and was out of RAM as well.

And my struct is very simple, it’s just

  struct Bazinga: Identifiable, Codable {
    let id: Int32
    let name: String
  }
And before I had to turn to JSON it used to be only Identifiable, even. So it’s one of the simplest possible structs, and the 70,000 items of data are only a few MB when written in the source. Yet compiling an array of these structs needs more GB of memory than I have RAM, and even exceeds the amount of disk space it can swap to. It’s super weird to me that this is even a problem, and it’s insane how many GB of memory it consumes trying to compile my code.

saagarjha · 5 years ago
Looks like it's solving constraints in the typechecker:

  8140 swift::ASTVisitor<(anonymous namespace)::StmtChecker, void, swift::Stmt*, void, void, void, void>::visit(swift::Stmt*)  (in swift-frontend) + 125  [0x110560f9d]
    8140 (anonymous namespace)::StmtChecker::typeCheckASTNode(swift::ASTNode&)  (in swift-frontend) + 1043  [0x11055dcc3]
      8140 (anonymous namespace)::DeclChecker::visit(swift::Decl*)  (in swift-frontend) + 4497  [0x1104e0721]
        8140 swift::TypeChecker::typeCheckPatternBinding(swift::PatternBindingDecl*, unsigned int, swift::Type)  (in swift-frontend) + 250  [0x11049648a]
          8140 swift::TypeChecker::typeCheckBinding(swift::Pattern*&, swift::Expr*&, swift::DeclContext*, swift::Type, swift::PatternBindingDecl*, unsigned int)  (in swift-frontend) + 140  [0x1104962bc]
            8140 swift::TypeChecker::typeCheckExpression(swift::constraints::SolutionApplicationTarget&, swift::OptionSet<swift::TypeCheckExprFlags, unsigned int>)  (in swift-frontend) + 897  [0x110495e71]
              8140 swift::constraints::ConstraintSystem::solve(swift::constraints::SolutionApplicationTarget&, swift::FreeTypeVariableBinding)  (in swift-frontend) + 974  [0x11032cb1e]
                8140 swift::constraints::ConstraintSystem::solve(llvm::SmallVectorImpl<swift::constraints::Solution>&, swift::FreeTypeVariableBinding)  (in swift-frontend) + 52  [0x11032d8b4]
                  8140 swift::constraints::ConstraintSystem::solveImpl(llvm::SmallVectorImpl<swift::constraints::Solution>&)  (in swift-frontend) + 372  [0x11032aa14]
                    8135 swift::constraints::ComponentStep::take(bool)  (in swift-frontend) + 2911  [0x1103393af]
                    + 4015 swift::constraints::ConstraintSystem::finalize()  (in swift-frontend) + 5258,5080,...  [0x110325a7a,0x1103259c8,...]
                    + 1819 swift::constraints::ConstraintSystem::finalize()  (in swift-frontend) + 5291  [0x110325a9b]

bobbylarrybobby · 5 years ago
Do you explicitly indicate the type of the array? E.G., `let array: [Bazinga] = [ … 70k elements …]` instead of `let array = [ … 70k elements …]`.
codetrotter · 5 years ago
I will try giving it an explicit type when I’m on the computer again. Would be nice if that turns out to make it behave nicely.
Gibbon1 · 5 years ago
Your story reminds me of watching a modern CAD program run out of memory and barf when trying to import a DXF file with a few thousand elements.
bogwog · 5 years ago
I don't know why you're running into that issue, but...

> It’s very sad, but it’s the only option I had because I have a ton of other code to write too for this app and cannot afford to spend the time to even make a binary format for this data.

You should look into Flatbuffers (https://google.github.io/flatbuffers/flatbuffers_guide_use_s...). It's a tool that can generate an API for reading/writing binary data based on a schema file where you design the layout (similar to protocol buffers). The data is ready to read, so you don't have to do any parsing at all, AND the compiler includes a feature to convert JSON files into binary that matches your given schema.

It won't solve your compiler woes, but it will help you avoid having to store and parse JSON, and it's a tiny dependency.

decasia · 5 years ago
It would be nice if it were more common for standard library functions to include algorithmic complexity as part of the standard documentation.

Absent that, of course we can potentially read the source code and find out, but I think for the most part we tend to operate based on an informed assumption about what we imagine the algorithmic complexity of a given operation would be. Inevitably, sometimes the assumption is incorrect.

There's no way to develop software without making assumptions, some of which inevitably turn out to be incorrect, so I don't think there is any great shame in having that happen, in itself. But better docs could help us make better assumptions with less effort, at least.

epr · 5 years ago
Keeping track of algorithmic complexity would be nice as a language and/or static analysis feature. If you wanted to be exact or do it for a language with complex metaprogramming I assume it would be a nightmare to implement. Absent those complications and especially if you always reduced it to O(1), O(n), O(log(n)), etc it might not even be that difficult given the potential advantages.
groby_b · 5 years ago
The difficulty here is "define n". And I don't mean that facetiously. You have a string parsing lib. It is, for reasons, quadratic over the number of strings parsed, and linear per string.

This is overall n^3, but that's meaningless because there actually isn't just one n. So, more m^2 * n. That means you can't reduce it to anything, because you want to keep both components. (Because, say, you know it will only ever be called with a single string).

But then, in the next app, this gets called and reinitialized once per file. And the routine handling files, for reasons beyond our ken, is O(n lg n). We're now at k * log(k) * m^2 * n.

And so, over any sufficiently long call chain, "what is n" is the overriding question - string length, number of strings, number of files? Not "how complex is the algorithm", because you want to optimize for what's relevant to your use case.

adrianN · 5 years ago
There is no general solution to deciding whether a program is O(n^k).[1] So your static analysis will either not know the answer for some programs, report a wrong bound, or report a ridiculous overestimate.

[1] https://cstheory.stackexchange.com/questions/5004/are-runtim...

beaconstudios · 5 years ago
personally, I think I wouldn't even bother to check the algorithmic complexity of every external function I call. I'd just use the logical choice (like sscanf) and only consider optimising if things started to slow down and profiling the application highlighted it as a bottleneck.
TeMPOraL · 5 years ago
I personally would, if it was listed in the documentation. Doing stuff and profiling later is the right general approach to performance optimization. But what's better is not making stupid mistakes in the first place, if they are trivial to avoid. To achieve that, you need to know the complexity guarantees of functions and data structures - or at least their ballpark (like, "this could be O(n) or perhaps O(n log n), definitely not worse").

This is where setting the guarantees and documenting them is useful - it allows people to trivially avoid making these performance mistakes. Prevention is better than cure, in that - as the GTA Online case demonstrates - in the later stages of product development, people may not bother to fix performance anymore.

decasia · 5 years ago
Yes, I absolutely think profiling and then only optimizing the actual problems is always a sound choice.

I don't check the docs for every library function I use. I'm just saying, it wouldn't hurt if, when you do read the docs for standard library functions, the algorithmic complexity was mentioned in passing.

ajkjk · 5 years ago
But you'll lose so much time doing that! Realizing there's a bug and investigating it is a huge amount of work compared to never writing it in the first place.
quelsolaar · 5 years ago
There are many implementations of the C standard library and they may not have the same complexity. In this case the code has the right complexity (linear), but with the wrong n (string length instead of parse length).
AdieuToLogic · 5 years ago
> It would be nice if it were more common for standard library functions to include algorithmic complexity as part of the standard documentation.

Isn’t the point of standard library functions to define the contract and not implementation constraints? In other words, if algorithmic complexity is a significant concern, perhaps a standard library function is not a suitable choice.

adrianN · 5 years ago
Algorithmic complexity is not an implementation detail, it's part of the constraints on how you can use the function.
zachrip · 5 years ago
This is a cool idea. I'm sure it's been done automatically in code before, but I'd love to see this in my editor. It sounds similar to the bundle-size extension that annotates how big a package you're importing is in Node.
chii · 5 years ago
> standard library functions to include algorithmic complexity as part of the standard documentation.

it would be acceptable to have a graph of the input size vs time, and this graph could be autogenerated using a testbed that is also used by unit testing! two birds one stone!

jacobajit · 5 years ago
I don't think we could come up with standard x-axis bounds for such a graph, since n=1000 or 1,000,000 may not be zoomed out enough to showcase the behavior approaching infinity.
tpetry · 5 years ago
I don't get the heat of this topic. Yes, they wrote some very slow code because it's easy to shoot yourself in the foot with scanf. It's nothing new that most software could be heavily optimized by just benchmarking the slow parts. There is no reason for this shitstorm other than to feel better than other developers.

The real problem is that they shipped a game with a loading screen that takes minutes and never looked at whether they could optimize it. THAT is the real shit show!

matheusmoreira · 5 years ago
Exactly. The problem isn't the bug itself or the developers who introduced it. The problem is they simply didn't care enough about their billion dollar game to fix the problem over seven years after release until someone got mad enough to reverse engineer the game, figure out why it was so slow and fix it on their behalf.

People will always make mistakes but it's how they deal with them that matters. Gotta have enough pride in one's work to be bothered when it performs badly. Having a user fix their billion-dollar project for them just because they couldn't be bothered is just embarrassing.

TeMPOraL · 5 years ago
Yeah, but I think it puts to the test the usual mantra of "write now, profile later, and optimize the bottlenecks". As reality repeatedly shows, developers often don't progress past the "write now" part, even if performance issues are crippling and annoying users to no end.

We can and should do better.

What this topic also is, is a reminder that with null-terminated strings and enough abstraction layers, you can easily make an O(n^2) or O(n^3) out of something you thought was O(n). I technically knew this (there was a popular article about it many years ago, I think by Joel Spolsky, but I can't find it now), but I didn't bother to look out for it in my current project. Thanks to this debacle, I'll do a performance review looking for "accidental quadratics", and I'm guessing many other developers will do it too. So it's a win!
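
(For anyone doing a similar review: the textbook null-terminated-string version of this is repeated strcat, roughly like the sketch below, where join_words is just an illustrative name.)

    #include <string.h>

    /* The classic accidental quadratic: strcat() must walk to the end of
       dst before it can append, so building a string from n pieces rescans
       the already-written prefix n times, O(n^2) overall.  Keeping a write
       pointer (or the running length) makes it linear again. */
    void join_words(char *dst, const char *const *words, size_t count)
    {
        dst[0] = '\0';
        for (size_t i = 0; i < count; i++) {
            strcat(dst, words[i]);   /* rescans all of dst every time */
            strcat(dst, " ");
        }
    }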

milesvp · 5 years ago
Forget pride in your work, this is literally costing Rockstar money. It may be hard to estimate, but I’m sure it was in the hundreds-of-dollars-a-day category at some point. Shaving milliseconds off of loading time has real effects in the web world, so at a previous job we made sure to evangelize that fact to management and did what we could to tie it to revenue as well as other KPIs in order to get this kind of thing prioritized. I’m just surprised there isn’t a PM at Rockstar pushing hard to reduce load times.
quelsolaar · 5 years ago
I think it's fairly notable that functionality that has been around for so long, and has been implemented so many times, is as poorly implemented as this.

Usually you can count on the C std lib to be very optimized. Many std functions like memcpy are even intrinsics in compilers, and that means they are literally faster than it's possible to write in C, since someone has gone in and hand-optimized the assembly.

c-c-c-c-c · 5 years ago
The thing is that they didn't ship it that way. Back when it came out the loading screens were "fast". Things just grew out of proportion with the steady increase of new items in the online mode.
tpetry · 5 years ago
And that's ok. But it seems no one in management approves benchmarking it now that loading takes several minutes (!).

I am not blaming the developers, they have a lot to do every day. Maybe someone even wanted to try to fix it. But the fact that it's still like this clearly shows that management doesn't care and they are completely ok with a loading screen taking longer than brewing a new cup of coffee.

dom96 · 5 years ago
The loading times for GTAV have always been horrible.
zionic · 5 years ago
No they weren’t; GTAV had terrible loading times at launch.
scaramanga · 5 years ago
"it will open a 97 MB binary STL file in about 165 milliseconds flat, on a 2013 Macbook Pro. This is blinding fast."

This actually sounds incredibly slow, that's nearly 1/5th of an entire second. What can it possibly be doing? :)

In case anyone else was wondering, I followed the link and clicked the description and this is actually based on the time to the first frame being rendered - not just the time to load the file (which should be essentially infinitely fast). So it's actually more impressive than it sounds.

esperent · 5 years ago
From my experience creating loaders for 3D formats (FBX, glTF, LWO) it's not loading the file that takes a long time, it's parsing the data in the file and converting it to a suitable format for rendering in OpenGL. In practice, most people use the terms "parsing" and "loading" interchangeably, or "loading" means "reading + parsing file".

There can be a lot of processing involved (looking at you, FBX) or less (glTF, probably STL), but there's still going to be at least decompressing the binary data, copying it into buffers, and then uploading those to the GPU. So, without knowing how binary STL is specified, parsing 97 MB in 165 ms seems reasonable.
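
For context, binary STL is about as simple as 3D formats get, which is part of why the raw parse can be that fast. The layout is roughly this (a sketch; don't read straight into the struct because of padding):

    #include <stdint.h>

    /* Rough sketch of the binary STL layout: an 80-byte header, a 32-bit
       little-endian triangle count, then one 50-byte record per triangle.
       Fixed-size records mean no tokenizing, just reading and converting.
       (Don't fread() directly into this struct: the compiler will likely
       pad it to 52 bytes; parse the fields explicitly instead.) */
    struct stl_triangle {
        float    normal[3];
        float    vertex[3][3];
        uint16_t attribute_byte_count;   /* usually 0 */
    };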

Const-me · 5 years ago
> What can it possibly be doing?

Possibly, two things.

1. Indexing the mesh. STL files don't contain meshes; they instead have a triangle soup. Indexed meshes are more efficient for rendering: they save VRAM bandwidth and vertex shader invocations.

2. Computing normals. STL files have per-triangle normals (which can be complete garbage because most software ignores them), but for smooth surfaces you want per-vertex normals. Computing them well (like I did here: https://github.com/Const-me/Vrmac#3d-gpu-abstraction-layer ) is slow and complicated.
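
(For the curious, the naive version of point 2 is only a few lines: accumulate each face's area-weighted normal into its vertices, then normalize. The sketch below is just that textbook version, not what the linked repo does; the slow and complicated part is handling sharp edges and degenerate triangles better than this.)

    #include <math.h>
    #include <stddef.h>

    typedef struct { float x, y, z; } vec3;

    static vec3 sub(vec3 a, vec3 b) { return (vec3){a.x - b.x, a.y - b.y, a.z - b.z}; }
    static vec3 cross(vec3 a, vec3 b) {
        return (vec3){a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
    }

    /* Naive per-vertex normals for an indexed mesh: accumulate each face's
       cross product (proportional to its area, hence area-weighted) into
       its three vertices, then normalize at the end. */
    void vertex_normals(const vec3 *pos, size_t nverts,
                        const unsigned *idx, size_t ntris, vec3 *out)
    {
        for (size_t i = 0; i < nverts; i++)
            out[i] = (vec3){0.0f, 0.0f, 0.0f};
        for (size_t t = 0; t < ntris; t++) {
            unsigned a = idx[3 * t], b = idx[3 * t + 1], c = idx[3 * t + 2];
            vec3 n = cross(sub(pos[b], pos[a]), sub(pos[c], pos[a]));
            out[a].x += n.x; out[a].y += n.y; out[a].z += n.z;
            out[b].x += n.x; out[b].y += n.y; out[b].z += n.z;
            out[c].x += n.x; out[c].y += n.y; out[c].z += n.z;
        }
        for (size_t i = 0; i < nverts; i++) {
            float len = sqrtf(out[i].x * out[i].x + out[i].y * out[i].y + out[i].z * out[i].z);
            if (len > 0.0f) { out[i].x /= len; out[i].y /= len; out[i].z /= len; }
        }
    }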

mkeeter · 5 years ago
Touché – after all, disk-to-RAM is hundreds of MB/s, and faster if it's cached!

In practice, I'm racing mesh loading against "how long does the OS take to give you an OpenGL context", which is rarely below 160 ms (longer if it has to switch from integrated to discrete GPU).

scaramanga · 5 years ago
Great data-point, I was wondering what the costs of all that would be.

How do you measure when the frame is done, are you waiting for a vsync?

titzer · 5 years ago
If you do anything with a file where you actually have to touch every byte (e.g. parsing), it is pretty impressive to get anything to go faster than 100 MB/s. 500 MB/s is blindingly fast, IMHO.
scaramanga · 5 years ago
Yeah, parsing text with a state machine is slow. Parsing, say, HTTP at that speed would be impressive without SIMD. But this is a binary file full of fixed sized structures, hence my confusion.

Anyway, the answer is there: it's actually measuring the performance of sending the mesh to OpenGL, to the GPU, and getting a frame rendered.

fsociety · 5 years ago
How can loading a file be infinitely fast? There is latency to receive the first byte from loading the file.
scaramanga · 5 years ago
When there is nothing to do. Like with an array of fixed-size structures: all you need to know is how many there are, and then you can increment a pointer past any number of those objects; that pointer increment itself can probably be compiled away to nothing.

It depends on exactly what you're measuring, you see. If you are loading data in order to do something, then the "do something" part will incur 100% of all the costs in that case. So when you subtract the "do something" part to just be left with the "load it" part, you can end up with a meaningless zero-to-do/doing-it-infinitely-fast kind of a result.

So then, what would you measure? Just peak RAM throughput? You can get that from the data-sheets :)

gowld · 5 years ago
Predict your access pattern and prefetch into RAM or DMA.
tomck · 5 years ago
Is that really that slow? idk how they even read the file in that amount of time, my drive is only about 125MB/s