One thing I would hope there is consensus on is that 1-based indexing is easier to learn. My son is doing Python at school, and the "first-element-is-actually-element-zero" is something that has to be dinned into them, a sure sign that it is non-intuitive. In similar vein, as adult programmers we know that the half-open range is the most useful mechanism for expressing a sequence, but again we have to explain to kids that if they want the numbers 1 to 12 (eg to do a times table program) they must type range(1,13), which at that stage of their learning just seems bizarre. Actually I could go on at length about why Python is a terrible teaching language, but I'll stop there !
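For what it's worth, the offending idiom in full, a minimal sketch of the times-table program mentioned above:

```python
# To get the numbers 1 through 12, the stop value must be 13,
# because Python's range is half-open: [start, stop).
for n in range(1, 13):
    print(f"{n} x 12 = {n * 12}")
```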
It is non-intuitive, but it doesn't happen only in programming. Look at sports reporting, for example (basketball, football). When they say something happened "in the 6th minute", it happened between 05:00 and 05:59. The difficulty isn't solely with programming.
You can use the same mechanism (minutes:seconds) to explain the half-open ranges: does a range(0,30) mean from 0:00 to 30:00 or from 0:00 to 30:59? If the second half of the match starts at 30:00, does the first half really end at 30:00 or is it really 29:59:99...9?
For most basic concepts, there's always real-world examples to use so that the teaching doesn't have to be abstract.
There's another non-intuitive and also inconsistent usage of ranges that I find rather confusing: if you say that a meeting is from 9 to 10, it's one hour long. If you say that you'll be on vacation from January 20 to 21, most people seem to think that means two days.
You're supposed to start out by teaching them pointers. Once they fully understand how pointers work, you move them on to Hello World so they can write their first program. Then it will be intuitive when they see 0-based indexing. I think that's the HN consensus on teaching kids how to program.
That's an excellent idea! We should also teach them quantum physics, so that instead of questioning programming language flaws they would question reality. Another problem solved.
I think we are on track to create a perfect programmer factory.
I agree, but Python doesn't have pointers. So I agree with your parent that Python isn't a good language for teaching.
Unfortunately teaching languages that are good for learning, like Scheme or Pascal, went out of fashion a long time ago. For years now universities have been teaching marketable languages. Ten years ago that was Java. Now it's Python.
That's how they used to do it in universities back in the 80s/90s. Start out with the hard stuff first in Intro to Computer Science. By the end of the course, you'd go from 400 people to about 50. Wish they'd still do that to keep numbers low.
Honestly, the problem is that we don't distinguish cardinal from ordinal numbers in programming.
As an offset (which is cardinal: how far something is from a starting point) or really any cardinal use, zero is the natural starting point. For ordinals, “1st” is the natural starting point. But we don't write “1st” in programming; even when we use 1-based indexing to try to match an ordinal intuition, we just write “1”. If we distinguished ordinals and allowed both cardinal offsets and ordinal indexes to be used, I think we'd be fine.
The unintuitive step is apparently identifying the index with the cardinality of the collection from before the item arrives, not after it arrives – i.e. ‘the n-th element is the one that arrived after I had n elements’, not ‘the n-th element is the one such that I had n elements after it arrived’. The former identification results in 0-based ordinals, the latter leads to 1-based ordinals – and to some misconceptions about infinity, such as imagining an element ‘at index infinity’ in an infinite list, where no such element needs to exist.
"1st" is not a "natural" choice for the starting point of ordinal numbers.
If anything, it is an artificial choice, because in programming it is derived from a property of most, probably of all, human languages that have ordinal numerals.
In most, probably in all, human languages the ordinal numerals are derived from the cardinal numerals through the intermediate of the counting sequence 1, 2, 3, ...
When cardinal numbers appeared, their initial use was only to communicate how many elements are in a set, which was established by counting 1, 2, 3, ...
Later people realized that they can refer to an element of a sequence by using the number reached at that element when counting the elements of the sequence, so the ordinal numerals appeared, being derived from the cardinals by applying some modifier.
So any discussion about whether 1 is more natural than 0 as the starting index goes back to whether 1 is more natural as the starting point of the counting sequence.
All human languages have words for expressing zero as the number of elements of a set, but the speakers of ancient languages did not consider 0 as a number, mainly because it was not obtained by counting.
There was no need to count the elements of an empty set, you just looked at it and it was obvious that the number was 0.
Counting with the traditional sequence can be interpreted as looking at the sequence in front of you, pointing at the right of an element and saying how many elements are at the left of your hand, then moving your hand one position to the right.
It is equally possible to count by looking at the sequence in front of you, pointing at the left of an element and saying how many elements are at the left of your hand, then moving your hand one position to the right.
In the second variant, the counting sequence becomes 0, 1, 2, 3, ...
Human languages do not use the second variant for two reasons: one is that zero was not perceived as having the same nature as the other cardinal numbers, and the other is that the second variant has one extra step, which is not needed, because when looking at a set with one element it is obvious that the number of elements is 1, without counting.
So neither 0 nor 1 is a more "natural" choice for starting counting, but 1 is more economical when the counting is done by humans.
When the counting is done by machines, 0 as the starting point is slightly more economical, because it can be simpler to initialize all the bits or digits of an electronic or mechanical counter to the same value for 0, than initializing them to different values, for 1.
While 1 was a more economical choice for humans counting sheep, 0 is a choice that is always slightly simpler, both for hardware and software implementations and for programmers, who are less likely to make "off-by-one" errors when using 0-based indices, because many index-computing formulas, especially in multi-dimensional arrays, become a little simpler.
In conclusion, the choice between 0 and 1 never has anything to do with "naturalness", but it should always be based on efficiency and simplicity.
I prefer 0, even if I have programmed for many years in 1-based languages, like Fortran & Basic, before first using the 0-based C and the other languages influenced by it.
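The claim that multi-dimensional index formulas are simpler with 0-based indices is easy to demonstrate. A sketch with hypothetical helpers for row-major flattening:

```python
def idx0(r, c, ncols):
    """0-based row-major flat index: no correction terms."""
    return r * ncols + c

def idx1(r, c, ncols):
    """1-based row-major flat index: needs two -1s and a +1."""
    return (r - 1) * ncols + (c - 1) + 1

# Same last cell of a 3x4 array under each convention:
assert idx0(2, 3, 4) == 11   # 0-based cells run 0..11
assert idx1(3, 4, 4) == 12   # 1-based cells run 1..12
```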
Well, 0-based is not what we use in everyday life, and even arriving at the invention of 0 took us several millennia, so I'd say its non-intuitiveness is pretty much established...
The ancient Greeks didn't think 1 was a number; as late as the early 7th century CE, Isidore of Seville was saying "one is the seed of number but not number". After all, if there's only one thing you don't need to think of it as a set of things, so there's no counting to do.
(I think in some quarters it was even argued that two isn't a number, but I forget the details. Compare the fact that some languages treat pairs of things differently in their syntax from larger groups.)
But it turns out that mathematically it's a really good idea to count 1 as a number and allow sets of size 1, and these days everyone does it and no one is very confused by it. (Well, maybe by the "sets of size 1" thing. Exhibit A: Perl. Exhibit B: Matlab.) And because we're now all used to thinking of 1 as a number, it's hard even to understand how anyone might have found it unintuitive.
If we all started counting from zero and taught our children to do likewise, intuitions 50 years from now might be somewhat different from what they are now.
The main argument I have against 1-based indexing is that it makes a lot of indexing calculations, which all programmers will need to learn at some point, even more confusing with all the additional +/-1. In other words, 1-based is easier only for the easy cases, but harder for the hard cases.
In some ways I think it's similar to the endianness debate: big endian looks natural at first glance, but when you see the expression for the total value, having the place value increase with the address just makes far more sense since it eliminates the extra length-dependent term and subtraction.
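The endianness point can be made concrete. A sketch, with hypothetical helper names, showing that the little-endian total is a clean positional sum while the big-endian one carries a length-dependent term:

```python
def le_value(data):
    # little-endian: place value 256**i grows with the offset i
    return sum(b * 256**i for i, b in enumerate(data))

def be_value(data):
    # big-endian: the exponent needs the extra len(data)-dependent term
    return sum(b * 256**(len(data) - 1 - i) for i, b in enumerate(data))

assert le_value([0x34, 0x12]) == 0x1234
assert be_value([0x12, 0x34]) == 0x1234
```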
This might be an unpopular opinion, but I'd say it's more natural for kids to relate the word 'three' to 'third' and 'four' to 'fourth', instead of 'two' to 'third' and 'three' to 'fourth'.
I've always found the English language to simply be lacking here. In Dutch, for instance, there is a very clear distinction between half-open and closed intervals: "x tot y" is always half-open, "x tot en met y" is always closed. Whereas in English, "x until y" can mean both things, with all the resulting misunderstandings.
We have "to" and "through" in English to distinguish them. Unfortunately "to" isn't always as strict as I'd like...
"Until" I'd think of as being the strict "to", except something about it feels off like it couldn't just be dropped in every place to/through can be used.
I think it's unintuitive because it's taught in that unintuitive way of just saying "zero is first." I know we can't get too terribly deep into this stuff with especially young children, but we can at least gently introduce them to the concept of offset vs. ordinal, as the former is a much more intuitive way to actually understand array indexes. Trying to liken indexes to ordinal numbers just promotes rote memorization, and that time would be better spent on other things.
I've come to feel that "arrays are pointers with offsets" is a largely-irrelevant justification for promoting zero-based indexes in today's programming landscape. Looking at the TIOBE top 10 programming languages for this month, 6 of 10 languages don't even offer a way to address memory with numeric pointers (and that climbs to 8/10 if you rule .NET as "pointers are technically available, but not commonly used and are somewhat discouraged").
If I grab JS/Java/Python/PHP, an array/list is a magic sequence of values with opaque under-the-hood operations, and my only interface to it is the item index. In that context, promoting idioms because they make mechanical sense in a language I'm not using doesn't seem compelling.
I think it’s even harder to wrap one’s head around negative indices in Python, and all the ways to do slicing. They should have added a reversed keyword instead of the current complexity.
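A quick illustration of the complexity being complained about, in plain Python:

```python
x = [10, 20, 30, 40]
assert x[-1] == 40                    # negative index: count from the end
assert x[1:3] == [20, 30]             # half-open slice
assert x[::-1] == [40, 30, 20, 10]    # reversal via a negative step
assert x[3:0:-1] == [40, 30, 20]      # reversed slices drop x[0]: easy to get wrong
```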
My own experience disagrees with your anecdote, but I guess it's different for everyone.
I learned programming by myself when I was a teenager, and 0-indexed never was an issue.
We learn very early on as kids at school that there are ten digits, not nine. You learn to write 0 before any other character. On every digit pad, such as a phone dial, there is a "0".
If I ask you for the list of digits, 0 is the first element.
There is nothing counter-intuitive about that in my opinion, even for a kid.
Interesting. Python is my go to teaching language. It beat the snot out of learning programming with Java. As a newbie, what does "public static void main" mean? So what's your choice for a teaching language? Maybe Lua or Scheme? They seem to have few surprises which is good for teaching.
BASIC, for explaining the fundamentals of imperative programming. Its weakness as a "real" language is its strength as a training language.
Then Scheme, once they feel straitjacketed by BASIC; it will teach them what abstraction is and how to break code into chunks. It should be a bit of a lightbulb moment. They also get a type system.
Finally, Julia, once they get sick of the parentheses and difficulty of building "real" programs (you'll do a lot of DIY FFI to use standard libraries in Scheme). From this they will also learn array programming and the idea of a type hierarchy.
The only trouble with this curriculum is the student will be completely ruined for ever programming in C or Java. They will not have learned to tolerate the requisite level of tedium, or even programming without a REPL.
There's a huge difference between a teaching language for _kids_ and _adults_. That's pretty much the case for most of the skills you would ever teach, not just computer languages.
I have two daughters learning Python right now and they did not have an issue with 0-based, and I only explained it once. So I don't know if I'd say your experience is consensus.
0-based indexing with closed intervals is better for slicing. This shouldn't be controversial. It's because you can represent a zero interval cleanly: [3,3) is an empty interval after slot 2, representing a single cell is [3,4).
This has two nice properties. One is that two slices are adjacent if the beginning and ends match, and the other, far more important, is that the length of the slice is end - start.
That's the one that really gets us something. It means you can do relatively complex offset math, without having to think about when you need to add or subtract an additional 1 to get your result.
I use Lua every day, and work with abstract syntax trees. I mess this up all. the. time.
Of course you can use closed intervals and stick with 1-based indexing. But for why you shouldn't, I'm going to Appeal To Authority: read Dijkstra, and follow up with these.
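The adjacency and length properties described above can be checked directly in Python, whose slices are half-open:

```python
xs = list(range(10))
a, b = (0, 4), (4, 10)   # adjacent: end of one slice == start of the next
assert xs[a[0]:a[1]] + xs[b[0]:b[1]] == xs
assert len(xs[2:7]) == 7 - 2          # length is end - start
assert xs[3:3] == []                  # the clean empty slice "after slot 2"
```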
> 0-based indexing with closed intervals is better for slicing. This shouldn't be controversial.
base is irrelevant to this, and you want (and show!) half-open, not closed, intervals.
> This has two nice properties. One is that two slices are adjacent if the beginning and ends match, and the other, far more important, is that the length of the slice is end - start.
Yeah, those are all properties of half-open intervals, irrespective of indexing base. It would be as true of π-based indexing as it is of 0-based.
It's related to 0-based indexing in that if you want to take/iterate over the first `N` elements, `0:N` works with 0-based indexing + close-open, but if you had 1-based and close-open, you'd need the awkward `1:N+1`.
This is why 1-based index languages normally use closed-closed intervals, so that they can use `1:N`.
I'm a die hard Julian (Julia is 1-based), but I do a lot of pointer arithmetic in my packages internally. I've come to prefer 0-based indexing, as it really is more natural there. 0-based plus close-open intervals are also nicer for partitioning an iteration space/tiling loops, thanks to the fact the parent commenter pointed out about the end of one iteration being the start of the next. This is a nice pattern for partitioning `N` into roughly block_size-sized blocks:
iters, rem = divrem(N, block_size)
start = 0
for i in [0,iters)
    end = start + block_size + (i < rem)
    # operate on [start, end)
    start = end
end
But that's only slightly nicer. To translate this into 1-based indexing and closed-closed intervals, you'd just substitute the `# operate` line with
# operate on [start+1, end]
the `[0,iters)` with `[1:iters]`, and `i < rem` with `i <= rem`.
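For the curious, the sketch above translates to runnable Python (a hypothetical `partition` helper; like the original, it assumes the remainder is no larger than the number of blocks):

```python
def partition(N, block_size):
    # Split [0, N) into half-open blocks; the first `rem` blocks
    # absorb one extra element each.
    iters, rem = divmod(N, block_size)
    blocks, start = [], 0
    for i in range(iters):
        stop = start + block_size + (i < rem)   # bool counts as 0 or 1
        blocks.append((start, stop))            # operate on [start, stop)
        start = stop
    return blocks

assert partition(10, 3) == [(0, 4), (4, 7), (7, 10)]
```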
1- vs 0-based indexing is bike-shedding. A simple question we can all have opinions on that's easy to argue about, when it really doesn't matter much.
Julia uses 1-based indexing, but its pointer arithmetic is (obviously) 0-based, because pointer arithmetic != indexing. Adding 0 still adds 0, and adding 1 and `unsafe_load`ing will give me a different value than if I didn't add anything at all. (This is just reemphasizing the final point made by the blog post.)
The natural representation of intervals for doing arithmetic on is 0-based half-open.
Half-open because of the slicing properties, as noted in your posting and the grandparent posting.
0-based because of the simplification for converting between relative coordinate systems. Suppose you have one interval A represented as offsets within a larger interval B, and you'd like to know what A's coordinates are in the global coordinate system that B uses.
This is much easier to compute when everything uses 0-based coordinates.
On the first point, yep, I completely misspoke: half-open intervals are what you want, not closed ones.
To the second point, as I said: Dijkstra's argument for using 0 instead of 1 with half-open intervals is, to my taste, perfect. As I have nothing to add to it, I will simply defer.
With 1-based indexing and closed intervals you have to write `for (i = 1; i <= N; i++)`. So you lose the nice property of having `upper - lower` iterations.
> I use Lua every day, and work with abstract syntax trees. I mess this up all. the. time.
I think this is the clincher in your argument (rather than Dijkstra). I only use languages with 0-based indexing, and it seems natural to me, but I could almost be convinced that if only I used 1-based indexing regularly then I'd be fine with it too. But here you are with actual experience of using 1-based indexing regularly and you don't feel that way, which blows that idea out of the water.
A programmer “strengthens their ankle and leans forward by holding their arm out, preparing to relax the shoulder” into a bar. The bartender bursts out laughing for a few sprints.
I mess it up as well, both in Lua and C. The key to success is to not calculate values by adding or subtracting them, but to use meaningful names instead. E.g. dist(a, b), holen(a, b), endidx(start, len) and so on. Forget about ever writing “1” in your code, point your finger and call operations out loud. It doesn’t eradicate errors, but at least elevates them to the problem domain level. Off by one is not exclusive to 1-based or 0-based, it is exclusive to thinking you’re smart enough to correctly shorten math expressions in your head every damn time.
(But sometimes I also feel too clever and after some thinking I’m just writing (i+l2+b+2) because it’s obvious if b already comes negative and means “from behind” in “0-based -1 is last” terms.)
True, but there are examples where 1-based indexing is easier, like returning the last element based on length. I think array[array.length] is easier to understand than array[array.length - 1].
Or the predecessor of the last element: the 0-based array[array.length - 2] makes you think, whereas the 1-based array[array.length - 1] is more obvious.
The fact of the matter is, for a type that can hold N distinct values, there are N + 1 different possible interval lengths (0 through N). You cannot represent every interval length over a fixed-width type in that same type.
Did you mean open range? I've never encountered a circumstance where closed ranges are useful, though I presume they exist.
And yes, I think any typed language with a less expressive range syntax than Ada has some work to do. That still leaves open the question of the default, and I maintain that 0 <= .. < n is the correct one.
That... makes no sense? Whether you start counting a list at 0 or 1, if your language of choice supports [x,x) notation, that syntax will effect an empty interval, and [x,x+1) will effect a single cell, no matter whether your list indexes the first element as 0, or 1. The only difference is which cell [x,x+1) refers to, which is the part that keeps sparking the controversy.
Why on Earth would I want a "zero interval" slice?
It has none of the properties I want in a slice, and that a slice of literally any other length will have. In fact, it means every slice I use needs error-handling, because I might get... something that's functionally not a slice, but is still called one. If that doesn't scream "type error" at you, I don't know what would. In fact, that's precisely why 0 is a comparatively recent invention. Most list-based problems were solved just fine before then.
All these arguments are poor rationalisation for the field's inability to move on from the 8-bit era, where the loss of 1/256 of your address space mattered and compilers had bigger problems to solve than translating indices to offsets.
You want a zero-slice because it’s a simpler base case in almost any even mildly complex algorithm. Without empty lists, you need many additional branches throughout a code base to handle the “no element” case, causing more potential bugs. There’s nothing “8-bit” about cleaner code, quite the opposite.
Wikipedia says that Egyptians had a 0 as early as 1770 BC - almost 4000 years ago. If that's a "relatively recent" invention, what about computers?
0 comes up if you're doing any arithmetic at all. It's a natural and useful concept for almost anything. As long as you can always take something away where there is something, you'll need 0. It wouldn't be very useful to make something such as a non-empty list type, where you can only take away elements until the last one, which must remain.
In more mathematical terms, 0 is called a neutral element (relative to the addition operation), and almost anything you might want to do (for example subtraction) requires as a consequence a neutral element.
Well, you might want a function which returns String, not Maybe(String), so that you can just concatenate, rather than handle Some(String) or None all the time.
So that's why you might want an empty String. If you have an optional element in a syntax, it can be convenient to match it with a rule using a Kleene star, and that gives you an empty Node, which would return an empty String if you request its value. And so on.
With open intervals, you have to represent that as, say, (3, 2). Which sucks.
In mathematics, both are common. For example, when working with polynomials or polynomial-like things, it is common to label coefficients starting with 0. E.g., a0 + a1 x + a2 x^2 + ...
The subscript on the coefficient then matches the power of x, letting you write the general term as ai x^i, which works out great when using capital sigma notation.
On the other hand, matrix rows and columns usually start from 1, so the elements on the top row are usually a11, a12, a13, ..., and the next row is a21, a22, a23, ..., and so on.
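The polynomial case can be checked in a few lines of Python (the `poly` helper is hypothetical):

```python
coeffs = [2, 0, 3]   # coeffs[i] multiplies x**i, so p(x) = 2 + 0*x + 3*x**2

def poly(coeffs, x):
    # the 0-based subscript matches the power, as in sigma notation
    return sum(a * x**i for i, a in enumerate(coeffs))

assert poly(coeffs, 2) == 2 + 0 * 2 + 3 * 2**2   # 14
```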
In an attempt to bring some unity to the two sides on this issue in programming, let me offer something that I'm sure everybody will be able to agree on.
Once upon a time I was implementing some mathematical code in C++. It was a mix of things from places where mathematicians would number from 0 and where they would number from 1.
I decided that the code would be clearer if the code looked like the formulas in the math papers they came from, which meant I wanted to use 0-based arrays in some places and 1-based in others.
Solution: I overloaded the () operator so that array(i) was a reference to array[i-1]. Then I could use 0-based or 1-based, depending on whether I was using a formula that came from a 0-based or 1-based area of mathematics.
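For readers who don't speak C++, here is a rough Python analogue of the same trick (`DualIndex` is a made-up name, not the original code):

```python
class DualIndex:
    # Hypothetical Python analogue of the C++ operator() overload:
    # [] stays 0-based, while () is 1-based and maps array(i) to array[i - 1].
    def __init__(self, data):
        self.data = list(data)

    def __getitem__(self, i):   # 0-based access
        return self.data[i]

    def __call__(self, i):      # 1-based access
        return self.data[i - 1]

a = DualIndex([10, 20, 30])
assert a[0] == 10 and a(1) == 10   # same element under both conventions
```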
Julia and Fortran, languages designed to be used for mathematics and high performance scientific computing, take a similar approach in the language itself and support arbitrary starting indices for arrays. Sometimes the math is just cleaner when the index starts at 0, or 1, or 2, or -1!
And of course there are C++ libraries to do the same, for example the Blitz++ library. Negative indices to the left of element[0] are great for ghost zones in codes that do domain decomposition over different MPI ranks.
TBH indexing became much less important with every language adopting some sort of "for all" and map/filter/reduce constructs. If you don't care about indexes you don't need to think about them (finally!).
The remaining cases are by definition edge cases and warrant enough attention that bugs caused by 1- vs 0-based indexing doesn't seem to be a big problem in practice.
It's like with goto and structured programming - people stopped using goto for loops and ifs, so the remaining cases where goto is used as last resort aren't much of a problem. People think hard before doing this.
I'd say it's the exact opposite. Iterating over an array is trivial no matter if you use 0- or 1-based indexing. If that is all you do, the type of indexing really doesn't matter.
It is for all the more advanced uses of arrays and indexing where the base starts to actually matter, and those are not covered by foreach.
I disagree, this depends a lot on your problem space. I mainly do scientific programming and so have to do array/vector slicing etc. and indexing definitely plays a big role.
Do you often find a need to slice hardcoded numbers, however?
I find that the only hardcoded number I often need to slice or index is indeed the start, and most languages do offer something for that such as `slice[..i]` to slice from the start.
Yes. I did write Lua for a few years and have a hard time remembering seeing any indexing cases. I probably encountered some but they were very few and far between.
Exactly. When I was a beginner I made off-by-1 errors (in Python mind you) all the time. Now that I iterate directly through arrays or use map-reduce patterns I rarely have to handle the index itself.
I was hoping for an interesting analysis of 0 vs 1 based indexing, but instead got a 3 page rant of HN-this and appeal-to-authority-that which adds absolutely nothing of value to the discussion. It feels more like a bottom-of-the-page angry comment to an article than an actual article.
Yeah, this is a bunch of paragraphs complaining that arguments for 0-based aren't good enough for him, but giving no actual arguments FOR 1-based. That's not how you convince anyone. You can't just say that the other guy is wrong without giving any explanation for why you are right.
Plus, the last bit is just "imagining a guy and getting angry at that guy", the absolutely dumbest form of argumentation.
This is probably because this whole debate is pedantic and irrelevant. 0-based wins; at this point it's not about picking a fork in the road, just swimming upstream for the sake of standing out.
Lua is over 23 years old at this point; it made an arbitrary design decision and has had to die on that hill ever since.
I quite disagree, I think it makes two insightful points:
1. In this discussion, people often overlook that these numbers are applied to two different concepts: offsets and numbering.
2. If C had been designed with 0-based pointers (for offset calculation) and 1-based indices (for numbering the array elements) people would probably find this a strength, to have the most appropriate base for each case. It's an interesting what-if idea that I hadn't seen before.
If C had 0-based pointers and 1-based indices, every CPU in existence would have a "subtract 1, multiply, and add" instead of a "multiply and add" instruction, and everybody would think this is stupid, because "multiply and add" can be used for a lot of things, but "subtract 1, multiply, and add" has only one use.
Fully agree. Got to the end of the post and was looking for the rest. The ranting could have been fine if there was more information and exploration. As it is, it's unremarkable and not worth reading.
My opinion is that languages where you work with vector representations primarily (or at least often) like Fortran, Matlab, Julia are best with a 1 index.
They are languages primarily used for math and simulations and are pretty specialized so the indexing and representation of the vectors matches closer to how you would consider things if you were doing linear algebra by hand.
Julia, Matlab, and Fortran are also all column major for multi dimensional arrays too.
For numeric work I'll take 1-based indexing any day of the week, but for general programming or other CS work (like non-vector/matrix data structures) I much prefer 0-based indexing.
It was super frustrating to see the author dismiss Dijkstra's argument as an appeal to authority. No it isn't; it just means that we find the arguments he outlined compelling. As someone who switches between C and Fortran daily, it's incredibly clear that zero-based indexing is easier.
Here are two arguments in favor of 0-based 'indexing' (or offsets) that aren't just "because that's how it's done":
1. It is faster. Due to how memory works, 1-based languages need to subtract 1 internally every time you access an array[1].
2. It works mathematically better with some of the most common operations on array offsets, like modulo and division into round/ceil/floor. You'll be peppering your code with +1 or -1 around those a lot if you use 1-based indexing.
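A sketch of point 2, converting between a flat offset and (row, col) in a hypothetical width-4 grid:

```python
W = 4   # grid width (made up for illustration)

def to_rc(offset):
    # 0-based: flat offset <-> (row, col) is a single divmod
    return divmod(offset, W)

def to_rc_1based(index):
    # 1-based: corrections on the way in and on the way out
    r, c = divmod(index - 1, W)
    return r + 1, c + 1

assert to_rc(7) == (1, 3)
assert to_rc_1based(8) == (2, 4)
```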
I am not sure (1) is a very convincing argument. The subtraction of 1 internally is not necessarily gonna be there once the code gets compiled.
I will personally concede (2), but it's still not very convincing, considering that Fortran (a 1-based language) is still (arguably) the most popular language for linear algebra.
Personally, I feel "The Index Wars" are the same as the "Text Editor Wars": a matter for personal opinion.
>I am not sure (1) is a very convincing argument. The subtraction of 1 internally is not necessarily gonna be there once the code gets compiled.
Theoretically one can optimize it - though only when one can statically infer the index and the compiler decides to inline the array access function. If you can't do that, the best you can hope for is a fast CPU instruction like LEA or something.
In theory one could make a lot of things fast, in practice they rarely are, and it's always better to avoid problems now than to pray later.
> It works mathematically better with some of the most common operations on array offsets
The problem is that for every formula or calculation best suited for 1-based indexing, there is another which is better implemented using 0-based indexing. Regardless of which indexing mode you are using, peppering your code with +1 or -1 is eventually inevitable. It's why I think the choice is a matter of preference and has little to do with technical reasons.
Overall I still prefer 0-based indexing. A few bytes of RAM are cheap nowadays, so when a calculation can be simplified using 1-based indexing, deliberately not using [0] and pretending that the array begins at [1] is often an acceptable option (the trick was originally invented for porting Fortran code...)
Not convinced that is the case. I do not pepper my code with +1 and -1 while using 0-based indexing.
I can think of a lot of cases where I would do it for 1-based indexing, but very few for 0-based indexing. If you want to make the claim that there similarly many, you're going to have to come up with some substantial examples.
Your (1) only affects compilation time, and even this is assuming a particular implementation of arrays (which is not a given).
Your (2) I don't fully understand. Why would you need to use "modulo and division into round/ceil/floor" on array offsets? How often you do it?
For what it's worth, I've written a _ton_ of code in an obscure language called Omnimark, which is 1-based. Much of that code involved using arrays. I've just grepped through a large Omnimark codebase I helped to write, and no, it is _not_ peppered with "+1" or "-1".
No, it mostly affects execution time. Whether the effect is meaningful, however, depends on various factors.
> [...] and even this is assuming a particular implementation of arrays (which is not a given).
Eventually, all arrays boil down to pointer+offset memory accesses. Well, if your programming language internally implements "arrays" as hashmaps, then you've already lost performance-wise, of course.
> Why would you need to use "modulo and division into round/ceil/floor" on array offsets?
Whenever you're working with 2D/3D data in manually managed memory.
> How often you do it?
When you're writing web services? Probably not so much. When you're doing image manipulation or 3D graphics? All the time.
Every single hash map worth its salt should be backed by a power-of-two-length array. And the mapping from hash to array index is then not a division (that's slow and cannot be parallelized in the CPU) but a simple bitwise AND.
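For a power-of-two capacity the bitwise mask and the modulo agree, which is the whole point; a quick Python check (the capacity value is arbitrary):

```python
CAP = 1 << 4  # power-of-two table size

def slot(h):
    # Masking with CAP - 1 keeps the low bits, which for a
    # power-of-two capacity is exactly h % CAP, minus the division.
    return h & (CAP - 1)

for h in (0, 5, 16, 12345):
    assert slot(h) == h % CAP
```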
1) It also affects runtime, when addresses need to have a subtraction, and maybe a multiplication, in the offset calculation.
2) You may need this often in certain network or protocol code, for example.
Regarding the mathematical advantage: it depends on the subfield, but I think 1-based numbering is more common in mathematical notation and algorithms (linear algebra being a prominent example).
Regarding performance: some mentioned that the 1 offset can often be optimized away by the compiler. But even when this is not possible, common CPU architectures include instructions that can apply the offset at no extra cost, see https://stackoverflow.com/questions/28032685/is-there-a-perf... , so 1-based indexing is just as fast.
For 2), in Julia, there is a "mod1" function which acts as modulo, with 1 being the smallest value instead of zero. It doesn't really make sense mathematically, but is very useful for looping over circular arrays.
Only one reason exists these days: it prevents bugs, because people mix up ordering, indexing, counting, array semantics, implementation, and pointer arithmetic
> It really shows how conditioned an entire community can be when they find the statement “given a list x, the first item in x is x[1], the second item in x is x[2]” to be unnatural.
It really shows how conditioned the mankind is that they call the "0th" ordinal number by the name "first". We have never moved past the phase where the concept of "zero" was heretic.
Here + offset makes SO much sense. I think we should move to use it in other contexts too. "0 steps from the start of an ordered list" "1 step from the start of an ordered list" etc.
At the moment we are using two different "scales" for measuring things and labeling orders. We could as well use the letters "A", "B", "C" for the latter, and it wouldn't change a thing. Hindsight is 20/20, but it's just a confusing mishap that we are using "numbers" and "numbers + 1" for the two purposes, where the second one could be expressed just as another "measurement", and we could get rid of the confusing "numbers + 1" scale.
This is a left-over from mathematics where indices are distinct from the abelian group of integers.
So the index "1" as used to refer to vector element for example, is different from the number "1" on the number line. That is, one is an element of an index set comprised of ordinal numbers, while the other is an element of the integers.
The confusing part is, that both use the same symbol when in actuality they represent different concepts. The integer "1" is just a number like √2 or -7. The ordinal "1", however, is the representation of the concept "first".
The ordinal "0" would, if used as an index, always yield the empty set as a result (see von Neumann ordinals), which wouldn't be particularly useful in a programming language.
The mapping that selects elements given an index is arbitrary and can be defined as necessary. This allows for constructs like "x[-1]" mapping to the last element of an array. Here, "1" still means the "first" element, but the sign indicates the negative direction, i.e. start from last element.
In short: there's an argument to be made for having two distinct versions. One (1-based indexing) is consistent with mathematics, while the other (0-based indexing) is consistent with low-level data representation and makes no distinction between ordinals and integers.
> they call the "0th" ordinal number by the name "first". We have never moved past the phase where the concept of "zero" was heretic.
Even in maths it's pretty bad, because the Natural numbers still sometimes don't include 0, so that the 0th Natural number is 1, the 1th natural number is 2 etc. (using 1th instead of 1st to avoid the confusing nature of "first" in this discussion).
> It really shows how conditioned the mankind is that they call the "0th" ordinal number by the name "first".
Well, every member of mankind was born. At birth your "first" year of life is the "0th" year. Right now, everyone is living their "n+1"th year of life (n >= 0).
I think this is the influence of convention rather than anything natural.
how conditioned the mankind is that they call the "0th" ordinal number by the name "first"
It really isn't. How many elements does a collection have, if it only has a 0th element? Is it 0? What about the size of a collection [0th, 1st, 2nd]? Or [0th .. 100th]? In language, we use 1-based indices for collections because we enumerate the elements based on counting them. A zeroth element doesn't exist, because if we had 0 elements, the collection would be empty.
Indexing of natural collections is based on cardinality, not ordinality.
> Indexing of natural collections is based on cardinality, not ordinality.
You are exactly right. But does it make sense? I'm not sure. "if it only has a 0th element" isn't in my mind, a relevant question in the context of cardinality, since the answer is the same if it only has the "1th" or "2th" or "3th" element. (Sorry, picked up the habit of "misusing" "th" from another post in this thread.) The base case of indexing is "the start of the list" whereas the base case of cardinality is "empty collection". But should you define the concept of indexing by making an equivalence between cardinality and indexing by "cardinality of collection of elements from the start of a list to the current element (inclusive)" is, in my mind, something that doesn't _necessarily_ make sense.
I admit that it's awfully lot about choices and habits, but at least I find "unpacking" the arguments this way interesting and educational.
In general I think it's fair to say that math has lots of crazy and just plain poor choices of conventions and notations that no sane person would ever think of now; it's just very hard to change something so widespread and ingrained, especially given the extremely cryptic and dense notation that leaves no room for evolution without ambiguity (and it's already hugely ambiguous since of course you can't prevent new concepts from being necessary, so syntax gets reused pretty confusingly, and haphazardly at that).
To put it this way: maths is the legacy codebase from hell; but it's what we have.
To be consistent we should redefine a few more things, e.g. when we take the 0th element of a list we have 0 elements, when we take up to the 1st element we have 1 element, etc.
Joking, but I think there'll always be a mismatch somewhere.
Disclaimer: I don't care about this argument one way or another. However I found the author missed the point (as perhaps many commenters did?)
"...nowadays, all arguments that say that indexes should be 0-based are actually arguments that offsets are 0-based, indexes are offsets, therefore indexes should be 0-based. That’s a circular argument..."
Yes, that is a circular argument. It's not the one I would use. C made the decision that indexes and pointers are the same thing. It's a logical argument and one that's not circular. If they're separate things, then yes, you end up in a circle.
For languages that are not C-like, you could argue "pick an idiom and stick with it across multiple languages" or "make programming languages easier for humans to understand". Both of those arguments are preferential arguments. Perhaps there could be a study comparing the merits of each, but I haven't seen one yet.
I like chocolate ice cream. I like making pointers and arrays as similar as possible. I like being able to collapse pointer arithmetic down. I like doing huge mallocs and then playing around with large, empty hunks of memory. Some folks don't. I get it. Code should look like the way we think about the problem. That's an ideal state we'll never reach, but it's worthy of continued discussion.
It's missing "since indexes are 0-based, they're offsets".
Edit: if I were to make this argument, I'd say the assumption that indexes are offsets is baseless. This is easily proven, since there are languages where this isn't true. However, I would argue the problem on practical merits: the reason no 1-based index op-code exists is that no dominant programming language has 1-based indexing. Which is a self-fulfilling prophecy.
I didn't want to go there, but yes. If I had called that out, it would lead to a discussion about just what is meant by the term "offset". Defined broadly enough, the point worked for the author.
In general, I try to find the most general point and rebuttal as possible. These discussions tend to dive into pedantry and semantics. I'm just trying to do my part to engage the author where they are. ymmv
You can use the same mechanism (minutes:seconds) to explain the half-open ranges: does a range(0,30) mean from 0:00 to 30:00 or from 0:00 to 30:59? If the second half of the match starts at 30:00, does the first half really end at 30:00 or is it really 29:59:99...9?
For most basic concepts, there's always real-world examples to use so that the teaching doesn't have to be abstract.
those are the same number.
I think we are on the track to create a perfect programmers factory.
you can just show them a ruler and ask them where the first centimeter starts
Also trampolines. Kids love trampolines. From there it's a simple jump (pun intended) to something like the Y-combinator!
Unfortunately teaching languages that are good for learning, like Scheme or Pascal, went out of fashion a long time ago. For years now universities have been teaching marketable languages. Ten years ago that was Java. Now it's Python.
The unintuitive step is apparently identifying the index with the cardinality of the collection from before the item arrives, not after it arrives – i.e. ‘the n-th element is the one that arrived after I had n elements’, not ‘the n-th element is the one such that I had n elements after it arrived’. The former identification results in 0-based ordinals, the latter leads to 1-based ordinals – and to some misconceptions about infinity, such as imagining an element ‘at index infinity’ in an infinite list, where no such element needs to exist.
If anything, it is an artificial choice, because in programming it is derived from a property of most, probably of all, human languages that have ordinal numerals.
In most, probably in all, human languages the ordinal numerals are derived from the cardinal numerals through the intermediate of the counting sequence 1, 2, 3, ...
When cardinal numbers appeared, their initial use was only to communicate how many elements are in a set, which was established by counting 1, 2, 3, ...
Later people realized that they can refer to an element of a sequence by using the number reached at that element when counting the elements of the sequence, so the ordinal numerals appeared, being derived from the cardinals by applying some modifier.
So any discussion about whether 1 is more natural than 0 as the starting index goes back to whether 1 is more natural as the starting point of the counting sequence.
All human languages have words for expressing zero as the number of elements of a set, but the speakers of ancient languages did not consider 0 as a number, mainly because it was not obtained by counting.
There was no need to count the elements of an empty set, you just looked at it and it was obvious that the number was 0.
Counting with the traditional sequence can be interpreted as looking at the sequence in front of you, pointing at the right of an element and saying how many elements are at the left of your hand, then moving your hand one position to the right.
It is equally possible to count by looking at the sequence in front of you, pointing at the left of an element and saying how many elements are at the left of your hand, then moving your hand one position to the right.
In the second variant, the counting sequence becomes 0, 1, 2, 3, ...
The human languages do not use the second variant for 2 reasons, one reason is that zero was not perceived as having the same nature as the other cardinal numbers and the other reason is that the second variant has 1 extra step, which is not needed, because when looking at a set with 1 element, it is obvious that the number of elements is 1, without counting.
So neither 0 nor 1 is a more "natural" choice for starting counting, but 1 is more economical when the counting is done by humans.
When the counting is done by machines, 0 as the starting point is slightly more economical, because it can be simpler to initialize all the bits or digits of an electronic or mechanical counter to the same value for 0, than initializing them to different values, for 1.
While 1 was a more economical choice for humans counting sheep, 0 is a choice that is always slightly simpler, both for hardware and software implementations and for programmers, who are less likely to do "off by one" errors when using 0-based indices, because many index-computing formulas, especially in multi-dimensional arrays, become a little simpler.
In conclusion, the choice between 0 and 1 never has anything to do with "naturalness", but it should always be based on efficiency and simplicity.
I prefer 0, even if I have programmed for many years in 1-based languages, like Fortran & Basic, before first using the 0-based C and the other languages influenced by it.
Well, 0-based is not what we use in everyday life, and even arriving at the invention of 0 took us several millennia, so I'd say its non-intuitiveness is pretty much established...
The ancient Greeks didn't think 1 was a number; as late as ~600 CE Isidore of Seville was saying "one is the seed of number but not number". After all, if there's only one thing you don't need to think of it as a set of things, so there's no counting to do.
(I think in some quarters it was even argued that two isn't a number, but I forget the details. Compare the fact that some languages treat pairs of things differently in their syntax from larger groups.)
But it turns out that mathematically it's a really good idea to count 1 as a number and allow sets of size 1, and these days everyone does it and no one is very confused by it. (Well, maybe by the "sets of size 1" thing. Exhibit A: Perl. Exhibit B: Matlab.) And because we're now all used to thinking of 1 as a number, it's hard even to understand how anyone might have found it unintuitive.
If we all started counting from zero and taught our children to do likewise, intuitions 50 years from now might be somewhat different from what they are now.
That's what it represents in indexing, the distance you have to travel to get to the element you want.
How many more things can I fit in that box? How many comments are in that HN page?
I'd write more, but I'm running on empty and my pen is out of ink.
In some ways I think it's similar to the endianness debate: big endian looks natural at first glance, but when you see the expression for the total value, having the place value increase with the address just makes far more sense since it eliminates the extra length-dependent term and subtraction.
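To make that concrete: with little-endian storage the place value of a byte is simply 256 raised to its offset, while big-endian needs a length-dependent exponent. A Python sketch (the byte values are arbitrary):

```python
data = bytes([0x01, 0x02, 0x03, 0x04])

# Little-endian: place value grows with address -- no length term needed
le = sum(b * 256**i for i, b in enumerate(data))

# Big-endian: the exponent depends on the total length of the value
be = sum(b * 256**(len(data) - 1 - i) for i, b in enumerate(data))

assert le == int.from_bytes(data, "little")
assert be == int.from_bytes(data, "big")
```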
Closed and right-open intervals in Ruby:
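Presumably something along these lines (a sketch of Ruby's two range operators, where `..` includes the end and `...` excludes it):

```ruby
# `..` builds a closed range, `...` a right-open one.
closed    = (1..5).to_a    # includes 5
half_open = (1...5).to_a   # excludes 5

puts closed.inspect
puts half_open.inspect
```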
I am doing my best in teaching my kids this way but the world around has the wrong idea. We'll see how it goes...
"Until" I'd think of as being the strict "to", except something about it feels off like it couldn't just be dropped in every place to/through can be used.
If I grab JS/Java/Python/PHP, an array/list is a magic sequence of values with opaque under-the-hood operations, and my only interface to it is the item index. In that context, promoting idioms because they make mechanical sense in a language I'm not using doesn't seem compelling.
I learned programming by myself when I was a teenager, and 0-indexed never was an issue.
We learn very early on as kids at school that there are 10 digits, not 9. You learn to write 0 before any other characters. On every digit pad, such as a phone dial, there is a "0".
If I ask you for the list of digits, 0 is the first element.
There is nothing counter-intuitive about that in my opinion, even for a kid.
Then Scheme, once they feel straightjacketed with BASIC; it will teach them what abstraction is and how to break code into chunks. It should be a bit of a lightbulb moment. They also get a type system.
Finally, Julia, once they get sick of the parentheses and difficulty of building "real" programs (you'll do a lot of DIY FFI to use standard libraries in Scheme). From this they will also learn array programming and the idea of a type hierarchy.
The only trouble with this curriculum is the student will be completely ruined for ever programming in C or Java. They will not have learned to tolerate the requisite level of tedium, or even programming without a REPL.
Maybe kids don't use rulers now? Just tell them that arrays are like rulers.
This has two nice properties. One is that two slices are adjacent if the beginning and ends match, and the other, far more important, is that the length of the slice is end - start.
That's the one that really gets us something. It means you can do relatively complex offset math, without having to think about when you need to add or subtract an additional 1 to get your result.
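Both properties are easy to check with half-open slices (Python shown for illustration):

```python
xs = list(range(10))

a, b, c = 0, 4, 10          # two adjacent slices: [a, b) and [b, c)
left, right = xs[a:b], xs[b:c]

assert len(left) == b - a   # length is simply end - start
assert left + right == xs[a:c]  # adjacent slices concatenate cleanly
```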
I use Lua every day, and work with abstract syntax trees. I mess this up all. the. time.
Of course you can use closed intervals and stick with 1-based indexing. But for why you shouldn't, I'm going to Appeal To Authority: read Dijkstra, and follow up with these:
https://wiki.c2.com/?WhyNumberingShouldStartAtZero
https://wiki.c2.com/?WhyNumberingShouldStartAtOne
https://wiki.c2.com/?ZeroAndOneBasedIndexes
base is irrelevant to this, and you want (and show!) half-open, not closed, intervals.
> This has two nice properties. One is that two slices are adjacent if the beginning and ends match, and the other, far more important, is that the length of the slice is end - start.
Yeah, those are all properties of half-open intervals, irrespective of indexing base. It would be as true of π-based indexing as it is of 0-based.
This is why 1-based index languages normally use closed-closed intervals, so that they can use `1:N`.
I'm a die-hard Julian (Julia is 1-based), but I do a lot of pointer arithmetic in my packages internally. I've come to prefer 0-based indexing, as it really is more natural there. 0-based plus closed-open intervals are also nicer for partitioning an iteration space/tiling loops, thanks to the fact the parent commenter pointed out about the end of one iteration being the start of the next. This is a nice pattern for partitioning `N` into roughly block_size-sized blocks:
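A minimal sketch of that pattern (Python for illustration; the helper name and details are mine, not from the original comment):

```python
def partition(N, block_size):
    # Number of blocks: ceil-divide so each block is roughly block_size long
    iters = -(-N // block_size)
    base, rem = divmod(N, iters)  # first `rem` blocks get one extra element
    start = 0
    bounds = []
    for i in range(iters):        # i runs over the half-open range [0, iters)
        stop = start + base + (1 if i < rem else 0)
        bounds.append((start, stop))
        # ... operate on the half-open block [start, stop) here ...
        start = stop              # end of this block is the start of the next
    return bounds
```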
But that's only slightly nicer. To translate this into 1-based indexing and closed-closed intervals, you'd just substitute the `[0, iters)` ranges with `[1:iters]`, and `i < rem` with `i <= rem`. 1- vs 0-based indexing is bike-shedding: a simple question we can all have opinions on that's easy to argue about, when it really doesn't matter much.
Julia uses 1-based indexing, but its pointer arithmetic is (obviously) 0-based, because pointer arithmetic != indexing. Adding 0 still adds 0, and adding 1 and `unsafe_load`ing will give me a different value than if I didn't add anything at all. (This is just reemphasizing the final point made by the blog post.)
Half-open because of the slicing properties, as noted in your posting and the grandparent posting.
0-based because of the simplification for converting between relative coordinate systems. Suppose you have one interval A represented as offsets within a larger interval B, and you'd like to know what A's coordinates are in the global coordinate system that B uses. This is much easier to compute when everything uses 0-based coordinates.
Here is a slightly longer discussion of that in a genomics context: https://github.com/ga4gh/ga4gh-schemas/issues/121#issuecomme... and a draft of a document I wrote up (again, in a genomics context) so as never to have to have this discussion ever again: https://github.com/jmarshall/ga4gh-schemablocks.github.io/bl...
To the second point, as I said: Dijkstra's argument for using 0 instead of 1 with half-open intervals is, to my taste, perfect. As I have nothing to add to it, I will simply defer.
I think this is the clincher in your argument (rather than Dijkstra). I only use languages with 0-based indexing, and it seems natural to me, but I could almost be convinced that if only I used 1-based indexing regularly then I'd be fine with it too. But here you are with actual experience of using 1-based indexing regularly and you don't feel that way, which blows that idea out of the water.
Leads me to the old joke:
There are only 2 difficult things in computing. Naming things, cache invalidation and off by one errors.
I mess it up as well, both in Lua and C. The key to success is to not calculate values by adding or subtracting them, but to use meaningful names instead. E.g. dist(a, b), holen(a, b), endidx(start, len) and so on. Forget about ever writing “1” in your code, point your finger and call operations out loud. It doesn’t eradicate errors, but at least elevates them to the problem domain level. Off by one is not exclusive to 1-based or 0-based, it is exclusive to thinking you’re smart enough to correctly shorten math expressions in your head every damn time.
(But sometimes I also feel too clever and after some thinking I’m just writing (i+l2+b+2) because it’s obvious if b already comes negative and means “from behind” in “0-based -1 is last” terms.)
Or the predecessor of the last element: array[array.length - 2] makes you think, whereas array[array.length - 1] is more obvious.
Or how about a closed range in uint64_t that ends at 0xffffffffffffffff
Or a range from roughly a = -9223372036854775808 to b = 9223372036854775807 in int64_t. b - a will overflow.
The fact of the matter is, for a type that can hold N distinct values, there are N + 1 different possible interval lengths. You can not represent intervals over a fixed-width type in that same type.
And yes, I think any typed language with a less expressive range syntax than Ada has some work to do. That still leaves open the question of the default, and I maintain that 0 <= .. < n is the correct one.
Given how confused and inconsistent Python slicing syntax is, it obviously isn't.
It has none of the properties I want in a slice, and that a slice of literally any other length will have. In fact, it means every slice I use needs error-handling, because I might get... something that's functionally not a slice, but is still called one. If that doesn't scream "type error" at you, I don't know what would. In fact, that's precisely why 0 is a comparatively recent invention. Most list-based problems were solved just fine before then.
All these arguments are poor rationalisation for the field's inability to move on from the 8-bit era, where the loss of 1/256 of your address space mattered and compilers had bigger problems to solve than translating indices to offsets.
0 comes up if you're doing any arithmetic at all. It's a natural and useful concept for almost anything. As long as you can always take something away where there is something, you'll need 0. It wouldn't be very useful to make something such as a non-empty list type, to then be able to take away all elements except the last one, which must remain.
In more mathematical terms, 0 is called a neutral element (relative to the addition operation), and almost anything you might want to do (for example subtraction) requires as a consequence a neutral element.
So that's why you might want an empty String. If you have an optional element in a syntax, it can be convenient to match it with a rule using a Kleene star, and that gives you an empty Node, which would return an empty String if you request its value. And so on.
With open intervals, you have to represent that as, say, (3, 2). Which sucks.
The subscript on the coefficient then matches the power of x, letting you write the general term as a_i x^i, which works out great when using capital sigma notation.
On the other hand, matrix rows and columns usually start from 1, so the elements on the top row are usually a11, a12, a13, ..., and the next row is a21, a22, a23, ..., and so on.
In an attempt to bring some unity to the two sides on this issue in programming, let me offer something that I'm sure everybody will be able to agree on.
Once upon a time I was implementing some mathematical code in C++. It was a mix of things from places where mathematicians would number from 0 and where they would number from 1.
I decided that the code would be clearer if the code looked like the formulas in the math papers they came from, which meant I wanted to use 0-based arrays in some places and 1-based in others.
Solution: I overloaded the () operator so that array(i) was a reference to array[i-1]. Then I could use 0-based or 1-based, depending on whether I was using a formula that came from a 0-based or 1-based area of mathematics.
Everybody agree that this was not a good idea?
https://docs.julialang.org/en/v1/devdocs/offset-arrays/
The remaining cases are by definition edge cases and warrant enough attention that bugs caused by 1- vs 0-based indexing doesn't seem to be a big problem in practice.
It's like with goto and structured programming - people stopped using goto for loops and ifs, so the remaining cases where goto is used as last resort aren't much of a problem. People think hard before doing this.
It is for all the more advanced uses of arrays and indexing where the base starts to actually matter, and those are not covered by foreach.
I find that the only hardcoded number I often need to slice or index is indeed the start, and most languages do offer something for that such as `slice[..i]` to slice from the start.
Plus, the last bit is just "imagining a guy and getting angry at that guy", the absolutely dumbest form of argumentation.
Finally, someone who has never followed electoral politics or current affairs.
1. In this discussion, people often overlook that these numbers are applied to two different concepts: offsets and numbering.
2. If C had been designed with 0-based pointers (for offset calculation) and 1-based indices (for numbering the array elements) people would probably find this a strength, to have the most appropriate base for each case. It's an interesting what-if idea that I hadn't seen before.
They are languages primarily used for math and simulations and are pretty specialized so the indexing and representation of the vectors matches closer to how you would consider things if you were doing linear algebra by hand.
Julia, Matlab, and Fortran are also all column major for multi dimensional arrays too.
For numeric work I'll take 1-based indexing any day of the week, but for general programming or other CS work (like non-vector/matrix data structures) I much prefer 0-based indexing.
1. It is faster. Due to how memory works, 1-based languages need to subtract 1 internally every time you access an array element [1].
2. It works mathematically better with some of the most common operations on array offsets, like modulo and division into round/ceil/floor. You'll be peppering your code with +1 or -1 around those a lot if you use 1-based indexing.
[1]: This is lua, for instance: https://github.com/lua/lua/blob/master/ltable.c#L702
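To illustrate (2): converting between flat offsets and 2D coordinates is a bare divmod with 0-based indices, but picks up -1/+1 corrections with 1-based ones. A Python sketch (the width value and names are illustrative):

```python
WIDTH = 640

# 0-based: flat offset <-> (row, col) with plain divmod
def to_2d_0based(offset):
    return divmod(offset, WIDTH)          # (row, col)

def to_flat_0based(row, col):
    return row * WIDTH + col

# 1-based: the same conversions pick up -1/+1 corrections
def to_2d_1based(offset):
    row, col = divmod(offset - 1, WIDTH)
    return row + 1, col + 1

def to_flat_1based(row, col):
    return (row - 1) * WIDTH + (col - 1) + 1
```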
I am not sure (1) is a very convincing argument. The subtraction of 1 internally is not necessarily gonna be there once the code gets compiled.
I will personally concede with you on (2), but it's still not very convincing considering that Fortran (a 1-index language) is still (arguably) the most popular language for linear algebra.
Personally, I feel "The Index Wars" are the same as the "Text Editor Wars": a matter for personal opinion.
Theoretically one can optimize it - though only when one can statically infer the index and the compiler decides to inline the array access function. If you can't do that, the best you can hope for is a fast CPU instruction like LEA or something.
In theory one could make a lot of things fast, in practice they rarely are, and it's always better to avoid problems now than to pray later.
Circular buffers?
In short offset 1 doesn't map well to hardware.
It can be phrased mathematically like this:
Consider the equivalence relation a ≡ b (mod n). It has n equivalence classes. Use 1..n as the canonical elements for the classes.
mod1 takes a number, finds its equivalence class and returns the canonical element.
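In Python terms, `mod1` can be sketched as shifting down by one, reducing, and shifting back:

```python
def mod1(x, n):
    """Modulo that returns a value in 1..n instead of 0..n-1,
    like Julia's mod1 -- handy for 1-based circular indexing."""
    return (x - 1) % n + 1

# Wrapping a 1-based index around a 12-element circular array:
assert mod1(12, 12) == 12
assert mod1(13, 12) == 1
```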
It really shows how conditioned the mankind is that they call the "0th" ordinal number by the name "first". We have never moved past the phase where the concept of "zero" was heretic.
Here + offset makes SO much sense. I think we should move to use it in other contexts too. "0 steps from the start of an ordered list" "1 step from the start of an ordered list" etc.
At the moment we are using two different "scales" for measuring things and labeling orders. We could as well use the letters "A", "B", "C" for the latter, and it wouldn't change a thing. Hindsight is 20/20, but it's just a confusing mishap that we are using "numbers" and "numbers + 1" for the two purposes, where the second one could be expressed just as another "measurement", and we could get rid of the confusing "numbers + 1" scale.
This is a leftover from mathematics, where indices are distinct from the abelian group of integers.
So the index "1", as used to refer to a vector element for example, is different from the number "1" on the number line. That is, one is an element of an index set comprised of ordinal numbers, while the other is an element of the integers.
The confusing part is that both use the same symbol when they actually represent different concepts. The integer "1" is just a number like √2 or -7. The ordinal "1", however, is the representation of the concept "first".
The ordinal "0" would, if used as an index, always yield the empty set as a result (see von Neumann ordinals), which wouldn't be particularly useful in a programming language.
The mapping that selects elements given an index is arbitrary and can be defined as necessary. This allows for constructs like "x[-1]" mapping to the last element of an array. Here, "1" still means the "first" element, but the sign indicates the negative direction, i.e. start from last element.
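Python behaves exactly this way with negative indices, which count ordinally from the end:

```python
# Negative indices count from the end: -1 is the last element.
xs = [10, 20, 30]
assert xs[-1] == 30               # "first from the end"
assert xs[-1] == xs[len(xs) - 1]  # sugar for an offset from the length
```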
In short: there's an argument to be made for having two distinct versions. One (1-based indexing) is consistent with mathematics, while the other (0-based indexing) is consistent with low-level data representation and makes no distinction between ordinals and integers.
Even in maths it's pretty bad, because the natural numbers still sometimes don't include 0, so that the 0th natural number is 1, the 1th natural number is 2, etc. (using "1th" instead of "1st" to avoid the confusing nature of "first" in this discussion).
Well, every member of mankind was born. At birth, your "first" year of life is the "0th" year. Right now, everyone is living their (n+1)th year of life (n ≥ 0).
I think it's strong conditioning rather than anything natural.
It really isn't. How many elements does a collection have, if it only has a 0th element? Is it 0? What about the size of a collection [0th, 1st, 2nd]? Or [0th .. 100th]? In language, we use 1-based indices for collections because we enumerate the elements based on counting them. A zeroth element doesn't exist, because if we had 0 elements, the collection would be empty.
Indexing of natural collections is based on cardinality, not ordinality.
You are exactly right. But does it make sense? I'm not sure. "If it only has a 0th element" isn't, in my mind, a relevant question in the context of cardinality, since the answer is the same if it only has the "1th" or "2th" or "3th" element. (Sorry, picked up the habit of "misusing" "th" from another post in this thread.) The base case of indexing is "the start of the list", whereas the base case of cardinality is "the empty collection". But defining indexing by equating it with cardinality, as "the cardinality of the collection of elements from the start of the list to the current element (inclusive)", is, in my mind, something that doesn't _necessarily_ make sense.
I admit that it's an awful lot about choices and habits, but at least I find "unpacking" the arguments this way interesting and educational.
To put it this way: maths is the legacy codebase from hell; but it's what we have.
Joking, but I think there'll always be a mismatch somewhere.
"...nowadays, all arguments that say that indexes should be 0-based are actually arguments that offsets are 0-based, indexes are offsets, therefore indexes should be 0-based. That’s a circular argument..."
Yes, that is a circular argument. It's not the one I would use. C made the decision that indexes and pointers are the same thing. It's a logical argument and one that's not circular. If they're separate things, then yes, you end up in a circle.
For languages that are not C-like, you could argue "pick an idiom and stick with it across multiple languages" or "make programming languages easier for humans to understand". Both of those are preferential arguments. Perhaps there could be a study comparing the merits of each, but I haven't seen one yet.
I like chocolate ice cream. I like making pointers and arrays as similar as possible. I like being able to collapse pointer arithmetic down. I like doing huge mallocs and then playing around with large, empty hunks of memory. Some folks don't. I get it. Code should look like the way we think about the problem. That's an ideal state we'll never reach, but it's worthy of continued discussion.
You might dispute the premises, but I don’t see the circle. Isn’t this just “men are mortal, Socrates is a man, therefore Socrates is mortal”?
Edit: if I were to make this argument, I'd say the assumption that indexes are offsets is baseless. This is easily shown, since there are languages where it isn't true. However, I would argue the problem on practical merits: the reason no 1-based index op-code exists is that no dominant programming language has 1-based indexing. Which is a self-fulfilling prophecy.
In general, I try to find the most general point and rebuttal possible. These discussions tend to dive into pedantry and semantics. I'm just trying to do my part to engage the author where they are. YMMV.