One thing I would hope there is consensus on is that 1-based indexing is easier to learn. My son is doing Python at school, and the "first-element-is-actually-element-zero" is something that has to be dinned into them, a sure sign that it is non-intuitive. In similar vein, as adult programmers we know that the half-open range is the most useful mechanism for expressing a sequence, but again we have to explain to kids that if they want the numbers 1 to 12 (eg to do a times table program) they must type range(1,13), which at that stage of their learning just seems bizarre. Actually I could go on at length about why Python is a terrible teaching language, but I'll stop there !
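For what it's worth, the offending idiom in full, a minimal sketch of the times-table program mentioned above:

```python
# To get the numbers 1 through 12, the stop value must be 13,
# because Python's range is half-open: [start, stop).
for n in range(1, 13):
    print(f"{n} x 12 = {n * 12}")
```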
It is non-intuitive, but it doesn't happen only in programming. Look at sports reporting, for example (basketball, football). When they say something happened "in the 6th minute", it happened between 05:00 and 05:59. The difficulty isn't solely with programming.
You can use the same mechanism (minutes:seconds) to explain the half-open ranges: does a range(0,30) mean from 0:00 to 30:00 or from 0:00 to 30:59? If the second half of the match starts at 30:00, does the first half really end at 30:00 or is it really 29:59:99...9?
For most basic concepts, there's always real-world examples to use so that the teaching doesn't have to be abstract.
There's another non-intuitive and also inconsistent usage of ranges that I find rather confusing: if you say that a meeting is from 9 to 10, it's one hour long. If you say that you'll be on vacation from January 20 to 21, most people seem to think that means two days.
You're supposed to start out by teaching them pointers. Once they fully understand how pointers work, you move them on to Hello World so they can write their first program. Then it will be intuitive when they see 0-based indexing. I think that's the HN consensus on teaching kids how to program.
That's an excellent idea! We should also teach them quantum physics, so that instead of questioning programming language flaws they would question reality. Another problem solved.
I think we are on track to create a perfect programmer factory.
I agree, but Python doesn't have pointers. So I agree with your parent that Python isn't a good language for teaching.
Unfortunately teaching languages that are good for learning, like Scheme or Pascal, went out of fashion a long time ago. For years now universities have been teaching marketable languages. Ten years ago that was Java. Now it's Python.
That's how they used to do it in universities back in the 80s/90s. Start out with the hard stuff first in Intro to Computer Science. By the end of the course, you'd go from 400 people to about 50. Wish they'd still do that to keep numbers low.
Honestly, the problem is that we don't distinguish cardinal from ordinal numbers in programming.
As an offset (which is cardinal: how far something is from a starting point) or really any cardinal use, zero is the natural starting point. For ordinals, “1st” is the natural starting point. But we don't write “1st” in programming; even when we use 1-based indexing to try to match an ordinal intuition, we just write “1”. If we distinguished ordinals and allowed both cardinal offsets and ordinal indexes to be used, I think we'd be fine.
The unintuitive step is apparently identifying the index with the cardinality of the collection from before the item arrives, not after it arrives – i.e. ‘the n-th element is the one that arrived after I had n elements’, not ‘the n-th element is the one such that I had n elements after it arrived’. The former identification results in 0-based ordinals, the latter leads to 1-based ordinals – and to some misconceptions about infinity, such as imagining an element ‘at index infinity’ in an infinite list, where no such element needs to exist.
"1st" is not a "natural" choice for the starting point of ordinal numbers.
If anything, it is an artificial choice, because in programming it is derived from a property of most, probably of all, human languages that have ordinal numerals.
In most, probably in all, human languages the ordinal numerals are derived from the cardinal numerals through the intermediate of the counting sequence 1, 2, 3, ...
When cardinal numbers appeared, their initial use was only to communicate how many elements are in a set, which was established by counting 1, 2, 3, ...
Later people realized that they can refer to an element of a sequence by using the number reached at that element when counting the elements of the sequence, so the ordinal numerals appeared, being derived from the cardinals by applying some modifier.
So any discussion about whether 1 is more natural than 0 as the starting index goes back to whether 1 is more natural as the starting point of the counting sequence.
All human languages have words for expressing zero as the number of elements of a set, but the speakers of ancient languages did not consider 0 as a number, mainly because it was not obtained by counting.
There was no need to count the elements of an empty set, you just looked at it and it was obvious that the number was 0.
Counting with the traditional sequence can be interpreted as looking at the sequence in front of you, pointing at the right of an element and saying how many elements are at the left of your hand, then moving your hand one position to the right.
It is equally possible to count by looking at the sequence in front of you, pointing at the left of an element and saying how many elements are at the left of your hand, then moving your hand one position to the right.
In the second variant, the counting sequence becomes 0, 1, 2, 3, ...
Human languages do not use the second variant for two reasons: one is that zero was not perceived as having the same nature as the other cardinal numbers, and the other is that the second variant has one extra step, which is not needed, because when looking at a set with one element it is obvious that the number of elements is 1, without counting.
So neither 0 nor 1 is a more "natural" choice for starting counting, but 1 is more economical when the counting is done by humans.
When the counting is done by machines, 0 as the starting point is slightly more economical, because it can be simpler to initialize all the bits or digits of an electronic or mechanical counter to the same value for 0, than initializing them to different values, for 1.
While 1 was a more economical choice for humans counting sheep, 0 is a choice that is always slightly simpler, both for hardware and software implementations and for programmers, who are less likely to make "off-by-one" errors when using 0-based indices, because many index-computing formulas, especially in multi-dimensional arrays, become a little simpler.
In conclusion, the choice between 0 and 1 never has anything to do with "naturalness", but it should always be based on efficiency and simplicity.
I prefer 0, even if I have programmed for many years in 1-based languages, like Fortran & Basic, before first using the 0-based C and the other languages influenced by it.
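The claim that multi-dimensional index formulas are simpler with 0-based indices is easy to demonstrate. A sketch with hypothetical helpers for row-major flattening:

```python
def idx0(r, c, ncols):
    """0-based row-major flat index: no correction terms."""
    return r * ncols + c

def idx1(r, c, ncols):
    """1-based row-major flat index: needs two -1s and a +1."""
    return (r - 1) * ncols + (c - 1) + 1

# Same last cell of a 3x4 array under each convention:
assert idx0(2, 3, 4) == 11   # 0-based cells run 0..11
assert idx1(3, 4, 4) == 12   # 1-based cells run 1..12
```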
Well, 0-based is not what we use in everyday life, and even arriving at the invention of 0 took us several millennia, so I'd say its non-intuitiveness is pretty much established...
The ancient Greeks didn't think 1 was a number; as late as the early 7th century CE, Isidore of Seville was saying "one is the seed of number but not number". After all, if there's only one thing you don't need to think of it as a set of things, so there's no counting to do.
(I think in some quarters it was even argued that two isn't a number, but I forget the details. Compare the fact that some languages treat pairs of things differently in their syntax from larger groups.)
But it turns out that mathematically it's a really good idea to count 1 as a number and allow sets of size 1, and these days everyone does it and no one is very confused by it. (Well, maybe by the "sets of size 1" thing. Exhibit A: Perl. Exhibit B: Matlab.) And because we're now all used to thinking of 1 as a number, it's hard even to understand how anyone might have found it unintuitive.
If we all started counting from zero and taught our children to do likewise, intuitions 50 years from now might be somewhat different from what they are now.
The main argument I have against 1-based indexing is that it makes a lot of indexing calculations, which all programmers will need to learn at some point, even more confusing with all the additional +/-1. In other words, 1-based is easier only for the easy cases, but harder for the hard cases.
In some ways I think it's similar to the endianness debate: big endian looks natural at first glance, but when you see the expression for the total value, having the place value increase with the address just makes far more sense since it eliminates the extra length-dependent term and subtraction.
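The endianness point can be made concrete. A sketch, with hypothetical helper names, showing that the little-endian total is a clean positional sum while the big-endian one carries a length-dependent term:

```python
def le_value(data):
    # little-endian: place value 256**i grows with the offset i
    return sum(b * 256**i for i, b in enumerate(data))

def be_value(data):
    # big-endian: the exponent needs the extra len(data)-dependent term
    return sum(b * 256**(len(data) - 1 - i) for i, b in enumerate(data))

assert le_value([0x34, 0x12]) == 0x1234
assert be_value([0x12, 0x34]) == 0x1234
```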
This might be an unpopular opinion, but I'd say it's more natural for kids to relate the word 'three' to 'third' and 'four' to 'fourth', instead of 'two' to 'third' and 'three' to 'fourth'.
I've always found the English language to simply be lacking here. In Dutch, for instance, there is a very clear distinction between half-open and closed intervals: "x tot y" is always half-open, "x tot en met y" is always closed. Whereas in English, "x until y" can mean both things, with all the resulting misunderstandings.
We have "to" and "through" in English to distinguish them. Unfortunately "to" isn't always as strict as I'd like...
"Until" I'd think of as being the strict "to", except something about it feels off like it couldn't just be dropped in every place to/through can be used.
I think it's unintuitive because it's taught in that unintuitive way of just saying "zero is first." I know we can't get too terribly deep into this stuff with especially young children, but we can at least gently introduce them to the concept of offset vs. ordinal, as the former is a much more intuitive way to actually understand array indexes. Trying to liken indexes to ordinal numbers just promotes rote memorization, and that time would be better spent on other things.
I've come to feel that "arrays are pointers with offsets" is a largely-irrelevant justification for promoting zero-based indexes in today's programming landscape. Looking at the TIOBE top 10 programming languages for this month, 6 of 10 languages don't even offer a way to address memory with numeric pointers (and that climbs to 8/10 if you rule .NET as "pointers are technically available, but not commonly used and are somewhat discouraged").
If I grab JS/Java/Python/PHP, an array/list is a magic sequence of values with opaque under-the-hood operations, and my only interface to it is the item index. In that context, promoting idioms because they make mechanical sense in a language I'm not using doesn't seem compelling.
I think it’s even harder to wrap one’s head around negative indices in Python, and all the ways to do slicing. They should have added a reversed keyword instead of the current complexity.
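A quick illustration of the complexity being complained about, in plain Python:

```python
x = [10, 20, 30, 40]
assert x[-1] == 40                    # negative index: count from the end
assert x[1:3] == [20, 30]             # half-open slice
assert x[::-1] == [40, 30, 20, 10]    # reversal via a negative step
assert x[3:0:-1] == [40, 30, 20]      # reversed slices drop x[0]: easy to get wrong
```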
My own experience disagrees with your anecdote, but I guess it's different for everyone.
I learned programming by myself when I was a teenager, and 0-indexed never was an issue.
We learn very early on as kids at school that there are ten digits, not nine. You learn to write 0 before any other character. On every digit pad, such as a phone dial, there is a "0".
If I ask you for the list of digits, 0 is the first element.
There is nothing counter-intuitive about that in my opinion, even for a kid.
Interesting. Python is my go to teaching language. It beat the snot out of learning programming with Java. As a newbie, what does "public static void main" mean? So what's your choice for a teaching language? Maybe Lua or Scheme? They seem to have few surprises which is good for teaching.
BASIC, for explaining the fundamentals of imperative programming. Its weakness as a "real" language is its strength as a training language.
Then Scheme, once they feel straitjacketed by BASIC; it will teach them what abstraction is and how to break code into chunks. It should be a bit of a lightbulb moment. They also get a type system.
Finally, Julia, once they get sick of the parentheses and difficulty of building "real" programs (you'll do a lot of DIY FFI to use standard libraries in Scheme). From this they will also learn array programming and the idea of a type hierarchy.
The only trouble with this curriculum is the student will be completely ruined for ever programming in C or Java. They will not have learned to tolerate the requisite level of tedium, or even programming without a REPL.
There's a huge difference between a teaching language for _kids_ and _adults_. That's pretty much the case for most of the skills you would ever teach, not just computer languages.
I have two daughters learning Python right now and they did not have an issue with 0-based, and I only explained it once. So I don't know if I'd say your experience is consensus.
0-based indexing with closed intervals is better for slicing. This shouldn't be controversial. It's because you can represent a zero interval cleanly: [3,3) is an empty interval after slot 2, representing a single cell is [3,4).
This has two nice properties. One is that two slices are adjacent if the beginning and ends match, and the other, far more important, is that the length of the slice is end - start.
That's the one that really gets us something. It means you can do relatively complex offset math, without having to think about when you need to add or subtract an additional 1 to get your result.
I use Lua every day, and work with abstract syntax trees. I mess this up all. the. time.
Of course you can use closed intervals and stick with 1-based indexing. But for why you shouldn't, I'm going to Appeal To Authority: read Dijkstra, and follow up with these.
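The adjacency and length properties described above can be checked directly in Python, whose slices are half-open:

```python
xs = list(range(10))
a, b = (0, 4), (4, 10)   # adjacent: end of one slice == start of the next
assert xs[a[0]:a[1]] + xs[b[0]:b[1]] == xs
assert len(xs[2:7]) == 7 - 2          # length is end - start
assert xs[3:3] == []                  # the clean empty slice "after slot 2"
```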
> 0-based indexing with closed intervals is better for slicing. This shouldn't be controversial.
base is irrelevant to this, and you want (and show!) half-open, not closed, intervals.
> This has two nice properties. One is that two slices are adjacent if the beginning and ends match, and the other, far more important, is that the length of the slice is end - start.
Yeah, those are all properties of half-open intervals, irrespective of indexing base. It would be as true of π-based indexing as it is of 0-based.
It's related to 0-based indexing in that if you want to take/iterate over the first `N` elements, `0:N` works with 0-based indexing + close-open, but if you had 1-based and close-open, you'd need the awkward `1:N+1`.
This is why 1-based index languages normally use closed-closed intervals, so that they can use `1:N`.
I'm a die hard Julian (Julia is 1-based), but I do a lot of pointer arithmetic in my packages internally. I've come to prefer 0-based indexing, as it really is more natural there. 0-based plus close-open intervals are also nicer for partitioning an iteration space/tiling loops, thanks to the fact the parent commenter pointed out about the end of one iteration being the start of the next. This is a nice pattern for partitioning `N` into roughly block_size-sized blocks:
iters, rem = divrem(N, block_size)
start = 0
for i in [0,iters)
    end = start + block_size + (i < rem)
    # operate on [start, end)
    start = end
end
But that's only slightly nicer. To translate this into 1-based indexing and closed-closed intervals, you'd just substitute the `# operate` line with
# operate on [start+1, end]
the `[0,iters)` with `[1:iters]`, and `i < rem` with `i <= rem`.
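For the curious, the sketch above translates to runnable Python (a hypothetical `partition` helper; like the original, it assumes the remainder is no larger than the number of blocks):

```python
def partition(N, block_size):
    # Split [0, N) into half-open blocks; the first `rem` blocks
    # absorb one extra element each.
    iters, rem = divmod(N, block_size)
    blocks, start = [], 0
    for i in range(iters):
        stop = start + block_size + (i < rem)   # bool counts as 0 or 1
        blocks.append((start, stop))            # operate on [start, stop)
        start = stop
    return blocks

assert partition(10, 3) == [(0, 4), (4, 7), (7, 10)]
```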
1- vs 0-based indexing is bike-shedding. A simple question we can all have opinions on that's easy to argue about, when it really doesn't matter much.
Julia uses 1-based indexing, but its pointer arithmetic is (obviously) 0-based, because pointer arithmetic != indexing. Adding 0 still adds 0, and adding 1 and `unsafe_load`ing will give me a different value than if I didn't add anything at all. (This is just reemphasizing the final point made by the blog post.)
The natural representation of intervals for doing arithmetic on is 0-based half-open.
Half-open because of the slicing properties, as noted in your posting and the grandparent posting.
0-based because of the simplification for converting between relative coordinate systems. Suppose you have one interval A represented as offsets within a larger interval B, and you'd like to know what A's coordinates are in the global coordinate system that B uses.
This is much easier to compute when everything uses 0-based coordinates.
On the first point, yep, I completely misspoke: half-open intervals are what you want, not closed ones.
To the second point, as I said: Dijkstra's argument for using 0 instead of 1 with half-open intervals is, to my taste, perfect. As I have nothing to add to it, I will simply defer.
With 1-based indexing and closed intervals you have to write `for (i = 1; i <= N; i++)`. So you lose the nice property of having `upper - lower` iterations.
> I use Lua every day, and work with abstract syntax trees. I mess this up all. the. time.
I think this is the clincher in your argument (rather than Dijkstra). I only use languages with 0-based indexing, and it seems natural to me, but I could almost be convinced that if only I used 1-based indexing regularly then I'd be fine with it too. But here you are with actual experience of using 1-based indexing regularly and you don't feel that way, which blows that idea out of the water.
A programmer “strengthens their ankle and leans forward by holding their arm out, preparing to relax the shoulder” into a bar. The bartender bursts out laughing for a few sprints.
I mess it up as well, both in Lua and C. The key to success is to not calculate values by adding or subtracting them, but to use meaningful names instead. E.g. dist(a, b), holen(a, b), endidx(start, len) and so on. Forget about ever writing “1” in your code, point your finger and call operations out loud. It doesn’t eradicate errors, but at least elevates them to the problem domain level. Off by one is not exclusive to 1-based or 0-based, it is exclusive to thinking you’re smart enough to correctly shorten math expressions in your head every damn time.
(But sometimes I also feel too clever and after some thinking I’m just writing (i+l2+b+2) because it’s obvious if b already comes negative and means “from behind” in “0-based -1 is last” terms.)
True, but there are examples where 1-based indexing is easier, like returning the last element based on length. I think array[array.length] is easier to understand than array[array.length - 1].
Or the predecessor of the last element: the 0-based array[array.length - 2] makes you think, whereas the 1-based array[array.length - 1] is more obvious.
The fact of the matter is, for a type that can hold N distinct values, there are N + 1 different possible interval lengths (0 through N). You cannot represent every interval length over a fixed-width type in that same type.
Did you mean open range? I've never encountered a circumstance where closed ranges are useful, though I presume they exist.
And yes, I think any typed language with a less expressive range syntax than Ada has some work to do. That still leaves open the question of the default, and I maintain that 0 <= .. < n is the correct one.
That... makes no sense? Whether you start counting a list at 0 or 1, if your language of choice supports [x,x) notation, that syntax will effect an empty interval, and [x,x+1) will effect a single cell, no matter whether your list indexes the first element as 0, or 1. The only difference is which cell [x,x+1) refers to, which is the part that keeps sparking the controversy.
Why on Earth would I want a "zero interval" slice?
It has none of the properties I want in a slice, and that a slice of literally any other length will have. In fact, it means every slice I use needs error-handling, because I might get... something that's functionally not a slice, but is still called one. If that doesn't scream "type error" at you, I don't know what would. In fact, that's precisely why 0 is a comparatively recent invention. Most list-based problems were solved just fine before then.
All these arguments are poor rationalisation for the field's inability to move on from the 8-bit era, where the loss of 1/256 of your address space mattered and compilers had bigger problems to solve than translating indices to offsets.
You want a zero-slice because it’s a simpler base case in almost any even mildly complex algorithm. Without empty lists, you need many additional branches throughout a code base to handle the “no element” case, causing more potential bugs. There’s nothing “8-bit” about cleaner code, quite the opposite.
Wikipedia says that Egyptians had a 0 as early as 1770 BC - almost 4000 years ago. If that's a "relatively recent" invention, what about computers?
0 comes up if you're doing any arithmetic at all. It's a natural and useful concept for almost anything. As long as you can always take something away where there is something, you'll need 0. It wouldn't be very useful to make something such as a non-empty list type, where you can only take away elements until the last one, which must remain.
In more mathematical terms, 0 is called a neutral element (relative to the addition operation), and almost anything you might want to do (for example subtraction) requires as a consequence a neutral element.
Well, you might want a function which returns String, not Maybe(String), so that you can just concatenate, rather than handle Some(String) or None all the time.
So that's why you might want an empty String. If you have an optional element in a syntax, it can be convenient to match it with a rule using a Kleene star, and that gives you an empty Node, which would return an empty String if you request its value. And so on.
With open intervals, you have to represent that as, say, (3, 2). Which sucks.
In mathematics, both are common. For example, when working with polynomials or polynomial-like things, it is common to label coefficients starting with 0. E.g., a0 + a1 x + a2 x^2 + ...
The subscript on the coefficient then matches the power of x, letting you write the general term as ai x^i, which works out great when using capital sigma notation.
On the other hand, matrix rows and columns usually start from 1, so the elements on the top row are usually a11, a12, a13, ..., and the next row is a21, a22, a23, ..., and so on.
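The polynomial case can be checked in a few lines of Python (the `poly` helper is hypothetical):

```python
coeffs = [2, 0, 3]   # coeffs[i] multiplies x**i, so p(x) = 2 + 0*x + 3*x**2

def poly(coeffs, x):
    # the 0-based subscript matches the power, as in sigma notation
    return sum(a * x**i for i, a in enumerate(coeffs))

assert poly(coeffs, 2) == 2 + 0 * 2 + 3 * 2**2   # 14
```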
In an attempt to bring some unity to the two sides on this issue in programming, let me offer something that I'm sure everybody will be able to agree on.
Once upon a time I was implementing some mathematical code in C++. It was a mix of things from places where mathematicians would number from 0 and where they would number from 1.
I decided that the code would be clearer if the code looked like the formulas in the math papers they came from, which meant I wanted to use 0-based arrays in some places and 1-based in others.
Solution: I overloaded the () operator so that array(i) was a reference to array[i-1]. Then I could use 0-based or 1-based, depending on whether I was using a formula that came from a 0-based or 1-based area of mathematics.
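For readers who don't speak C++, here is a rough Python analogue of the same trick (`DualIndex` is a made-up name, not the original code):

```python
class DualIndex:
    # Hypothetical Python analogue of the C++ operator() overload:
    # [] stays 0-based, while () is 1-based and maps array(i) to array[i - 1].
    def __init__(self, data):
        self.data = list(data)

    def __getitem__(self, i):   # 0-based access
        return self.data[i]

    def __call__(self, i):      # 1-based access
        return self.data[i - 1]

a = DualIndex([10, 20, 30])
assert a[0] == 10 and a(1) == 10   # same element under both conventions
```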
Julia and Fortran, languages designed to be used for mathematics and high performance scientific computing, take a similar approach in the language itself and support arbitrary starting indices for arrays. Sometimes the math is just cleaner when the index starts at 0, or 1, or 2, or -1!
And of course there are C++ libraries to do the same, for example the Blitz++ library. Negative indices to the left of element[0] are great for ghost zones in codes that do domain decomposition over different MPI ranks.
TBH indexing became much less important with every language adopting some sort of "for all" and map/filter/reduce constructs. If you don't care about indexes you don't need to think about them (finally!).
The remaining cases are by definition edge cases and warrant enough attention that bugs caused by 1- vs 0-based indexing doesn't seem to be a big problem in practice.
It's like with goto and structured programming - people stopped using goto for loops and ifs, so the remaining cases where goto is used as last resort aren't much of a problem. People think hard before doing this.
I'd say it's the exact opposite. Iterating over an array is trivial no matter if you use 0- or 1-based indexing. If that is all you do, the type of indexing really doesn't matter.
It is for all the more advanced uses of arrays and indexing where the base starts to actually matter, and those are not covered by foreach.
I disagree, this depends a lot on your problem space. I mainly do scientific programming and so have to do array/vector slicing etc. and indexing definitely plays a big role.
Do you often find a need to slice hardcoded numbers, however?
I find that the only hardcoded number I often need to slice or index is indeed the start, and most languages do offer something for that such as `slice[..i]` to slice from the start.
Yes. I did write Lua for a few years and have a hard time remembering seeing any indexing cases. I probably encountered some but they were very few and far between.
Exactly. When I was a beginner I made off-by-1 errors (in Python mind you) all the time. Now that I iterate directly through arrays or use map-reduce patterns I rarely have to handle the index itself.
I was hoping for an interesting analysis of 0 vs 1 based indexing, but instead got a 3 page rant of HN-this and appeal-to-authority-that which adds absolutely nothing of value to the discussion. It feels more like a bottom-of-the-page angry comment to an article than an actual article.
Yeah, this is a bunch of paragraphs complaining that arguments for 0-based aren't good enough for him, but giving no actual arguments FOR 1-based. That's not how you convince anyone. You can't just say that the other guy is wrong without giving any explanation for why you are right.
Plus, the last bit is just "imagining a guy and getting angry at that guy", the absolutely dumbest form of argumentation.
This is probably because this whole debate is pedantic and irrelevant. 0-based wins; at this point it's not about picking a fork in the road, just swimming upstream for the sake of standing out.
Lua is over 23 years old at this point; it made an arbitrary design decision and has had to die on that hill ever since.
I quite disagree, I think it makes two insightful points:
1. In this discussion, people often overlook that these numbers are applied to two different concepts: offsets and numbering.
2. If C had been designed with 0-based pointers (for offset calculation) and 1-based indices (for numbering the array elements) people would probably find this a strength, to have the most appropriate base for each case. It's an interesting what-if idea that I hadn't seen before.
If C had 0-based pointers and 1-based indices, every CPU in existence would have a "subtract 1, multiply, and add" instead of a "multiply and add" instruction, and everybody would think this is stupid, because "multiply and add" can be used for a lot of things, but "subtract 1, multiply, and add" has only one use.
Fully agree. Got to the end of the post and was looking for the rest. The ranting could have been fine if there was more information and exploration. As it is, it's unremarkable and not worth reading.
My opinion is that languages where you work with vector representations primarily (or at least often) like Fortran, Matlab, Julia are best with a 1 index.
They are languages primarily used for math and simulations and are pretty specialized so the indexing and representation of the vectors matches closer to how you would consider things if you were doing linear algebra by hand.
Julia, Matlab, and Fortran are also all column major for multi dimensional arrays too.
For numeric work I'll take 1-based indexing any day of the week, but for general programming or other CS work (like non-vector/matrix data structures) I much prefer 0-based indexing.
It was super frustrating to see the author dismiss Dijkstra's argument as an appeal to authority. No it isn't; it just means that we find the arguments he outlined compelling. As someone who switches between C and Fortran daily, it's incredibly clear that zero-based indexing is easier.
Here are two arguments in favor of 0-based 'indexing' (or offsets) that aren't just "because that's how it's done":
1. It is faster. Due to how memory works, 1-based languages need to subtract 1 internally every time you access an array[1].
2. It works mathematically better with some of the most common operations on array offsets, like modulo and division into round/ceil/floor. You'll be peppering your code with +1 or -1 around those a lot if you use 1-based indexing.
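A sketch of point 2, converting between a flat offset and (row, col) in a hypothetical width-4 grid:

```python
W = 4   # grid width (made up for illustration)

def to_rc(offset):
    # 0-based: flat offset <-> (row, col) is a single divmod
    return divmod(offset, W)

def to_rc_1based(index):
    # 1-based: corrections on the way in and on the way out
    r, c = divmod(index - 1, W)
    return r + 1, c + 1

assert to_rc(7) == (1, 3)
assert to_rc_1based(8) == (2, 4)
```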
I am not sure (1) is a very convincing argument. The subtraction of 1 internally is not necessarily gonna be there once the code gets compiled.
I will personally concede (2), but it's still not very convincing, considering that Fortran (a 1-based language) is still (arguably) the most popular language for linear algebra.
Personally, I feel "The Index Wars" are the same as the "Text Editor Wars": a matter for personal opinion.
>I am not sure (1) is a very convincing argument. The subtraction of 1 internally is not necessarily gonna be there once the code gets compiled.
Theoretically one can optimize it - though only when one can statically infer the index and the compiler decides to inline the array access function. If you can't do that, the best you can hope for is a fast CPU instruction like LEA or something.
In theory one could make a lot of things fast, in practice they rarely are, and it's always better to avoid problems now than to pray later.
> It works mathematically better with some of the most common operations on array offsets
The problem is that for every formula or calculation best suited for 1-based indexing, there is another which is better implemented using 0-based indexing. Regardless of which indexing mode you are using, peppering your code with +1 or -1 is eventually inevitable. It's why I think the choice is a matter of preference and has little to do with technical reasons.
Overall I still prefer 0-based indexing. A few bytes of RAM are cheap nowadays, so when a calculation can be simplified using 1-based indexing, deliberately not using [0] and pretending that the array begins at [1] is often an acceptable option (the trick was originally invented for porting Fortran code...)
Not convinced that is the case. I do not pepper my code with +1 and -1 while using 0-based indexing.
I can think of a lot of cases where I would do it for 1-based indexing, but very few for 0-based indexing. If you want to make the claim that there similarly many, you're going to have to come up with some substantial examples.
Your (1) only affects compilation time, and even this is assuming a particular implementation of arrays (which is not a given).
Your (2) I don't fully understand. Why would you need to use "modulo and division into round/ceil/floor" on array offsets? How often you do it?
For what it's worth, I've written a _ton_ of code in an obscure language called Omnimark, which is 1-based. Much of that code involved using arrays. I've just grepped through a large Omnimark codebase I helped to write, and no, it is _not_ peppered with "+1" or "-1".
No, it mostly affects execution time. Whether the effect is meaningful, however, depends on various factors.
> [...] and even this is assuming a particular implementation of arrays (which is not a given).
Eventually, all arrays boil down to pointer+offset memory accesses. Well, if your programming language internally implements "arrays" as hashmaps, then you've already lost performance-wise, of course.
> Why would you need to use "modulo and division into round/ceil/floor" on array offsets?
Whenever you're working with 2D/3D data in manually managed memory.
> How often you do it?
When you're writing web services? Probably not so much. When you're doing image manipulation or 3D graphics? All the time.
Every single hash map worth its salt should be backed by a power-of-two-length array. And the mapping from hash to array index is then not a division (that's slow and cannot be parallelized in the CPU) but a simple bitwise AND.
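For a power-of-two capacity the bitwise mask and the modulo agree, which is the whole point; a quick Python check (the capacity value is arbitrary):

```python
CAP = 1 << 4  # power-of-two table size

def slot(h):
    # Masking with CAP - 1 keeps the low bits, which for a
    # power-of-two capacity is exactly h % CAP, minus the division.
    return h & (CAP - 1)

for h in (0, 5, 16, 12345):
    assert slot(h) == h % CAP
```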
1) It also affects runtime, when addresses need to have a subtraction, and maybe a multiplication, in the offset calculation.
2) You may need this often in certain network or protocol code, for example.
Regarding the mathematical advantage: it depends on the subfield, but I think 1-based numbering is more common in mathematical notation and algorithms (linear algebra being a prominent example).
Regarding performance: some mentioned that the 1 offset can often be optimized away by the compiler. But even when this is not possible, common CPU architectures include instructions that can apply the offset at no extra cost, see https://stackoverflow.com/questions/28032685/is-there-a-perf... , so 1-based indexing is just as fast.
For 2), in Julia, there is a "mod1" function which acts as modulo, with 1 being the smallest value instead of zero. It doesn't really make sense mathematically, but is very useful for looping over circular arrays.
Only one reason exists these days: it prevents bugs, because people mix up ordering, indexing, counting, array semantics, implementation, and pointer arithmetic
> It really shows how conditioned an entire community can be when they find the statement “given a list x, the first item in x is x[1], the second item in x is x[2]” to be unnatural.
It really shows how conditioned the mankind is that they call the "0th" ordinal number by the name "first". We have never moved past the phase where the concept of "zero" was heretic.
Here + offset makes SO much sense. I think we should move to use it in other contexts too. "0 steps from the start of an ordered list" "1 step from the start of an ordered list" etc.
At the moment we are using two different "scales" for measuring things and labeling orders. We could as well use the letters "A", "B", "C" for the latter, and it wouldn't change a thing. Hindsight is 20/20, but it's just a confusing mishap that we are using "numbers" and "numbers + 1" for the two purposes, where the second one could be expressed just as another "measurement", and we could get rid of the confusing "numbers + 1" scale.
This is a left-over from mathematics where indices are distinct from the abelian group of integers.
So the index "1" as used to refer to vector element for example, is different from the number "1" on the number line. That is, one is an element of an index set comprised of ordinal numbers, while the other is an element of the integers.
The confusing part is, that both use the same symbol when in actuality they represent different concepts. The integer "1" is just a number like √2 or -7. The ordinal "1", however, is the representation of the concept "first".
The ordinal "0" would, if used as an index, always yield the empty set as a result (see von Neumann ordinals), which wouldn't be particularly useful in a programming language.
The mapping that selects elements given an index is arbitrary and can be defined as necessary. This allows for constructs like "x[-1]" mapping to the last element of an array. Here, "1" still means the "first" element, but the sign indicates the negative direction, i.e. start from last element.
In short: there's an argument to be made for having two distinct versions. One (1-based indexing) is consistent with mathematics, while the other (0-based indexing) is consistent with low-level data representation and makes no distinction between ordinals and integers.
> they call the "0th" ordinal number by the name "first". We have never moved past the phase where the concept of "zero" was heretic.
Even in maths it's pretty bad, because the Natural numbers still sometimes don't include 0, so that the 0th Natural number is 1, the 1th natural number is 2 etc. (using 1th instead of 1st to avoid the confusing nature of "first" in this discussion).
> It really shows how conditioned the mankind is that they call the "0th" ordinal number by the name "first".
Well, every member of mankind was born. At birth your "first" year of life is the "0th" year. Right now, everyone is living their "n+1"th year of life (n >= 0).
I think this is the influence of convention rather than anything natural.
how conditioned the mankind is that they call the "0th" ordinal number by the name "first"
It really isn't. How many elements does a collection have, if it only has a 0th element? Is it 0? What about the size of a collection [0th, 1st, 2nd]? Or [0th .. 100th]? In language, we use 1-based indices for collections because we enumerate the elements based on counting them. A zeroth element doesn't exist, because if we had 0 elements, the collection would be empty.
Indexing of natural collections is based on cardinality, not ordinality.
> Indexing of natural collections is based on cardinality, not ordinality.
You are exactly right. But does it make sense? I'm not sure. "if it only has a 0th element" isn't in my mind, a relevant question in the context of cardinality, since the answer is the same if it only has the "1th" or "2th" or "3th" element. (Sorry, picked up the habit of "misusing" "th" from another post in this thread.) The base case of indexing is "the start of the list" whereas the base case of cardinality is "empty collection". But should you define the concept of indexing by making an equivalence between cardinality and indexing by "cardinality of collection of elements from the start of a list to the current element (inclusive)" is, in my mind, something that doesn't _necessarily_ make sense.
I admit that it's awfully lot about choices and habits, but at least I find "unpacking" the arguments this way interesting and educational.
In general I think it's fair to say that math has lots of crazy and just plain poor choices of conventions and notations that no sane person would ever think of now; it's just very hard to change something so widespread and ingrained, especially given the extremely cryptic and dense notation that leaves no room for evolution without ambiguity (and it's already hugely ambiguous since of course you can't prevent new concepts from being necessary, so syntax gets reused pretty confusingly, and haphazardly at that).
To put it this way: maths is the legacy codebase from hell; but it's what we have.
To be consistent we should redefine a few more things, e.g. when we take the 0th element of a list we have 0 elements, when we take up to the 1st element we have 1 element, etc.
Joking, but I think there'll always be a mismatch somewhere.
Disclaimer: I don't care about this argument one way or another. However I found the author missed the point (as perhaps many commenters did?)
"...nowadays, all arguments that say that indexes should be 0-based are actually arguments that offsets are 0-based, indexes are offsets, therefore indexes should be 0-based. That’s a circular argument..."
Yes, that is a circular argument. It's not the one I would use. C made the decision that indexes and pointers are the same thing. It's a logical argument and one that's not circular. If they're separate things, then yes, you end up in a circle.
For languages that are not C-like, you could argue "pick an idiom and stick with it across multiple languages" or "make programming languages easier for humans to understand". Both of those arguments are preferential arguments. Perhaps there could be a study comparing the merits of each, but I haven't seen one yet.
I like chocolate ice cream. I like making pointers and arrays as similar as possible. I like being able to collapse pointer arithmetic down. I like doing huge mallocs and then playing around with large, empty hunks of memory. Some folks don't. I get it. Code should look like the way we think about the problem. That's an ideal state we'll never reach, but it's worthy of continued discussion.
It's missing "since indexes are 0-based, they're offsets".
Edit: if I were to make this argument, I'd say the assumption that indexes are offsets is baseless. This is easily proven, since there are languages where this isn't true. However, I would argue the problem on practical merits: the reason no 1-based index op-code exists is that no dominant programming language has 1-based indexing. Which is a self-fulfilling prophecy.
I didn't want to go there, but yes. If I had called that out, it would lead to a discussion about just what is meant by the term "offset". Defined broadly enough, the point worked for the author.
In general, I try to find the most general point and rebuttal as possible. These discussions tend to dive into pedantry and semantics. I'm just trying to do my part to engage the author where they are. ymmv
You can use the same mechanism (minutes:seconds) to explain the half-open ranges: does a range(0,30) mean from 0:00 to 30:00 or from 0:00 to 30:59? If the second half of the match starts at 30:00, does the first half really end at 30:00 or is it really 29:59:99...9?
For most basic concepts, there's always real-world examples to use so that the teaching doesn't have to be abstract.
those are the same number.
I think we are on the track to create a perfect programmers factory.
you can just show them a ruler and ask them where the first centimeter starts
Also trampolines. Kids love trampolines. From there it's a simple jump (pun intended) to something like the Y-combinator!
Unfortunately teaching languages that are good for learning, like Scheme or Pascal, went out of fashion a long time ago. For years now universities have been teaching marketable languages. Ten years ago that was Java. Now it's Python.
The unintuitive step is apparently identifying the index with the cardinality of the collection from before the item arrives, not after it arrives – i.e. ‘the n-th element is the one that arrived after I had n elements’, not ‘the n-th element is the one such that I had n elements after it arrived’. The former identification results in 0-based ordinals, the latter leads to 1-based ordinals – and to some misconceptions about infinity, such as imagining an element ‘at index infinity’ in an infinite list, where no such element needs to exist.
If anything, it is an artificial choice, because in programming it is derived from a property of most, probably of all, human languages that have ordinal numerals.
In most, probably in all, human languages the ordinal numerals are derived from the cardinal numerals through the intermediate of the counting sequence 1, 2, 3, ...
When cardinal numbers appeared, their initial use was only to communicate how many elements are in a set, which was established by counting 1, 2, 3, ...
Later people realized that they can refer to an element of a sequence by using the number reached at that element when counting the elements of the sequence, so the ordinal numerals appeared, being derived from the cardinals by applying some modifier.
So any discussion about whether 1 is more natural than 0 as the starting index goes back to whether 1 is more natural as the starting point of the counting sequence.
All human languages have words for expressing zero as the number of elements of a set, but the speakers of ancient languages did not consider 0 as a number, mainly because it was not obtained by counting.
There was no need to count the elements of an empty set, you just looked at it and it was obvious that the number was 0.
Counting with the traditional sequence can be interpreted as looking at the sequence in front of you, pointing at the right of an element and saying how many elements are at the left of your hand, then moving your hand one position to the right.
It is equally possible to count by looking at the sequence in front of you, pointing at the left of an element and saying how many elements are at the left of your hand, then moving your hand one position to the right.
In the second variant, the counting sequence becomes 0, 1, 2, 3, ...
The human languages do not use the second variant for 2 reasons, one reason is that zero was not perceived as having the same nature as the other cardinal numbers and the other reason is that the second variant has 1 extra step, which is not needed, because when looking at a set with 1 element, it is obvious that the number of elements is 1, without counting.
So neither 0 nor 1 is a more "natural" choice for starting counting, but 1 is more economical when the counting is done by humans.
When the counting is done by machines, 0 as the starting point is slightly more economical, because it can be simpler to initialize all the bits or digits of an electronic or mechanical counter to the same value for 0, than initializing them to different values, for 1.
While 1 was a more economical choice for humans counting sheep, 0 is a choice that is always slightly simpler, both for hardware and software implementations and for programmers, who are less likely to do "off by one" errors when using 0-based indices, because many index-computing formulas, especially in multi-dimensional arrays, become a little simpler.
In conclusion, the choice between 0 and 1 never has anything to do with "naturalness", but it should always be based on efficiency and simplicity.
I prefer 0, even if I have programmed for many years in 1-based languages, like Fortran & Basic, before first using the 0-based C and the other languages influenced by it.
Well, 0-based is not what we use in everyday life, and even arriving at the invention of 0 took us several millennia, so I'd say its non-intuitiveness is pretty much established...
The ancient Greeks didn't think 1 was a number; as late as ~600 CE Isidore of Seville was saying "one is the seed of number but not number". After all, if there's only one thing you don't need to think of it as a set of things, so there's no counting to do.
(I think in some quarters it was even argued that two isn't a number, but I forget the details. Compare the fact that some languages treat pairs of things differently in their syntax from larger groups.)
But it turns out that mathematically it's a really good idea to count 1 as a number and allow sets of size 1, and these days everyone does it and no one is very confused by it. (Well, maybe by the "sets of size 1" thing. Exhibit A: Perl. Exhibit B: Matlab.) And because we're now all used to thinking of 1 as a number, it's hard even to understand how anyone might have found it unintuitive.
If we all started counting from zero and taught our children to do likewise, intuitions 50 years from now might be somewhat different from what they are now.
That's what it represents in indexing, the distance you have to travel to get to the element you want.
How many more things can I fit in that box? How many comments are in that HN page?
I'd write more, but I'm running on empty and my pen is out of ink.
In some ways I think it's similar to the endianness debate: big endian looks natural at first glance, but when you see the expression for the total value, having the place value increase with the address just makes far more sense since it eliminates the extra length-dependent term and subtraction.
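To make that concrete: with little-endian storage the place value of a byte is simply 256 raised to its offset, while big-endian needs a length-dependent exponent. A Python sketch (the byte values are arbitrary):

```python
data = bytes([0x01, 0x02, 0x03, 0x04])

# Little-endian: place value grows with address -- no length term needed
le = sum(b * 256**i for i, b in enumerate(data))

# Big-endian: the exponent depends on the total length of the value
be = sum(b * 256**(len(data) - 1 - i) for i, b in enumerate(data))

assert le == int.from_bytes(data, "little")
assert be == int.from_bytes(data, "big")
```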
Closed and right-open intervals in Ruby:
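Presumably something along these lines (a sketch of Ruby's two range operators, where `..` includes the end and `...` excludes it):

```ruby
# `..` builds a closed range, `...` a right-open one.
closed    = (1..5).to_a    # includes 5
half_open = (1...5).to_a   # excludes 5

puts closed.inspect
puts half_open.inspect
```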
I am doing my best in teaching my kids this way but the world around has the wrong idea. We'll see how it goes...
"Until" I'd think of as being the strict "to", except something about it feels off like it couldn't just be dropped in every place to/through can be used.
If I grab JS/Java/Python/PHP, an array/list is a magic sequence of values with opaque under-the-hood operations, and my only interface to it is the item index. In that context, promoting idioms because they make mechanical sense in a language I'm not using doesn't seem compelling.
I learned programming by myself when I was a teenager, and 0-indexed never was an issue.
We learn very early on as kids at school that there are 10 digits, not 9. You learn to write 0 before any other characters. On every digit pad, such as a phone dial, there is a "0".
If I ask you for the list of digits, 0 is the first element.
There is nothing counter-intuitive about that in my opinion, even for a kid.
Then Scheme, once they feel straightjacketed with BASIC; it will teach them what abstraction is and how to break code into chunks. It should be a bit of a lightbulb moment. They also get a type system.
Finally, Julia, once they get sick of the parentheses and difficulty of building "real" programs (you'll do a lot of DIY FFI to use standard libraries in Scheme). From this they will also learn array programming and the idea of a type hierarchy.
The only trouble with this curriculum is the student will be completely ruined for ever programming in C or Java. They will not have learned to tolerate the requisite level of tedium, or even programming without a REPL.
Maybe kids don't use rulers now? Just tell them that arrays are like rulers.
This has two nice properties. One is that two slices are adjacent if the beginning and ends match, and the other, far more important, is that the length of the slice is end - start.
That's the one that really gets us something. It means you can do relatively complex offset math, without having to think about when you need to add or subtract an additional 1 to get your result.
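Both properties are easy to check with half-open slices (Python shown for illustration):

```python
xs = list(range(10))

a, b, c = 0, 4, 10          # two adjacent slices: [a, b) and [b, c)
left, right = xs[a:b], xs[b:c]

assert len(left) == b - a   # length is simply end - start
assert left + right == xs[a:c]  # adjacent slices concatenate cleanly
```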
I use Lua every day, and work with abstract syntax trees. I mess this up all. the. time.
Of course you can use closed intervals and stick with 1-based indexing. But for why you shouldn't, I'm going to Appeal To Authority: read Dijkstra, and follow up with these:
https://wiki.c2.com/?WhyNumberingShouldStartAtZero
https://wiki.c2.com/?WhyNumberingShouldStartAtOne
https://wiki.c2.com/?ZeroAndOneBasedIndexes
base is irrelevant to this, and you want (and show!) half-open, not closed, intervals.
> This has two nice properties. One is that two slices are adjacent if the beginning and ends match, and the other, far more important, is that the length of the slice is end - start.
Yeah, those are all properties of half-open intervals, irrespective of indexing base. It would be as true of π-based indexing as it is of 0-based.
This is why 1-based index languages normally use closed-closed intervals, so that they can use `1:N`.
I'm a die-hard Julian (Julia is 1-based), but I do a lot of pointer arithmetic in my packages internally. I've come to prefer 0-based indexing, as it really is more natural there. 0-based plus closed-open intervals are also nicer for partitioning an iteration space/tiling loops, thanks to the fact the parent commenter pointed out about the end of one iteration being the start of the next. This is a nice pattern for partitioning `N` into roughly block_size-sized blocks:
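A minimal sketch of that pattern (Python for illustration; the helper name and details are mine, not from the original comment):

```python
def partition(N, block_size):
    # Number of blocks: ceil-divide so each block is roughly block_size long
    iters = -(-N // block_size)
    base, rem = divmod(N, iters)  # first `rem` blocks get one extra element
    start = 0
    bounds = []
    for i in range(iters):        # i runs over the half-open range [0, iters)
        stop = start + base + (1 if i < rem else 0)
        bounds.append((start, stop))
        # ... operate on the half-open block [start, stop) here ...
        start = stop              # end of this block is the start of the next
    return bounds
```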
But that's only slightly nicer. To translate this into 1-based indexing and closed-closed intervals, you'd just substitute the `[0, iters)` ranges with `[1:iters]`, and `i < rem` with `i <= rem`. 1- vs 0-based indexing is bike-shedding: a simple question we can all have opinions on that's easy to argue about, when it really doesn't matter much.
Julia uses 1-based indexing, but its pointer arithmetic is (obviously) 0-based, because pointer arithmetic != indexing. Adding 0 still adds 0, and adding 1 and `unsafe_load`ing will give me a different value than if I didn't add anything at all. (This is just reemphasizing the final point made by the blog post.)
Half-open because of the slicing properties, as noted in your posting and the grandparent posting.
0-based because of the simplification for converting between relative coordinate systems. Suppose you have one interval A represented as offsets within a larger interval B, and you'd like to know what A's coordinates are in the global coordinate system that B uses. This is much easier to compute when everything uses 0-based coordinates.
Here is a slightly longer discussion of that in a genomics context: https://github.com/ga4gh/ga4gh-schemas/issues/121#issuecomme... and a draft of a document I wrote up (again, in a genomics context) so as never to have to have this discussion ever again: https://github.com/jmarshall/ga4gh-schemablocks.github.io/bl...
To the second point, as I said: Dijkstra's argument for using 0 instead of 1 with half-open intervals is, to my taste, perfect. As I have nothing to add to it, I will simply defer.
I think this is the clincher in your argument (rather than Dijkstra). I only use languages with 0-based indexing, and it seems natural to me, but I could almost be convinced that if only I used 1-based indexing regularly then I'd be fine with it too. But here you are with actual experience of using 1-based indexing regularly and you don't feel that way, which blows that idea out of the water.
Leads me to the old joke:
There are only 2 difficult things in computing. Naming things, cache invalidation and off by one errors.
I mess it up as well, both in Lua and C. The key to success is to not calculate values by adding or subtracting them, but to use meaningful names instead. E.g. dist(a, b), holen(a, b), endidx(start, len) and so on. Forget about ever writing “1” in your code, point your finger and call operations out loud. It doesn’t eradicate errors, but at least elevates them to the problem domain level. Off by one is not exclusive to 1-based or 0-based, it is exclusive to thinking you’re smart enough to correctly shorten math expressions in your head every damn time.
(But sometimes I also feel too clever and after some thinking I’m just writing (i+l2+b+2) because it’s obvious if b already comes negative and means “from behind” in “0-based -1 is last” terms.)
Or the predecessor of the last element: array[array.length - 2] makes you think, whereas array[array.length - 1] is more obvious.
Or how about a closed range in uint64_t that ends at 0xffffffffffffffff
Or a range from roughly a = -9223372036854775808 to b = 9223372036854775807 in int64_t. b - a will overflow.
The fact of the matter is, for a type that can hold N distinct values, there are N + 1 different possible interval lengths. You can not represent intervals over a fixed-width type in that same type.
And yes, I think any typed language with a less expressive range syntax than Ada has some work to do. That still leaves open the question of the default, and I maintain that 0 <= .. < n is the correct one.
Given how confused and inconsistent Python slicing syntax is, it obviously isn't.
It has none of the properties I want in a slice, and that a slice of literally any other length will have. In fact, it means every slice I use needs error-handling, because I might get... something that's functionally not a slice, but is still called one. If that doesn't scream "type error" at you, I don't know what would. In fact, that's precisely why 0 is a comparatively recent invention. Most list-based problems were solved just fine before then.
All these arguments are poor rationalisation for the field's inability to move on from the 8-bit era, where the loss of 1/256 of your address space mattered and compilers had bigger problems to solve than translating indices to offsets.
0 comes up if you're doing any arithmetic at all. It's a natural and useful concept for almost anything. As long as you can always take something away where there is something, you'll need 0. It wouldn't be very useful to make something such as a non-empty list type, to then be able to take away all elements except the last one, which must remain.
In more mathematical terms, 0 is called a neutral element (relative to the addition operation), and almost anything you might want to do (for example subtraction) requires as a consequence a neutral element.
So that's why you might want an empty String. If you have an optional element in a syntax, it can be convenient to match it with a rule using a Kleene star, and that gives you an empty Node, which would return an empty String if you request its value. And so on.
With open intervals, you have to represent that as, say, (3, 2). Which sucks.
The subscript on the coefficient then matches the power of x, letting you write the general term as a_i x^i, which works out great when using capital sigma notation.
On the other hand, matrix rows and columns usually start from 1, so the elements on the top row are usually a11, a12, a13, ..., and the next row is a21, a22, a23, ..., and so on.
In an attempt to bring some unity to the two sides on this issue in programming, let me offer something that I'm sure everybody will be able to agree on.
Once upon a time I was implementing some mathematical code in C++. It was a mix of things from places where mathematicians would number from 0 and where they would number from 1.
I decided that the code would be clearer if the code looked like the formulas in the math papers they came from, which meant I wanted to use 0-based arrays in some places and 1-based in others.
Solution: I overloaded the () operator so that array(i) was a reference to array[i-1]. Then I could use 0-based or 1-based, depending on whether I was using a formula that came from a 0-based or 1-based area of mathematics.
Everybody agree that this was not a good idea?
https://docs.julialang.org/en/v1/devdocs/offset-arrays/
The remaining cases are by definition edge cases and warrant enough attention that bugs caused by 1- vs 0-based indexing doesn't seem to be a big problem in practice.
It's like with goto and structured programming - people stopped using goto for loops and ifs, so the remaining cases where goto is used as last resort aren't much of a problem. People think hard before doing this.
It is for all the more advanced uses of arrays and indexing where the base starts to actually matter, and those are not covered by foreach.
I find that the only hardcoded number I often need to slice or index is indeed the start, and most languages do offer something for that such as `slice[..i]` to slice from the start.
Plus, the last bit is just "imagining a guy and getting angry at that guy", the absolutely dumbest form of argumentation.
Finally, someone who has never followed electoral politics or current affairs.
1. In this discussion, people often overlook that these numbers are applied to two different concepts: offsets and numbering.
2. If C had been designed with 0-based pointers (for offset calculation) and 1-based indices (for numbering the array elements) people would probably find this a strength, to have the most appropriate base for each case. It's an interesting what-if idea that I hadn't seen before.
They are languages primarily used for math and simulations and are pretty specialized so the indexing and representation of the vectors matches closer to how you would consider things if you were doing linear algebra by hand.
Julia, Matlab, and Fortran are also all column major for multi dimensional arrays too.
For numeric work I'll take 1-based indexing any day of the week, but for general programming or other CS work (like non-vector/matrix data structures) I much prefer 0-based indexing.
1. It is faster. Due to how memory works, 1-based languages need to subtract 1 internally every time you access an array element [1].
2. It works mathematically better with some of the most common operations on array offsets, like modulo and division into round/ceil/floor. You'll be peppering your code with +1 or -1 around those a lot if you use 1-based indexing.
[1]: This is lua, for instance: https://github.com/lua/lua/blob/master/ltable.c#L702
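To illustrate (2): converting between flat offsets and 2D coordinates is a bare divmod with 0-based indices, but picks up -1/+1 corrections with 1-based ones. A Python sketch (the width value and names are illustrative):

```python
WIDTH = 640

# 0-based: flat offset <-> (row, col) with plain divmod
def to_2d_0based(offset):
    return divmod(offset, WIDTH)          # (row, col)

def to_flat_0based(row, col):
    return row * WIDTH + col

# 1-based: the same conversions pick up -1/+1 corrections
def to_2d_1based(offset):
    row, col = divmod(offset - 1, WIDTH)
    return row + 1, col + 1

def to_flat_1based(row, col):
    return (row - 1) * WIDTH + (col - 1) + 1
```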
I am not sure (1) is a very convincing argument. The subtraction of 1 internally is not necessarily gonna be there once the code gets compiled.
I will personally concede with you on (2), but it's still not very convincing considering that Fortran (a 1-index language) is still (arguably) the most popular language for linear algebra.
Personally, I feel "The Index Wars" are the same as the "Text Editor Wars": a matter for personal opinion.
Theoretically one can optimize it - though only when one can statically infer the index and the compiler decides to inline the array access function. If you can't do that, the best you can hope for is a fast CPU instruction like LEA or something.
In theory one could make a lot of things fast, in practice they rarely are, and it's always better to avoid problems now than to pray later.
Circular buffers?
In short offset 1 doesn't map well to hardware.
It can be phrased mathematically like this:
Consider the equivalence relation a ≡ b (mod n). It has n equivalence classes. Use 1..n as the canonical elements for the classes.
mod1 takes a number, finds its equivalence class and returns the canonical element.
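In Python terms, `mod1` can be sketched as shifting down by one, reducing, and shifting back:

```python
def mod1(x, n):
    """Modulo that returns a value in 1..n instead of 0..n-1,
    like Julia's mod1 -- handy for 1-based circular indexing."""
    return (x - 1) % n + 1

# Wrapping a 1-based index around a 12-element circular array:
assert mod1(12, 12) == 12
assert mod1(13, 12) == 1
```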
It really shows how conditioned the mankind is that they call the "0th" ordinal number by the name "first". We have never moved past the phase where the concept of "zero" was heretic.
Here + offset makes SO much sense. I think we should move to use it in other contexts too. "0 steps from the start of an ordered list" "1 step from the start of an ordered list" etc.
At the moment we are using two different "scales" for measuring things and labeling orders. We could as well use the letters "A", "B", "C" for the latter, and it wouldn't change a thing. Hindsight is 20/20, but it's just a confusing mishap that we are using "numbers" and "numbers + 1" for the two purposes, where the second one could be expressed just as another "measurement", and we could get rid of the confusing "numbers + 1" scale.
This is a leftover from mathematics, where indices are distinct from the abelian group of integers.
So the index "1", as used to refer to a vector element for example, is different from the number "1" on the number line. That is, one is an element of an index set comprised of ordinal numbers, while the other is an element of the integers.
The confusing part is that both use the same symbol when they actually represent different concepts. The integer "1" is just a number like √2 or -7. The ordinal "1", however, is the representation of the concept "first".
The ordinal "0" would, if used as an index, always yield the empty set as a result (see von Neumann ordinals), which wouldn't be particularly useful in a programming language.
The mapping that selects elements given an index is arbitrary and can be defined as necessary. This allows for constructs like "x[-1]" mapping to the last element of an array. Here, "1" still means the "first" element, but the sign indicates the negative direction, i.e. start from last element.
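Python behaves exactly this way with negative indices, which count ordinally from the end:

```python
# Negative indices count from the end: -1 is the last element.
xs = [10, 20, 30]
assert xs[-1] == 30               # "first from the end"
assert xs[-1] == xs[len(xs) - 1]  # sugar for an offset from the length
```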
In short: there's an argument to be made for having two distinct versions. One (1-based indexing) is consistent with mathematics, while the other (0-based indexing) is consistent with low-level data representation and makes no distinction between ordinals and integers.
Even in maths it's pretty bad, because the natural numbers still sometimes don't include 0, so that the 0th natural number is 1, the 1th natural number is 2, etc. (using "1th" instead of "1st" to avoid the confusing nature of "first" in this discussion).
Well, every member of mankind was born. At birth, your "first" year of life is the "0th" year. Right now, everyone is living their (n+1)th year of life (n ≥ 0).
I think it's strong conditioning rather than anything natural.
It really isn't. How many elements does a collection have, if it only has a 0th element? Is it 0? What about the size of a collection [0th, 1st, 2nd]? Or [0th .. 100th]? In language, we use 1-based indices for collections because we enumerate the elements based on counting them. A zeroth element doesn't exist, because if we had 0 elements, the collection would be empty.
Indexing of natural collections is based on cardinality, not ordinality.
You are exactly right. But does it make sense? I'm not sure. "If it only has a 0th element" isn't, in my mind, a relevant question in the context of cardinality, since the answer is the same if it only has the "1th" or "2th" or "3th" element. (Sorry, picked up the habit of "misusing" "th" from another post in this thread.) The base case of indexing is "the start of the list", whereas the base case of cardinality is "the empty collection". But defining indexing by equating it with cardinality, as "the cardinality of the collection of elements from the start of the list to the current element (inclusive)", is, in my mind, something that doesn't _necessarily_ make sense.
I admit that it's an awful lot about choices and habits, but at least I find "unpacking" the arguments this way interesting and educational.
To put it this way: maths is the legacy codebase from hell; but it's what we have.
Joking, but I think there'll always be a mismatch somewhere.
"...nowadays, all arguments that say that indexes should be 0-based are actually arguments that offsets are 0-based, indexes are offsets, therefore indexes should be 0-based. That’s a circular argument..."
Yes, that is a circular argument. It's not the one I would use. C made the decision that indexes and pointers are the same thing. It's a logical argument and one that's not circular. If they're separate things, then yes, you end up in a circle.
For languages that are not C-like, you could argue "pick an idiom and stick with it across multiple languages" or "make programming languages easier for humans to understand". Both of those are preferential arguments. Perhaps there could be a study comparing the merits of each, but I haven't seen one yet.
I like chocolate ice cream. I like making pointers and arrays as similar as possible. I like being able to collapse pointer arithmetic down. I like doing huge mallocs and then playing around with large, empty hunks of memory. Some folks don't. I get it. Code should look like the way we think about the problem. That's an ideal state we'll never reach, but it's worthy of continued discussion.
You might dispute the premises, but I don’t see the circle. Isn’t this just “men are mortal, Socrates is a man, therefore Socrates is mortal”?
Edit: if I were to make this argument, I'd say the assumption that indexes are offsets is baseless. This is easily shown, since there are languages where it isn't true. However, I would argue the problem on practical merits: the reason no 1-based index op-code exists is that no dominant programming language has 1-based indexing. Which is a self-fulfilling prophecy.
In general, I try to find the most general point and rebuttal possible. These discussions tend to dive into pedantry and semantics. I'm just trying to do my part to engage the author where they are. YMMV.