Lua Notes

nelua iterators
Login
  1. iterators as such
  2. stateful iterators
  3. Nelua's stateless iterators
  4. higher order functions as iterators
  5. generic iterators
  6. coroutine iterators

iterators as such

Suppose you want to loop over the numbers from 1 to 10 inclusive, printing each. With only labels and conditional jumps you could write this, which really works:

local i = 1
::next_i::
print(i)
if i < 10 then goto bump_i end
goto done_i
::bump_i::
i = i + 1
goto next_i
::done_i::

Gradually, structured programming came along and both a) formalized common patterns of control flow like this, and b) strongly suggested that people stick to them, and throw in the garbage the much larger group of hard-to-follow and error-prone control flow that goto is also capable of. So even if you're pretending that Nelua's numeric for loop doesn't exist, you'd write code more like:

local i = 1
while i < 10 do
  print(i)
  i = i + 1
end

And if you wrote that you'd have an error, because this loop doesn't include print(10).

To avoid such errors, especially in a codebase where you could have thousands of separate loops like this, and to repeat the success of structured programming, you could write the loop logic once in a function that you could get right, and then you could call that function an 'iterator'. Languages don't really need any special support for iterators as long as programmers can stick to a convention for them. For example:

local function upto(i: *integer, limit: integer): boolean
  if $i >= limit then return false end
  $i = $i + 1
  return true
end

local i = 0
while upto(&i, 10) do
  print(i)
end

Here the convention is: create a local variable with the initial state, then call the iterator with a pointer to that variable and any other information it needs, then use the variable if the iterator returns true.

Here's an infinite iterator that lazily yields the same value forever: function wow(_: pointer) return true end

This isn't a convention that I made up, by the way. Odin does something very similar with its strings.split_iterator. A demerit with this convention is that the explicit state can be modified separately from the iterator code, resulting in much more complex logic if done intentionally, and in bugs if done unintentionally.

stateful iterators

Lua has language support for this convention: an iterator is a function that returns a closure that, when called, either returns nil or returns the next value(s) to use. This is Lua:

function upto(limit)
  local i = 0
  return function()
    if i >= limit then return nil end
    i = i + 1
    return i
  end
end

for i in upto(10) do
  print(i)
end

This convention would be hard for Nelua to mimic as it requires both closures and dynamic typing, but you can come close. This is Nelua:

local Upto = @record{
  i: integer,
  limit: integer,
}
local function upto(limit: integer): (Upto, auto)
  return {0, limit}, function(state: *Upto): (integer, boolean)
    if state.i >= state.limit then return 0, false end
    state.i = state.i + 1
    return state.i, true
  end
end

local iter, iterf = upto(10)
while true do
  local i, ok = iterf(&iter)
  if not ok then break end
  print(i)
end

Instead of a closure, a function that receives what would be closed-over state as an argument. Instead of dynamic typing, an additional boolean return value. With some compiler support the loop could look like for i in upto(10) do print(i) end and rest of it would be handled for you.

Nelua's stateless iterators

Lua also has stateless iterators that don't require closures. These are what Nelua supports, albeit with the same boolean alternative to returning nil. This is a Nelua iterator with language support:

local function upto_next(invariant: integer, control: integer): (boolean, integer) <inline>
  if control >= invariant then return false end
  return true, control+1
end
local function upto(n: integer): (auto, integer, integer) <inline>
  return upto_next, n, 0
end

for i in upto(10) do print(i) end

upto returns a function, the invariant of the iterator (here, the limit of 10, but commonly a pointer to a list or array being iterated over), and the initial control variable of the iterator. The language then calls that function for you, passing it the invariant and control, accepting the next control variable, and terminating the loop if the boolean is false.

Explicitly:

local iterf, invariant, control = upto(10)
local more: boolean
while true do
  more, control = iterf(invariant, control)
  if not more then break end
  print(control)
end

There's some risk of confusion here because the control value has a dual purpose of controlling iteration and representing the numbers wanted for this task. So here's something slightly different:

local function parity_upto_next(invariant: integer, i: integer): (boolean, integer, string) <inline>
  i = i + 1
  if i > invariant then return false end
  if i % 2 == 0 then
    return true, i, 'even'
  else
    return true, i, 'odd'
  end
end
local function parity_upto(n: integer): (auto, integer, integer) <inline>
  return parity_upto_next, n, 0
end

for _, p in parity_upto(10) do print(p) end

And another, an iterator that yields only digits from a string:

require 'string'
require 'C.ctype'

local function digits_next(inv: string, i: usize): (boolean, usize, string)
  for i=i, #inv do
    if 0 ~= C.isdigit(string.byte(inv, i)) then
      return true, i+1, string.sub(inv, i, i)
    end
  end
  return false
end

local function digits(s: string): (auto, string, usize)
  return digits_next, s, 1
end

for i, c in digits('a string 123 with some 7 digits 1 1 in it 1') do
  print(i, c)
end

higher order functions as iterators

Programming in Lua has an aside about 'true' iterators which instead are functions that take the 'body of the loop' as a function, and call it with values that would've been yielded by an iterator. 'True' iterators have the advantage of being much easier to think about and implement, and also give the iterator much more control over iteration - for example, a 'true' iterator has a very obvious point at which to free any resources (that point: return), and a very obvious place to stash additional state for the loop (that place: the stack), but a stateless iterator could be called outside of a loop, or the loop could break early, without the iterator ever being run to completion.

In a language with closures, 'true' iterators are also very easy to use.

In Nelua, the first alternative to closures is quite unpleasant:

require 'string'
require 'C.ctype'

local function digits(s: string, f: auto, state: pointer)
  for i=1, #s do
    if 0 ~= C.isdigit(string.byte(s, i)) then
      f(string.sub(s, i, i), state)
    end
  end
end

digits('a string 123 with some 7 digits 1 1 in it 1', function(c: string, _: pointer)
  print(c)
end)

local sum = 0
digits('a string 123 with some 7 digits 1 1 in it 1', function(_: string, state: pointer)
  local count = (@*integer)(state)
  $count = $count + 1
end, &sum)
print(string.format('found %d digits in string', sum))

Trying to make that neater will run you against impermissible pointer conversions, the impossibility of polymorphic anonymous functions, limits on the unpacking of varargs, etc.

How about borrowing from FP and threading state through the function, instead of expecting the function to mutate a pointer?

require 'string'
require 'C.ctype'
require 'C.stdlib'

local function digits(s: string, f: auto, state: auto)
  local stop: boolean
  for i=1, #s do
    if 0 ~= C.isdigit(string.byte(s, i)) then
      stop, state = f(string.sub(s, i, i), state)
      if stop then break end
    end
  end
  return state
end

local st = 'a string 123 with some 7 digits 1 1 in it 1'
digits(st, function(c: string, _: integer)
  print(c)
  return false, 0
end, 0)

local sum = digits(st, function(_: string, n: integer)
  return false, n + 1
end, 0)
print(string.format('found %d digits in string', sum))

local n = C.atoi(digits(st, function(digit: string, acc: string)
  if digit > '5' then return true, acc end
  return false, acc .. digit
end, ''))
assert(n == 123)

That's a little bit easier to work with.

generic iterators

Consider this iterator:

require 'span'
require 'string'

local function repeat_elems_next(sp: span(byte), i: usize): (boolean, usize, byte)
  if i < #sp-1 then return true, i+1, sp[i+1] end
  return true, 0, sp[0]
end
local function repeat_elems(sp: span(byte))
  return repeat_elems_next, sp, #sp
end

local s: span(byte) = "hello"
for i, c in repeat_elems(s) do
  print(i, c, string.format('%c', c))
end

This works as intended, and has endless output, starting

0	104	h
1	101	e
2	108	l
3	108	l
4	111	o
0	104	h
1	101	e
2	108	l
3	108	l
4	111	o
0	104	h

But it only works on span(byte), or something convertible to it. If you try to make it generic as you would in most languages with generics, you'll easily discover that returned functions - i.e., repeat_elems_next - cannot be polymorphic. Your generic iterator must return a specific function. Like so:

require 'span'
require 'string'

local function repeat_elems_next(sp: auto, i: usize): (boolean, usize, byte)
  if i < #sp-1 then return true, i+1, sp[i+1] end
  return true, 0, sp[0]
end
local function repeat_elems(sp: auto)
  local function proxy(sp: #[sp.type]#, i: usize) return repeat_elems_next(sp, i) end
  return proxy, sp, #sp
end

local limit = 3
local s: span(byte) = "abcd"
for i, b in repeat_elems(s) do
  if limit < 1 then break end
  print(i, string.format('%c', b))
  limit = limit - 1
end

limit = 3
local s: span(integer) = {1, 2, 3, 4}
for i, n in repeat_elems(s) do
  if limit < 1 then break end
  print(i, n)
  limit = limit - 1
end

coroutine iterators

Instead of an iterator to generate values, you could have a coroutine generate the values. This enormously simplifies the 'generating' logic, while also allowing communication back into the generator, but the loop consuming the iterator is no longer as simple.

require 'coroutine'
require 'utf8'

local function runes(s: string)
  for _, r in utf8.codes(s) do
    coroutine.yield(r)
  end
end

local co = coroutine.spawn(runes, 'сети')
defer coroutine.destroy(co) end
repeat
  local r: int32
  coroutine.pop(co, &r)
  local s <close> = utf8.char(r)
  print(s)
  coroutine.resume(co)
until coroutine.status(co) ~= 'suspended'

At least, without some help:

require 'coroutine'
require 'utf8'

local function generator(f: function(string):void, arg: string)
  local co = coroutine.spawn(f, arg)
  return function(co: coroutine, i: integer)
    if coroutine.status(co) ~= 'suspended' then
      coroutine.destroy(co)
      return false
    end
    local r: int32
    coroutine.pop(co, &r)
    coroutine.resume(co)
    return true, i+1, r
  end, co, 0
end

local function runes(s: string)
  for _, r: int32 in utf8.codes(s) do
    coroutine.yield(r)
  end
end

for i, r in generator(runes, 'сети') do
  local s <close> = utf8.char(r)
  print(i, s)
end