Lua Notes


nelua caveats

  1. Don't forget that this language is based mostly on Lua!
  2. Record equality is shallow memory block equality
  3. __hash isn't enough for records as hashmap keys
  4. require 'string' has side effects
  5. Type errors are possible with C varargs wrappers
  6. With Nelua's own printf, type errors happen at runtime
  7. The arg module needs cleaning up
  8. Forward declaration requires <forwarddecl>
  9. Your dashed-name.nelua will work until you go overboard, but keep it ASCII
  10. You can suppress C main generation with pragmas.noentrypoint
  11. The left-hand side of an assignment is evaluated first
  12. The left-hand side of an assignment is considered in type inference
  13. nilptr is falsy
  14. Take care when modifying a zero data structure
  15. The GC won't protect you from misuse of <close>
  16. The -P/--pragma and -D/--define flags take Lua code
  17. Enums can silently have duplicate members
  18. The empty string '' is not the zero string (@string){}
  19. defer can't modify return values
  20. Is that 'ordered fields initialization' or an InitList?
  21. You won't get a useful error if you fail a bounds check
  22. coroutine.spawn() won't exhibit the implicit conversions that a function call would
  23. &self is probably a mistake
  24. __tostring must return a default-allocated string

Don't forget that this language is based mostly on Lua!

If you're very familiar with Lua, this hardly counts as a caveat. But if you are much more used to other languages, you can have extended debugging sessions with Nelua that are entirely due to you not noticing that some part of the language was Lua-like.

The most obvious problem is 1-based indexing when you expect 0-based indexing, but the problem that just hit me was as simple as this:

if #rowdata then self.row:push(rowdata) end

The problem? That's always true, so the push always happens, even when #rowdata is 0. I went into gdb over this. The expectation that hurt me here is not even that "only false and nil are falsy" - that's a very common design decision for newer languages. My expectation was that there are two kinds of languages:

  1. languages with strict conditionals that require an exact boolean value, either true or false, and for which anything else results in a compile-time error
  2. languages with lax conditionals, where 0, the empty string, and midnight in your local timezone are all falsy.

Lua is neither of those. It accepts anything in a conditional, and the conditional succeeds as long as that 'anything' isn't false or nil. Nelua follows Lua in this.

Likewise, when processing strings, byte == '\n' will never be true because, yes, a string is always different from a number.
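
Both gotchas disappear once the comparisons are explicit. A minimal sketch (the names are illustrative):

require 'string'

local rowdata = 'abc'
if #rowdata > 0 then print('non-empty') end -- compare the length, not its truthiness

local line = 'a\nb'
assert(string.byte(line, 2) == string.byte('\n')) -- compare bytes to bytes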

Record equality is shallow memory block equality

Is it a good idea to compare records with ==? Consider:

require 'string'

local r = @record{
  a: string,
  b: string,
}

local x: r = {'hello', 'world'}
local y: r = {'hello', 'world'}
local z: r = {'hello', string.copy(y.b)}
assert(x.b == z.b)
assert(x == y)
assert(y ~= z)

local function addrs(rec: r)
  return string.format('{%u (%u), %u (%u)}',
    (usize)(rec.a.data), rec.a.size,
    (usize)(rec.b.data), rec.b.size)
end
print('x:', addrs(x)) -- x:	{93889754644488 (5), 93889754644494 (5)}
print('y:', addrs(y)) -- y:	{93889754644488 (5), 93889754644494 (5)}
print('z:', addrs(z)) -- z:	{93889754644488 (5), 93889757111232 (5)}
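
If field-wise comparison is what you want, define __eq on the record and Nelua will use it instead of the shallow memory comparison. A sketch, continuing the example above:

function r.__eq(lhs: r, rhs: r): boolean
  return lhs.a == rhs.a and lhs.b == rhs.b
end

assert(y == z) -- string == compares contents, so y and z are now equal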

__hash isn't enough for records as hashmap keys

The following program has a bug that's fixed by uncommenting the definition of r.__eq:

require 'hashmap'
require 'iterators'
require 'string'
require 'hash'

local r = @record{a: string, b: number}
function r:__hash(): usize
  print('using r:__hash()')
  return hash.hash(self.b)
end
--function r.__eq(a: r, b: r): boolean return a.b == b.b end
local map: hashmap(r, integer)

local a: r = {tostring(1), 1}
local b: r = {tostring(1), 1}
check(a:__hash() == b:__hash())
map[a] = 1
map:remove(b)
for k, _ in pairs(map) do print(k.a) end
map[b] = 1
print '---'
for k, _ in pairs(map) do print(k.a) end

Buggy output:

using r:__hash()
using r:__hash()
using r:__hash()
using r:__hash()
1
using r:__hash()
---
1
1

Correct output:

using r:__hash()
using r:__hash()
using r:__hash()
using r:__hash()
using r:__hash()
---
1

require 'string' has side effects

Consider the following:

local s1 <comptime> = 'a' .. 'b'

require 'string'

local s2 <comptime> = 'b' .. 'c'

This fails to compile at the definition of s2: once the string library is required, concatenation is defined as a runtime function and no longer works at compile time. Workaround: do the concatenation in Lua, at the preprocessing stage.

local s1 <comptime> = #['a' .. 'b']#

require 'string'

local s2 <comptime> = #['b' .. 'c']#

Type errors are possible with C varargs wrappers

Consider the following:

local function printf(format: cstring, ...): cint <cimport, nodecl, cinclude '<stdio.h>'> end

printf('%.0f\n', 1)
printf('%s %d\n', 'two', 2)

Running this produces the following output:

0
two 3

How? Let's look at the generated C code:

int nelua_main(int argc, char** argv) {
  printf((char*)"%.0f\n", 1);
  printf((char*)"%s %d\n", ((nlstring){(uint8_t*)"two", 3}), 2);
  return 0;
}

printf() is getting an integer where it expects a double, and the nlstring struct is passed by value, so its pointer and its length both land in the argument list: %s prints "two", %d prints the length 3, and the trailing 2 is never consumed. You'll get the expected output with explicit casts:

local function printf(format: cstring, ...): cint <cimport, nodecl, cinclude '<stdio.h>'> end

printf('%.0f\n', number(1))
printf('%s %d\n', cstring('two'), cint(2))

And the -S/--sanitize flag to nelua will enable C compiler flags that will likely warn you about such errors:

warning: format ‘%f’ expects argument of type ‘double’, but argument 2 has type ‘int’ [-Wformat=]
   91 |   printf((char*)"%.0f\n", 1);
      |                  ~~~^     ~
      |                     |     |
      |                     |     int
      |                     double
      |                  %.0d
 warning: format ‘%s’ expects argument of type ‘char *’, but argument 2 has type ‘nlstring’ [-Wformat=]
   92 |   printf((char*)"%s %d\n", ((nlstring){(uint8_t*)"two", 3}), 2);
      |                  ~^        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      |                   |        |
      |                   char *   nlstring

And of course nelua stdlib doesn't have this problem:

require 'string'

print(string.format('%.0f', 1))
print(string.format('%s %d', 'two', 2))

With Nelua's own printf, type errors happen at runtime

$ nelua -o see -i 'require("io") io.printf("looking at: %d\n", "a string")'
$ ./see
stringbuilder.nelua:349:12: runtime error: invalid format for argument
    assert(false, 'invalid format for argument')
           ^~~~~

The arg module needs cleaning up

The documentation provides arg's type as sequence(string, GeneralAllocator). As that's not the potentially-GC DefaultAllocator, it'll be leaked even when you're using the GC. Here are five ways of dealing with this, some with further caveats:

  1. don't. Its lifetime can end naturally with the end of the program, and get cleaned up by the OS. This has the minor downside of always showing up in valgrind's output, so it may be harder to find important leaks.
  2. clean it up on return from your main() function. Although nelua doesn't enforce anything like a main, as long as your program really ends when it exits, this should be fine. The easy way to do this is to put a defer arg:destroy() end at the top of this function.
  3. clean it up with a top-level defer, immediately after requiring it. This can be just as fine as doing it with a main() function, but I think is easier to misuse - by putting multiple requires and multiple defers in multiple files, or by requiring the toplevel itself and using functions in it that now refer to a destroyed arg:
    global arg = 'alive'
    defer arg = 'dead' end
    
    local M = @record{}
    function M.show() print(arg) end
    return M
    As used:
    $ nelua -i 'local M = require("lib") M.show()'
    dead
  4. use the C.arg library instead.
  5. use C.atexit to clean up. The same usage with this library prints 'alive':
    require 'C.stdlib'
    
    global arg = 'alive'
    C.atexit(function() arg = 'dead' end)
    
    local M = @record{}
    function M.show() print(arg) end
    return M
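
Option 2 can look like this sketch (assuming your program funnels everything through one main() function):

require 'arg'
require 'iterators'

local function main(): integer
  defer arg:destroy() end -- option 2: free arg when main() returns
  for i, a in ipairs(arg) do print(i, a) end
  return 0
end

main()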

Forward declaration requires <forwarddecl>

Nelua has and will probably always have, due to its metaprogramming capabilities, a need to forward declare functions that must be used before they're defined. How this is done is similar to Lua, with two additions. This is Lua:

local g
local function f() print(g()) end
function g() return 100 end
f()

And this is the equivalent Nelua:

local function g(): integer <forwarddecl> end
local function f() print(g()) end
function g() return 100 end
f()

Nelua needs to know the symbol's type and it needs <forwarddecl> to prevent that from looking like an erroneous function definition (or a complete definition, if returning void!). And in both Lua and Nelua, local is no longer needed when the function is finally defined. Or rather, local would make the deferred definition look like a redefinition, and then Nelua would complain that the forward-declared function was never defined. After all, this works in Nelua too:

local function f() return 1 end
local function g() return f() end
print(f()) -- output: 1
local function f() return 2 end
print(f(), g()) -- output: 2       1

Note that the following is completely different, despite seeming to be a different kind of forward declaration:

local g: function(): integer
local function f() print(g()) end
function g() return 100 end
f()

What this is doing is declaring `g` to be a variable with the type of a function pointer. The later definition assigns that function pointer, but is otherwise wholly unrelated. If you put local in front of the later definition, it declares a new variable that shadows the first, the original g stays NULL, and you'll get a nice segfault at runtime as f() calls through the NULL function pointer.

Your dashed-name.nelua will work until you go overboard, but keep it ASCII

You may notice that Nelua compiles, requires, and seems to work just fine with filenames that definitely aren't valid C identifiers. This permissiveness, combined with the goal of very readable generated C code, results in some clear limits.

Consider:

$ cat d-a-s-h.nelua 
print(require 'd_a_s_h')
$ cat d_a_s_h.nelua 
local n = 1
return n
$ nelua d-a-s-h.nelua 
d-a-s-h.nelua:1:1: from: AST node Block
print(require 'd_a_s_h')
^~~~~~~~~~~~~~~~~~~~~~~~
d-a-s-h.nelua:1:15: error: in require: module 'd_a_s_h' cannot require itself
print(require 'd_a_s_h')
              ^~~~~~~~~

d-a-s-h.nelua's proper name is d_a_s_h. If you have some brilliant module naming system that relies on meaningful dashes and underscores, Nelua probably can't handle it. But more sensible code bases won't even notice.

But, what about non-ASCII names?

$ cat слово.nelua 
local function greppable_name() return 1 end
print(greppable_name())
$ nelua -o word.c слово.nelua && grep greppable_name word.c
static int64_t XXXXXXXXXX_greppable_name(void);
int64_t XXXXXXXXXX_greppable_name(void) {
  nelua_print_1(XXXXXXXXXX_greppable_name());

Those five Cyrillic characters are encoded with ten bytes, so the name is ten 'X'es.

You can suppress C main generation with pragmas.noentrypoint

Suppose you want to compile some Nelua to an object file to use from C:

$ cat add.nelua
## pragmas.noentrypoint = true
local function add(a: cint, b: cint): cint <cexport 'add'>
  return a + b
end
$ cat useadd.c
#include <stdio.h>

extern int add(int a, int b);

int main() {
	int a = 2;
	int b = 10;
	printf("add(%d, %d) = %d\n", a, b, add(a, b));
	return 0;
}

Usage:

$ nelua -o add.o add.nelua
$ gcc -Wall -o useadd useadd.c add.o
$ ./useadd
add(2, 10) = 12

Without that pragma, this error is more likely:

multiple definition of `main'; /tmp/ccZHh5j8.o:useadd.c:(.text+0x0): first defined here

The left-hand side of an assignment is evaluated first

Consider:

require 'sequence'

local function index_stability()
  local function force_realloc(seq: *sequence(integer)): integer
    for i=1, 1000000 do seq:push(i) end
    return 1234
  end
  local a: sequence(integer)
  a:push(4321)
  print(a[1])
  a[1] = force_realloc(&a)
  print(a[1])
end

index_stability()

Like D, Nim, and Odin, this prints 4321 both times. The assignment of 1234 is to the pre-reallocation location represented by a[1].

It's not a very easy hazard to fall into, but you might keep it in mind.

nilptr is falsy

As in Lua, false and nil are falsy in conditionals, and everything else is truthy.

Unlike Lua, Nelua has a third falsy value: nilptr.

print('nilptr', nilptr and 'true' or 'false')
print('false',   false and 'true' or 'false')
print('nil',       nil and 'true' or 'false')
print('0',           0 and 'true' or 'false')
print("''",         '' and 'true' or 'false')

Output:

nilptr	false
false	false
nil	false
0	true
''	true
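
This is convenient when checking pointers before use. A sketch:

local p: *integer = nilptr
if not p then print('p is unset') end
local n: integer = 1
p = &n
if p then print('p points at', $p) end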

Take care when modifying a zero data structure

A sequence has reference semantics, so if you take a sequence from somewhere, and modify it, you can be sure that the original location of the sequence also sees the modifications

... unless you took the zero sequence. In this case, the sequence only exists when you modify it, and the original location still has a zero sequence. Consider:

require 'sequence'

local function add1(seq: sequence(integer))
  seq:push(1)
end
local function add2(seq: sequence(integer))
  seq:push(2)
  return seq
end
local function add3(seq: *sequence(integer))
  seq:push(3)
end

local s1: sequence(integer)
add1(s1)
assert(#s1==0)

local s2: sequence(integer)
s2 = add2(s2)
assert(#s2==1 and s2[1]==2)
add2(s2)
assert(#s2==2 and s2[1]==2)

local s3: sequence(integer)
add3(s3)
assert(#s3==1 and s3[1]==3)
add1(s3)
assert(#s3==2 and s3[2]==1)

The GC won't protect you from misuse of <close>

Consider:

require 'hashmap'
require 'string'
local map: hashmap(string, string)
do
  local s <close> = string.copy('world')
  map['hello'] = s
end
print(map['hello'])

Even without pragmas.nogc, this results in a use-after-free: <close> destroys the string at the end of the block while the map still references it. -S/--sanitize catches this easily. The lesson is to take care once you start using manual memory management techniques, and not to expect that still using the GC will protect you from errors with them.
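
One fix is simply to drop <close> and let the GC own the copy (a sketch, assuming string.copy allocates with the GC-tracked default allocator):

require 'hashmap'
require 'string'

local map: hashmap(string, string)
do
  local s = string.copy('world') -- no <close>: the GC keeps this alive while map references it
  map['hello'] = s
end
print(map['hello']) -- world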

The -P/--pragma and -D/--define flags take Lua code

Consider:

$ nelua -P foo='hello world' -i '##print(pragmas.foo)'
error: failed parsing parameter 'foo=hello world':
  define:1: syntax error near <eof>

This error doesn't mean that you can't pass in data with spaces. The flag just needs to parse as Lua.

$ nelua -P 'foo="hello world"' -i '##print(pragmas.foo)'
hello world

Enums can silently have duplicate members

Possibly another Lua-ism, as Lua behaves the same way with table definitions:

local Duration = @enum{ s = 1, m, h, d, m, us }
assert(Duration.s == 1)
assert(Duration.m == 5) -- not 2

The empty string '' is not the zero string (@string){}

Consider:

local function f() return '' end
local function g() return (@string){} end
assert(f() == '') -- ok
assert(g() == '') -- ok

Those two functions compile to:

nlstring n65_f(void) {
  return ((nlstring){(uint8_t*)"", 0});
}
nlstring n65_g(void) {
  return (nlstring){0};
}

They both return zero-length strings, but one points to a valid C string in static memory, and the other one has NULL for its pointer.
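
The distinction can bite when handing string data to C code that doesn't expect NULL. A sketch of the difference:

require 'string'

local e: string = ''
local z: string = (@string){}
assert(#e == 0 and #z == 0) -- both are zero-length
assert(e.data ~= nilptr)    -- '' points at a real C string in static memory
assert(z.data == nilptr)    -- the zero string carries a NULL data pointer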

defer can't modify return values

Consider, from defer_test.nelua:

do -- issue #233
  local function f(n: integer): integer
    defer n = n + 1 end
    return n
  end

  local function g(n: integer, m: integer): (integer, integer)
    defer n = n + 1 m = m + 1 end
    return n, m
  end

  assert(f(1) == 1)
  local n, m = g(1,2)
  assert(n == 1 and m == 2)
end

What happens is that return values are assigned to a secret variable, then the defer runs (performing, in this case, useless modifications), and then the secret variable is returned:

int64_t example_f(int64_t n) {   
  int64_t _ret_1 = n;
  { /* defer */                                        
    n = (n + 1);  
  }
  return _ret_1;
}

The intent is for defer to run 'after' the function returns. This has the minor effect of doubling the stack space used by returned values:

-- very silly example
local function f()
  local a: [usize(0.3*1024^2)]integer
  ## if pragmas.withdefer then
  defer print(2, a[0]) end
  ## end
  print(1, a[0])
  return a
end
print(f()[0])

Usage:

$ nelua defsize.nelua 
1	0
0
$ nelua -Pwithdefer defsize.nelua 
Segmentation fault

Is that 'ordered fields initialization' or an InitList?

The Overview introduces ordered fields initialization with this example:

local Person = @record{
  name: string,
  age: integer
}
-- ordered fields initialization
local c = (@Person){"Eric", 21}
print(c.name, c.age)

It also initializes some arrays, but those can be understood identically to this example. There is also, in tests/sequence_test.nelua, this code:

do -- braces initializer
  local seq: sequence(integer) = {}
  assert(#seq == 0 and seq.impl == nilptr)
  seq:destroy()
  seq = {1,2,3}

What's going on there? That {1, 2, 3} assignment is clearly different. That's implemented by lib/sequence.nelua:

function sequenceT.__convert(values: an_arrayT): sequenceT <inline>
  local self: sequenceT
  self:reserve(#values)
  self.impl.size = #values
  for i:usize=1,#values do
    self.impl.data[i] = values[i-1]
  end
  return self
end

And here's something that looks very similar to ordered fields initialization:

require 'sequence'
local seq = (@sequence(integer)){1, 2}
for _, k in ipairs(seq) do print(k) end

In fact I reported this as a bug when I hit it with span(byte).

Here's an example of two structurally identical records that treat this syntax differently:

require 'sequence'

local Point1 = @record{x: integer, y: integer}

## local function make_Point2(T)
  local T: type = @#[T]#
  local Point2 = @record{x: T, y: T}

  local an_array: type = #[concept(function(x)
    if x.type:is_contiguous_of(T) then
      return true
    end
    return false, string.format("no viable conversion from '%s' to '%s'", x.type, Point2)
  end, function(node)
    if node.is_InitList and #node > 0 and not node:find_child_with_field('is_Pair') then
      return types.ArrayType(T, #node)
    end
  end)]#

  function Point2.__convert(values: an_array)
    local p: Point2
    for _, k in ipairs(values) do
      p.x = p.x + k
    end
    return p
  end

  ## return Point2
## end
local Point2: type = #[generalize(make_Point2)]#


local a = (@Point1){1, 2}
local b = (@Point2(integer)){1, 2}
print(a.x, a.y) -- 1, 2
print(b.x, b.y) -- 3, 0

So, how do you know what you're dealing with?

  1. you can always name your keys, i.e., never use ordered fields initialization
  2. you can follow the stdlib's convention: containers (including span) are initialized by a list of values, and everything else gets ordered fields. i.e., never create this Point2 abomination.

The left-hand side of an assignment is considered in type inference

Mostly this is easy to understand and has pleasant results. But when using the same name on both sides of an assignment, you can run into some odd conditions. For example, suppose you have some logical constants in a module that you want to be configurable, such as a CSV reader that can be hard-coded to use tabs vs. commas. In general a .csv file can have all kinds of separators so the library should be flexible, but for a particular app looking at particular data, this flexibility is pointless. So, you import the csv library, and then you specialize it:

local csv = require 'csvreader'
local csv = @csv('\t'_u8)

And you get an error:

error: in generic evaluation: symbol 'csv' is not a type
local csv = @csv('\t'_u8)
                ^~~~~~~~~

Why isn't it a type, when it clearly is a type? Why isn't it a type even if you explicitly say that both variables have type type? I don't get it myself. But there are two workarounds:

  1. use a different name for one of these variables.
  2. wrap the right-hand-side in parentheses:
    local csv = require 'csvreader'
    local csv = (@csv('\t'_u8))

You won't get a useful error if you fail a bounds check

Issue #242:

string.nelua:311:16: runtime error: index out of range
  check(i >= 1 and i <= s.size, 'index out of range')
               ^~~~~~~~~~~~~~~

Aborted

Because Nelua insists on minimalism and portability, this isn't readily improved by a little binary bloat (putting the checks at the callsites). But, because Nelua insists on minimalism and portability, it's very easily debugged by C tools such as gdb. Re-running the program with either -S/--sanitize or -d/--debug will provide a stacktrace.

Note that the binary-bloat solution that most languages take is not that useful as soon as you have another level of callers. Suppose you have an application with 100 calls of a library function that itself calls vector.__atindex. When a bounds check fails in the vector.nelua code, most languages would point the error at its callsite in the library function ... which still doesn't tell you, the application author, which of your 100 calls is the source of the error. Stacktraces are the most useful, here, and they're a planned feature for errorhandling.nelua

coroutine.spawn() won't exhibit the implicit conversions that a function call would

Consider:

require 'coroutine'
require 'span'

local function f(mem: span(integer))
  print(mem.data)
end

local a: []integer = {1, 2, 3, 4, 5}
f(a)
coroutine.spawn(f, a)
coroutine.spawn(f, (@span(integer))(a))

This could have output like

0x55edd6de70a0
0x4
0x55edd6de70a0

To get the intended behavior, use the explicit cast.

&self is probably a mistake

In the following program, the first address printed is that of the implicit self parameter (the function could equivalently have been written as function r.f(self: *r)). So &self is a pointer to a pointer to an r, and one that is invalidated as soon as the function returns. If you meant to pass a pointer to r somewhere, self already is that pointer.

local r = @record{}
function r:f()
  print((@pointer)(&self))  -- 0x7ffeb657a4f8
  print((@pointer)(self))   -- 0x404049
  ## print(self.type)       -- pointer(r)
  local p = &self
  ## p:resolve_type()
  ## print(p.type)          -- pointer(pointer(r))
end
local o: r
o:f()
print((@pointer)(&o))       -- 0x404049

What about when r.f takes the object directly? In the next program, it's successfully modified in the function, but the caller doesn't see the change because the function was given a copy to modify. This might be what you want.

local r = @record{n: integer}
local function modify(o: *r) o.n = o.n + 1 end
function r.f(self: r)
  modify(&self)
  print(self.n) -- 1
end
local o: r
o:f()
print(o.n)      -- 0

__tostring must return a default-allocated string

The documentation says as much:

The __tostring metamethod may be called, in this case, it must always return a new allocated string.

But it's an easy rule to violate by returning static strings and thinking the GC will take care of it:

require 'io'
local T = @record{}
function T:__tostring() return 'static string' end
io.printf('%s\n', (@T){})

This blows up as io.printf calls string:destroy() on the string it gets back from its implicit call to T:__tostring().

gc.nelua:140:20: runtime error: invalid unregister pointer
    assert(oldsize ~= 0, 'invalid unregister pointer')
                   ^~~~

Aborted (SIGABRT)

An easy fix is to call string.copy() in this case, or to instead implement __tostringview:

require 'io'
local T = @record{}
function T:__tostringview() return 'static string' end
io.printf('%s\n', (@T){})

NB. if you implement both, __tostring will be preferred.
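
The string.copy() fix looks like this:

require 'string'
require 'io'

local T = @record{}
function T:__tostring(): string
  -- return a freshly allocated copy that io.printf may safely destroy
  return string.copy('static string')
end
io.printf('%s\n', (@T){})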