nelua caveats

- Don't forget that this language is based mostly on Lua!
- Record equality is shallow memory block equality
- __hash isn't enough for records as hashmap keys
- require 'string' has side effects
- Type errors are possible with C varargs wrappers
- With Nelua's own printf, type errors happen at runtime
- The arg module needs cleaning up
- Forward declaration requires <forwarddecl>
- Your dashed-name.nelua will work until you go overboard, but keep it ASCII
- You can suppress C main generation with pragmas.noentrypoint
- The left-hand side of an assignment is evaluated first
- The left-hand side of an assignment is considered in type inference
- nilptr is falsy
- Take care when modifying a zero data structure
- The GC won't protect you from misuse of <close>
- The -P/--pragma and -D/--define flags take Lua code
- Enums can silently have duplicate members
- The empty string '' is not the zero string (@string){}
- defer can't modify return values
- Is that 'ordered fields initialization' or an InitList?
- You won't get a useful error if you fail a bounds check
- coroutine.spawn() won't exhibit the implicit conversions that a function call would
- &self is probably a mistake
- __tostring must return a default-allocated string
Don't forget that this language is based mostly on Lua!
If you're very familiar with Lua, this hardly counts as a caveat. But if you are much more used to other languages, you can have extended debugging sessions with Nelua that are entirely due to you not noticing that some part of the language was Lua-like.
The most obvious problem is 1-based indexing when you expect 0-based indexing, but the problem that just hit me was as simple as this:
if #rowdata then self.row:push(rowdata) end
The problem? That's always true, so the push always happens, even when #rowdata is 0. I went into gdb over this. The expectation that hurt me here is not even that "only false and nil are falsy" - that's a very common design decision for newer languages. My expectation was that there are two kinds of languages:
- languages with strict conditionals that require an exact boolean value, either true or false, and for which anything else results in a compile-time error
- languages with lax conditionals where all of 0, the empty string, and midnight in your local timezone are false values.
Lua is neither of those. It accepts anything in a conditional, and the conditional succeeds as long as that 'anything' isn't false or nil. Nelua follows Lua in this.
Likewise, when string processing, byte == '\n' will never be true because, yes, a string is always different from a number.
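To make the contrast concrete, here's a small C sketch (the function name is my own, not from Nelua's output): in C, a character literal is an integer, so comparing a byte to '\n' is a plain numeric comparison - the thing that the Nelua source-level byte == '\n' looks like but isn't.

```c
#include <assert.h>
#include <stddef.h>

/* In C, '\n' is an int, so comparing a byte to it is a numeric comparison. */
static size_t count_newlines(const unsigned char *s, size_t n) {
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if (s[i] == '\n')  /* number == number: works as expected */
            count++;
    return count;
}
```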
Record equality is shallow memory block equality
Is it a good idea to compare records with ==? Consider:
require 'string'
local r = @record{
a: string,
b: string,
}
local x: r = {'hello', 'world'}
local y: r = {'hello', 'world'}
local z: r = {'hello', string.copy(y.b)}
assert(x.b == z.b)
assert(x == y)
assert(y ~= z)
local function addrs(rec: r)
return string.format('{%u (%u), %u (%u)}',
(usize)(rec.a.data), rec.a.size,
(usize)(rec.b.data), rec.b.size)
end
print('x:', addrs(x)) -- x: {93889754644488 (5), 93889754644494 (5)}
print('y:', addrs(y)) -- y: {93889754644488 (5), 93889754644494 (5)}
print('z:', addrs(z)) -- z: {93889754644488 (5), 93889757111232 (5)}
__hash isn't enough for records as hashmap keys
The following program has a bug that's fixed by uncommenting the definition of r.__eq:
require 'hashmap'
require 'iterators'
require 'string'
require 'hash'
local r = @record{a: string, b: number}
function r:__hash(): usize
print('using r:__hash()')
return hash.hash(self.b)
end
--function r.__eq(a: r, b: r): boolean return a.b == b.b end
local map: hashmap(r, integer)
local a: r = {tostring(1), 1}
local b: r = {tostring(1), 1}
check(a:__hash() == b:__hash())
map[a] = 1
map:remove(b)
for k, _ in pairs(map) do print(k.a) end
map[b] = 1
print '---'
for k, _ in pairs(map) do print(k.a) end
Buggy output:
using r:__hash()
using r:__hash()
using r:__hash()
using r:__hash()
1
using r:__hash()
---
1
1
Correct output:
using r:__hash()
using r:__hash()
using r:__hash()
using r:__hash()
using r:__hash()
---
1
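To see why __eq matters, here's a C sketch of the lookup logic (simplified names of my own, not hashmap.nelua's internals): a hash table matches the hash first, then confirms with key equality. If that equality is the shallow, byte-wise kind described above, the string pointers differ and two logically-equal keys never match.

```c
#include <assert.h>
#include <string.h>

/* A record like the one above: a heap string plus a number. */
typedef struct { char *a; double b; } rec;

/* Analogue of r:__hash(): hashes only the number. */
static unsigned long hash_rec(const rec *r) { return (unsigned long)r->b; }

/* Shallow, byte-wise equality: compares the string *pointer*, not its contents. */
static int eq_bytes(const rec *x, const rec *y) {
    return memcmp(x, y, sizeof *x) == 0;
}

/* Analogue of the custom r.__eq: compares the meaningful field. */
static int eq_logical(const rec *x, const rec *y) { return x->b == y->b; }
```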
require 'string' has side effects
Consider the following:
local s1 <comptime> = 'a' .. 'b'
require 'string'
local s2 <comptime> = 'b' .. 'c'
This fails to compile at the definition of s2, as string concatenation's been defined and no longer works at compile time. Workaround: do it in Lua.
local s1 <comptime> = #['a' .. 'b']#
require 'string'
local s2 <comptime> = #['b' .. 'c']#
Type errors are possible with C varargs wrappers
Consider the following:
local function printf(format: cstring, ...): cint <cimport, nodecl, cinclude '<stdio.h>'> end
printf('%.0f\n', 1)
printf('%s %d\n', 'two', 2)
Running this produces the following output:
0
two 3
How? Let's look at the generated C code:
int nelua_main(int argc, char** argv) {
printf((char*)"%.0f\n", 1);
printf((char*)"%s %d\n", ((nlstring){(uint8_t*)"two", 3}), 2);
return 0;
}
printf() is getting an integer where it expects a float, and then it's getting three parameters instead of two, with the second parameter being the length of the string "two". You'll get the expected output with explicit casts:
local function printf(format: cstring, ...): cint <cimport, nodecl, cinclude '<stdio.h>'> end
printf('%.0f\n', number(1))
printf('%s %d\n', cstring('two'), cint(2))
And the -S/--sanitize flag to nelua will enable C compiler flags that will likely warn you about such errors:
warning: format ‘%f’ expects argument of type ‘double’, but argument 2 has type ‘int’ [-Wformat=]
91 | printf((char*)"%.0f\n", 1);
| ~~~^ ~
| | |
| | int
| double
| %.0d
warning: format ‘%s’ expects argument of type ‘char *’, but argument 2 has type ‘nlstring’ [-Wformat=]
92 | printf((char*)"%s %d\n", ((nlstring){(uint8_t*)"two", 3}), 2);
| ~^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
| | |
| char * nlstring
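The underlying rule is C's treatment of varargs: the call is only well-defined when each argument already has the type the format string expects. A quick C sketch of the corrected call (using snprintf so the result can be checked rather than printed):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* With explicit conversions to the types %.0f, %s, and %d expect,
   the varargs call is well-defined. */
static void format_ok(char *buf, size_t n) {
    snprintf(buf, n, "%.0f|%s %d", (double)1, "two", 2);
}
```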
And of course nelua stdlib doesn't have this problem:
require 'string'
print(string.format('%.0f', 1))
print(string.format('%s %d', 'two', 2))
With Nelua's own printf, type errors happen at runtime
$ nelua -o see -i 'require("io") io.printf("looking at: %d\n", "a string")'
$ ./see
stringbuilder.nelua:349:12: runtime error: invalid format for argument
assert(false, 'invalid format for argument')
^~~~~
The arg module needs cleaning up
The documentation provides arg's type as sequence(string, GeneralAllocator). As that's not the potentially-GC DefaultAllocator, it'll be leaked even when you're using the GC. Here are five ways of dealing with this, some with further caveats:
- don't. Its lifetime can end naturally with the end of the program, and get cleaned up by the OS. This has the minor downside of always showing up in valgrind's output, so it may be harder to find important leaks.
- clean it up on return from your main() function. Although nelua doesn't enforce anything like a main, as long as your program really ends when it exits, this should be fine. The easy way to do this is to put a defer arg:destroy() end at the top of this function.
- clean it up with a top-level defer, immediately after requiring it. This can be just as fine as doing it with a main() function, but I think is easier to misuse - by putting multiple requires and multiple defers in multiple files, or by requiring the toplevel itself and using functions in it that now refer to a destroyed arg. As used:
global arg = 'alive'
defer arg = 'dead' end
local M = @record{}
function M.show() print(arg) end
return M
$ nelua -i 'local M = require("lib") M.show()'
dead
- use the C.arg library instead.
- use C.atexit to clean up. The same usage with this library prints 'alive':
require 'C.stdlib'
global arg = 'alive'
C.atexit(function() arg = 'dead' end)
local M = @record{}
function M.show() print(arg) end
return M
Forward declaration requires <forwarddecl>
Nelua has and will probably always have, due to its metaprogramming capabilities, a need to forward declare functions that must be used before they're defined. How this is done is similar to Lua, with two additions. This is Lua:
local g
local function f() print(g()) end
function g() return 100 end
f()
And this is the equivalent Nelua:
local function g(): integer <forwarddecl> end
local function f() print(g()) end
function g() return 100 end
f()
Nelua needs to know the symbol's type and it needs <forwarddecl> to prevent that from looking like an erroneous function definition (or a complete definition, if returning void!). And in both Lua and Nelua, local is no longer needed when the function is finally defined. Or rather, local would make the deferred definition look like a redefinition, and then Nelua would complain that the forward-declared function was never defined. After all, this works in Nelua too:
local function f() return 1 end
local function g() return f() end
print(f()) -- output: 1
local function f() return 2 end
print(f(), g()) -- output: 2 1
Note that the following is completely different, despite seeming to be a different kind of forward declaration:
local g: function(): integer
local function f() print(g()) end
function g() return 100 end
f()
What this is doing is declaring `g` to be a variable with the type of a function pointer. The later definition assigns that function pointer, but is otherwise wholly unrelated. If you put local in front of it, you'll get a nice segfault at runtime as g() tries to call the function pointer at NULL.
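The distinction maps directly onto C (a sketch with my own names): a forward declaration announces a symbol that must later be defined, while a function-pointer variable is just data that happens to start out null.

```c
#include <assert.h>

static int g(void);                 /* forward declaration: g's type is known */

static int f(void) { return g(); }  /* ok: uses g before its definition */

static int g(void) { return 100; }  /* the actual definition */

/* Contrast: a function *pointer* with the same signature. Calling through
   it while it's still null is undefined behavior (typically a segfault). */
static int (*gp)(void) = 0;
```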
Your dashed-name.nelua will work until you go overboard, but keep it ASCII
You may notice that Nelua compiles, requires, and seems to work just fine with filenames that definitely aren't valid C identifiers. This permissiveness, combined with the goal of very readable generated C code, results in some clear limits.
Consider:
$ cat d-a-s-h.nelua
print(require 'd_a_s_h')
$ cat d_a_s_h.nelua
local n = 1
return n
$ nelua d-a-s-h.nelua
d-a-s-h.nelua:1:1: from: AST node Block
print(require 'd_a_s_h')
^~~~~~~~~~~~~~~~~~~~~~~~
d-a-s-h.nelua:1:15: error: in require: module 'd_a_s_h' cannot require itself
print(require 'd_a_s_h')
^~~~~~~~~
d-a-s-h.nelua's proper name is d_a_s_h. If you have some brilliant module naming system that relies on meaningful dashes and underscores, Nelua probably can't handle it. But more sensible code bases won't even notice.
But, what about non-ASCII names?
$ cat слово.nelua
local function greppable_name() return 1 end
print(greppable_name())
$ nelua -o word.c слово.nelua && grep greppable_name word.c
static int64_t XXXXXXXXXX_greppable_name(void);
int64_t XXXXXXXXXX_greppable_name(void) {
nelua_print_1(XXXXXXXXXX_greppable_name());
Those five Cyrillic characters are encoded with ten bytes, so the name is ten 'X'es.
You can suppress C main generation with pragmas.noentrypoint
Suppose you want to compile some Nelua to an object file to use from C:
## pragmas.noentrypoint = true
local function add(a: cint, b: cint): cint <cexport 'add'>
  return a + b
end
$ cat useadd.c
#include <stdio.h>
extern int add(int a, int b);
int main() {
int a = 2;
int b = 10;
printf("add(%d, %d) = %d\n", a, b, add(a, b));
return 0;
}
Usage:
$ nelua -o add.o add.nelua
$ gcc -Wall -o useadd useadd.c add.o
$ ./useadd
add(2, 10) = 12
Without that pragma, this error is more likely:
multiple definition of `main'; /tmp/ccZHh5j8.o:useadd.c:(.text+0x0): first defined here
The left-hand side of an assignment is evaluated first
Consider:
require 'sequence'
local function index_stability()
local function force_realloc(seq: *sequence(integer)): integer
for i=1, 1000000 do seq:push(i) end
return 1234
end
local a: sequence(integer)
a:push(4321)
print(a[1])
a[1] = force_realloc(&a)
print(a[1])
end
index_stability()
Like D, Nim, and Odin, this prints 4321 both times. The assignment of 1234 is to the pre-reallocation location represented by a[1].
It's not a very easy hazard to fall into, but you might keep it in mind.
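The C-level shape of the safe pattern, sketched with a hypothetical growable array (none of these names come from Nelua): evaluate the right-hand side first, then compute the element address, so any reallocation has already happened.

```c
#include <stdlib.h>

/* A hypothetical growable int array. */
typedef struct { int *data; size_t len, cap; } seq;

static void seq_push(seq *s, int v) {
    if (s->len == s->cap) {
        s->cap = s->cap ? s->cap * 2 : 4;
        s->data = realloc(s->data, s->cap * sizeof *s->data); /* may move */
    }
    s->data[s->len++] = v;
}

static int force_realloc(seq *s) {
    for (int i = 0; i < 1000000; i++) seq_push(s, i);
    return 1234;
}

/* Safe pattern: evaluate the value first, index afterwards. */
static int assign_safely(seq *s) {
    int v = force_realloc(s);  /* any reallocation happens here */
    s->data[0] = v;            /* element address computed after the move */
    return s->data[0];
}
```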
nilptr is falsy
Like Lua, conditionals only take two values as false: false itself, and nil.
Unlike Lua, nilptr is also false.
print('nilptr', nilptr and 'true' or 'false')
print('false', false and 'true' or 'false')
print('nil', nil and 'true' or 'false')
print('0', 0 and 'true' or 'false')
print("''", '' and 'true' or 'false')
Output:
nilptr false
false false
nil false
0 true
'' true
Take care when modifying a zero data structure
A sequence has reference semantics, so if you take a sequence from somewhere, and modify it, you can be sure that the original location of the sequence also sees the modifications
... unless you took the zero sequence. In this case, the sequence only exists when you modify it, and the original location still has a zero sequence. Consider:
require 'sequence'
local function add1(seq: sequence(integer)) seq:push(1) end
local function add2(seq: sequence(integer)) seq:push(2) return seq end
local function add3(seq: *sequence(integer)) seq:push(3) end
local s1: sequence(integer)
add1(s1)
assert(#s1==0)
local s2: sequence(integer)
s2 = add2(s2)
assert(#s2==1 and s2[1]==2)
add2(s2)
assert(#s2==2 and s2[1]==2)
local s3: sequence(integer)
add3(s3)
assert(#s3==1 and s3[1]==3)
add1(s3)
assert(#s3==2 and s3[2]==1)
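The same value-vs-reference distinction can be sketched in plain C (hypothetical names): a struct passed by value is copied, so changes made through the copy never reach the caller, while a pointer parameter modifies the original.

```c
typedef struct { int len; } counter;

static void add_by_value(counter c) { c.len++; }   /* modifies a private copy */
static void add_by_ptr(counter *c)  { c->len++; }  /* modifies the original */
```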
The GC won't protect you from misuse of <close>
Consider:
require 'hashmap'
require 'string'
local map: hashmap(string, string)
do
local s <close> = string.copy('world')
map['hello'] = s
end
print(map['hello'])
Even without pragmas.nogc, this will result in a use-after-free. -S/--sanitize catches this easily, and you can avoid it by taking care when you start to use manual memory management techniques, rather than expecting that the GC will still protect you from errors with them.
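In C terms, the fix is the usual ownership rule (a sketch with my own names, not Nelua's hashmap API): a container that outlives a scoped value must store its own copy, never a pointer that scope exit is about to free.

```c
#include <stdlib.h>
#include <string.h>

/* Return a heap copy the container can own, independent of the caller's
   (possibly scoped, soon-to-be-freed) buffer. */
static char *owned_copy(const char *s) {
    char *p = malloc(strlen(s) + 1);
    strcpy(p, s);
    return p;
}
```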
The -P/--pragma and -D/--define flags take Lua code
Consider:
$ nelua -P foo='hello world' -i '##print(pragmas.foo)'
error: failed parsing parameter 'foo=hello world': define:1: syntax error near <eof>
This error doesn't mean that you can't pass in data with spaces. The flag just needs to parse as Lua.
$ nelua -P 'foo="hello world"' -i '##print(pragmas.foo)'
hello world
Enums can silently have duplicate members
Possibly another Lua-ism, as Lua behaves the same way with table definitions:
local Duration = @enum{ s = 1, m, h, d, m, us }
assert(Duration.s == 1)
assert(Duration.m == 5) -- not 2
The empty string '' is not the zero string (@string){}
Consider:
local function f() return '' end
local function g() return (@string){} end
assert(f() == '') -- ok
assert(g() == '') -- ok
Those two functions compile to:
nlstring n65_f(void) {
return ((nlstring){(uint8_t*)"", 0});
}
nlstring n65_g(void) {
return (nlstring){0};
}
They both return zero-length strings, but one points to a valid C string in static memory, and the other one has NULL for its pointer.
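An nlstring-like layout makes the hazard concrete (a sketch; the field and function names are mine): both values below are empty by length, but only one has a pointer you can safely hand to C string functions.

```c
#include <stddef.h>

/* Pointer-plus-size, like nlstring. */
typedef struct { const unsigned char *data; size_t size; } str;

static int is_empty(str s) { return s.size == 0; }  /* fine for both forms */

/* Anything that dereferences data must handle the NULL (zero string) case. */
static const char *as_cstr(str s) {
    return s.data ? (const char *)s.data : "";
}
```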
defer can't modify return values
Consider, from defer_test.nelua:
do -- issue #233
local function f(n: integer): integer
defer n = n + 1 end
return n
end
local function g(n: integer, m: integer): (integer, integer)
defer n = n + 1 m = m + 1 end
return n, m
end
assert(f(1) == 1)
local n, m = g(1,2)
assert(n == 1 and m == 2)
end
What happens is that return values are assigned to a secret variable, then the defer runs and - in this case - performs useless modifications - and then the secret value is returned:
int64_t example_f(int64_t n) {
int64_t _ret_1 = n;
{ /* defer */
n = (n + 1);
}
return _ret_1;
}
The intent is for defer to run 'after' the function returns. This has the minor effect of doubling the stack size of returned values:
-- very silly example
local function f()
  local a: [usize(0.3*1024^2)]integer
  ## if pragmas.withdefer then
  defer print(2, a[0]) end
  ## end
  print(1, a[0])
  return a
end
print(f()[0])
usage:
$ nelua defsize.nelua
1 0
0
$ nelua -Pwithdefer defsize.nelua
Segmentation fault
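The generated C above reduces to this sketch: because the return value is captured into a temporary before the defer body runs, the deferred modification can't reach the caller.

```c
/* Mirrors the shape of example_f() shown above. */
static int f_with_defer(int n) {
    int ret = n;        /* return value captured first */
    { /* defer body */
        n = n + 1;      /* runs, but only changes the local */
    }
    return ret;         /* still the pre-defer value */
}
```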
Is that 'ordered fields initialization' or an InitList?
The Overview introduces ordered fields initialization with this example:
local Person = @record{
name: string,
age: integer
}
-- ordered fields initialization
local c = (@Person){"Eric", 21}
print(c.name, c.age)
It also initializes some arrays, but those can be understood identically to this example. There is also, in tests/sequence_test.nelua, this code:
do -- braces initializer
local seq: sequence(integer) = {}
assert(#seq == 0 and seq.impl == nilptr)
seq:destroy()
seq = {1,2,3}
What's going on there? That {1, 2, 3} assignment is clearly different. That's implemented by lib/sequence.nelua:
function sequenceT.__convert(values: an_arrayT): sequenceT <inline>
local self: sequenceT
self:reserve(#values)
self.impl.size = #values
for i:usize=1,#values do
self.impl.data[i] = values[i-1]
end
return self
end
And here's something that looks very similar to ordered fields initialization:
require 'sequence'
local seq = (@sequence(integer)){1, 2}
for _, k in ipairs(seq) do print(k) end
In fact I reported this as a bug when I hit it with span(byte).
Here's an example of two structurally identical records that treat this syntax differently:
require 'sequence'
local Point1 = @record{x: integer, y: integer}
## local function make_Point2(T)
local T: type = @#[T]#
local Point2 = @record{x: T, y: T}
local an_array: type = #[concept(function(x)
if x.type:is_contiguous_of(T) then
return true
end
return false, string.format("no viable conversion from '%s' to '%s'", x.type, Point2)
end, function(node)
if node.is_InitList and #node > 0 and not node:find_child_with_field('is_Pair') then
return types.ArrayType(T, #node)
end
end)]#
function Point2.__convert(values: an_array)
local p: Point2
for _, k in ipairs(values) do
p.x = p.x + k
end
return p
end
## return Point2
## end
local Point2: type = #[generalize(make_Point2)]#
local a = (@Point1){1, 2}
local b = (@Point2(integer)){1, 2}
print(a.x, a.y) -- 1, 2
print(b.x, b.y) -- 3, 0
So, how do you know what you're dealing with?
- you can always name your keys - i.e., never use ordered fields initialization
- you can follow the stdlib's convention: containers (including span) are initialized by a list of values, and everything else gets ordered fields - i.e., never create this Point2 abomination.
The left-hand side of an assignment is considered in type inference
Mostly this is easy to understand and has pleasant results. But when using the same name on both sides of an assignment, you can run into some odd conditions. For example, suppose you have some logical constants in a module that you want to be configurable, such as a CSV reader that can be hard-coded to use tabs vs. commas. In general a .csv file can have all kinds of separators so the library should be flexible, but for a particular app looking at particular data, this flexibility is pointless. So, you import the csv library, and then you specialize it:
local csv = require 'csvreader'
local csv = @csv('\t'_u8)
And you get an error:
error: in generic evaluation: symbol 'csv' is not a type
local csv = @csv('\t'_u8)
^~~~~~~~~
Why isn't it a type, when it clearly is a type? Why isn't it a type even if you explicitly say that both variables have type type ... I don't get it myself. But there are two workarounds:
- use a different name for one of these variables.
- wrap the right-hand-side in parentheses:
local csv = require 'csvreader'
local csv = (@csv('\t'_u8))
You won't get a useful error if you fail a bounds check
Issue #242:
string.nelua:311:16: runtime error: index out of range
check(i >= 1 and i <= s.size, 'index out of range')
^~~~~~~~~~~~~~~
Aborted
Because Nelua insists on minimalism and portability, this isn't readily improved by a little binary bloat (putting the checks at the callsites). But, because Nelua insists on minimalism and portability, it's very easily debugged by C tools such as gdb. Re-running the program with either -S/--sanitize or -d/--debug will provide a stacktrace.
Note that the binary-bloat solution that most languages take is not that useful as soon as you have another level of callers. Suppose you have an application with 100 calls of a library function that itself calls vector.__atindex. When a bounds check fails in the vector.nelua code, most languages would point the error at its callsite in the library function ... which still doesn't tell you, the application author, which of your 100 calls is the source of the error. Stacktraces are the most useful here, and they're a planned feature for errorhandling.nelua.
coroutine.spawn() won't exhibit the implicit conversions that a function call would
Consider:
require 'coroutine'
require 'span'
local function f(mem: span(integer))
print(mem.data)
end
local a: []integer = {1, 2, 3, 4, 5}
f(a)
coroutine.spawn(f, a)
coroutine.spawn(f, (@span(integer))(a))
This could have output like
0x55edd6de70a0 0x4 0x55edd6de70a0
To get the intended behavior, use the explicit cast.
&self is probably a mistake
In the following program, the first address printed is of the implicit parameter to the function, which could've been equivalently written as function r.f(self: *r), and &self is a pointer to a pointer to an r, and a pointer that will be invalidated as soon as the function returns. If you meant to pass a pointer to r somewhere, self already is that.
local r = @record{}
function r:f()
print((@pointer)(&self)) -- 0x7ffeb657a4f8
print((@pointer)(self)) -- 0x404049
## print(self.type) -- pointer(r)
local p = &self
## p:resolve_type()
## print(p.type) -- pointer(pointer(r))
end
local o: r
o:f()
print((@pointer)(&o)) -- 0x404049
What about when r.f takes the object directly? In the next program, it's successfully modified in the function, but the caller doesn't see the change because the function was given a copy to modify. This might be what you want.
local r = @record{n: integer}
local function modify(o: *r) o.n = o.n + 1 end
function r.f(self: r)
modify(&self)
print(self.n) -- 1
end
local o: r
o:f()
print(o.n) -- 0
__tostring must return a default-allocated string
The documentation says as much:
The __tostring metamethod may be called, in this case, it must always return a new allocated string.
But it's an easy rule to violate by returning static strings and thinking the GC will take care of it:
require 'io'
local T = @record{}
function T:__tostring() return 'static string' end
io.printf('%s\n', (@T){})
This blows up as io.printf calls string:destroy() on the string it gets back from its implicit call to T:__tostring().
gc.nelua:140:20: runtime error: invalid unregister pointer
assert(oldsize ~= 0, 'invalid unregister pointer')
^~~~
Aborted (SIGABRT)
An easy fix is to call string.copy() in this case, or to instead implement __tostringview:
require 'io'
local T = @record{}
function T:__tostringview() return 'static string' end
io.printf('%s\n', (@T){})
NB. if you implement both, __tostring will be preferred.
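The contract is the usual C ownership rule, sketched here with my own names: when the caller is going to free the result, the callee must return heap memory. Returning a string literal would make that free() blow up, much like the GC unregister error above.

```c
#include <stdlib.h>
#include <string.h>

/* Like a correct __tostring: returns memory the caller may destroy. */
static char *to_string(void) {
    char *s = malloc(sizeof "static string");
    strcpy(s, "static string");
    return s;
    /* return "static string";  -- the caller's free() would then abort */
}
```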