Lua Notes

discriminated unions

ML variants, or Rust enumerations, or D sumtypes, or Nim object variants, or Ada variant records, or -- however you know them, they're a way of modelling data, popularized by FP languages.

When this functionality is implemented in non-FP languages, such as C++, the examples are generally extremely poor, focusing on single variables that can be one of a couple of primitive types. I'll try to do better here.

Of the different names,

The best line from that last link:

A tagged union can be seen as the simplest kind of self-describing data format. The tag of the tagged union can be seen as the simplest kind of metadata.

Anyway,

  1. d1.nelua - a bare-bones union of structs
  2. d2.nelua - d1 with tagged_union.nelua
  3. d3.nelua - slightly nicer d1
  4. d4.nelua - flattened records with shared fields
  5. d5.nelua - flattened goblins

d1.nelua - bare-bones union of structs

Consider:

require 'traits'
require 'iterators'

local card = @record{name: string, code: string, cvv: integer, amount: number}
local crypto = @record{coin: string, wallet: string, amount: number}
local cash = @record{currency: string, amount: number}
local payment = @record{
  kind: traits.typeid,
  payment: union{card: card, crypto: crypto, cash: cash},
}

local payments: []payment = {
  {#[cash.value.id]#, {cash={'USD', 100}}},
  {#[card.value.id]#, {card={'Some Name', '1234', 123, 100}}},
  {#[cash.value.id]#, {cash={'USD', 100}}},
  {#[cash.value.id]#, {cash={'USD', 100}}},
  {#[crypto.value.id]#, {crypto={'BTC', 'mine', 0.001}}},
}

for _, p in ipairs(payments) do
  switch p.kind do
  case #[card.value.id]# then print('a card payment with CVV:', p.payment.card.cvv)
  case #[cash.value.id]# then print('a cash payment in amount:', p.payment.cash.amount)
  else print('unsupported payment method')
  end
end

Output:

a cash payment in amount:	100.0
a card payment with CVV:	123
a cash payment in amount:	100.0
a cash payment in amount:	100.0
unsupported payment method

Here we have three different types of payments: card, crypto, and cash, and a 'payment' type that combines them. 'payment.payment' (the union) holds one of these payments, all of them sharing the same memory, and 'payment.kind' (the discriminant) should hold a number that tells you which payment you actually have.

Later, 'payments' is set to an array of five payments, and at runtime they're looped over and some information about each is printed out.

This is about as simple as it gets, with a single clever point: the discriminant is a 32-bit integer provided by Nelua, unique to each defined type. This is also the worst point of the example: the number is assigned by the compiler, so it will serialize differently from build to build, unlike an enum, which can easily have a consistent representation.
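To make the serialization point concrete, here is a minimal C sketch of a tag with pinned values. The names `payment_kind`, `write_kind` and `read_kind` are made up for illustration; nothing here comes from Nelua or its libraries.

```c
#include <stddef.h>
#include <stdint.h>

/* Explicit values: the serialized tag stays the same from build to build. */
enum payment_kind { KIND_CARD = 1, KIND_CRYPTO = 2, KIND_CASH = 3 };

/* Write the tag as one byte. Returns the number of bytes written (0 if no room). */
size_t write_kind(uint8_t *buf, size_t cap, enum payment_kind k) {
    if (cap < 1) return 0;
    buf[0] = (uint8_t)k;
    return 1;
}

/* Read the tag back. Returns 1 and stores the kind on success,
   0 if the buffer is empty or the byte is not a known tag. */
int read_kind(const uint8_t *buf, size_t len, enum payment_kind *out) {
    if (len < 1) return 0;
    switch (buf[0]) {
    case KIND_CARD:
    case KIND_CRYPTO:
    case KIND_CASH:
        *out = (enum payment_kind)buf[0];
        return 1;
    default:
        return 0;
    }
}
```

A reader built from a later version of the program still understands a byte of 3 as cash. A compiler-assigned type id offers no such guarantee.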

The generated C is also about as simple as it gets:

union union_UtR8d7anMzyY1ivp {
  d1_card card;
  d1_crypto crypto;
  d1_cash cash;
};
struct d1_payment {
  uint32_t kind;
  union_UtR8d7anMzyY1ivp payment;
};
...
/* ipairs loop: */
  switch(p.kind) {
    case 73: {
      nelua_print_1(((nlstring){(uint8_t*)"a card payment with CVV:", 24}), p.payment.card.cvv);
      break;
    }
    case 75: {
      nelua_print_2(((nlstring){(uint8_t*)"a cash payment in amount:", 25}), p.payment.cash.amount);
      break;
    }
    default: {
      nelua_print_3(((nlstring){(uint8_t*)"unsupported payment method", 26}));
      break;
    }
  }

So this example at least achieves the desired level of efficiency.
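For a baseline, this is a sketch of what the hand-written C might look like. It is not what Nelua generates: it assumes an explicit enum discriminant in place of the typeid, and the names `enum kind` and `describe` are invented for illustration.

```c
#include <stddef.h>
#include <stdio.h>

enum kind { CARD = 1, CRYPTO, CASH };

struct card   { const char *name; const char *code; long cvv; double amount; };
struct crypto { const char *coin; const char *wallet; double amount; };
struct cash   { const char *currency; double amount; };

struct payment {
    enum kind kind;            /* discriminant */
    union {                    /* only the member named by kind is valid */
        struct card card;
        struct crypto crypto;
        struct cash cash;
    } payment;
};

/* Format one report line into buf and return buf. */
const char *describe(const struct payment *p, char *buf, size_t n) {
    switch (p->kind) {
    case CARD:
        snprintf(buf, n, "a card payment with CVV: %ld", p->payment.card.cvv);
        break;
    case CASH:
        snprintf(buf, n, "a cash payment in amount: %.1f", p->payment.cash.amount);
        break;
    default:
        snprintf(buf, n, "unsupported payment method");
        break;
    }
    return buf;
}
```

A value is built much like in the Nelua version, for example `struct payment p = { CASH, { .cash = { "USD", 100 } } };`. The switch has the same shape as the generated one: a jump on an integer followed by a direct field access.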

d2.nelua - d1 with tagged_union.nelua

This is the same code as above, and the output is the same, but using Andre-La/1bit's tagged_union.nelua.

require 'iterators'
local sumtype = require 'tagged_union'

local payment = @sumtype(union{
  card: record{name: string, code: string, cvv: integer, amount: number},
  crypto: record{coin: string, wallet: string, amount: number},
  cash: record{currency: string, amount: number},
})

local payments: []payment = {
  payment.cash{'USD', 100},
  payment.card{'Some Name', '1234', 123, 100},
  payment.cash{'USD', 100},
  payment.cash{'USD', 100},
  payment.crypto{'BTC', 'mine', 0.001},
}

for _, p in ipairs(payments) do
  if     p:is(@payment.card) then print('a card payment with CVV:', p:get(@payment.card).cvv)
  elseif p:is(@payment.cash) then print('a cash payment in amount:', p:get(@payment.cash).amount)
  else print('unsupported payment method')
  end
end

That's nicer, isn't it? Upsides:

  1. the definition and construction of sumtypes is clearly better, with less repetition and room for error
  2. :is/:get are safer and leave you always dealing with well-typed data - you can't mess up the type punning with code like if is-cash then treat-as-crypto end

Downsides:

  1. performance is worse, because you can no longer switch on the discriminant. It's no longer as efficient as hand-written C.

An alternative library with the same upsides and same downside is linky/pancake-lib's sumtype.nelua. There's a little bit more Rust influence there, but both libraries are IMHO missing an essential OCaml influence: the thing most to be emulated is the primacy, the efficiency, and the exhaustiveness of the pattern-matching. :is/:get are more similar to object casting in an OOP language, as in this D version:

class Payment {}
class Card : Payment { string name; string code; int cvv; double amount; this(string n, string c, int v, double a) { name=n; code=c; cvv=v; amount=a; } }
class Crypto : Payment { string coin; string wallet; double amount; this(string c, string w, double a) { coin=c; wallet=w; amount=a; } }
class Cash : Payment { string currency; double amount; this(string c, double a) { currency=c; amount=a; } }

void main() {
	import std.stdio : writeln;

	Payment[] payments = [
		new Cash("USD", 100),
		new Card("Some Name", "1234", 123, 100),
		new Cash("USD", 100),
		new Cash("USD", 100),
		new Crypto("BTC", "mine", 0.001),
	];
	foreach (p; payments) {
		if (auto card = cast(Card)p) writeln("a card payment with CVV: ", card.cvv);
		else if (auto cash = cast(Cash)p) writeln("a cash payment in amount: ", cash.amount);
		else writeln("unsupported payment method");
	}
}
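On exhaustiveness: plain C has no pattern matching, but a switch over an enum with no default gets part of the way there, because GCC and Clang warn about missing enumerators under -Wswitch (enabled by -Wall). A small sketch, with `enum kind` and `kind_name` as made-up names:

```c
enum kind { CARD = 1, CRYPTO, CASH };

/* No default label on purpose: if an enumerator is added to enum kind and
   not handled here, -Wswitch reports the missing case at compile time. */
const char *kind_name(enum kind k) {
    switch (k) {
    case CARD:   return "card";
    case CRYPTO: return "crypto";
    case CASH:   return "cash";
    }
    /* Only reached when k holds a value outside the enumerators. */
    return "unknown";
}
```

This is a warning rather than a guarantee, and adding a `default:` branch silences it, so it is a much weaker check than an OCaml match.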

d3.nelua - slightly nicer d1

A typeid discriminant really was a bad idea, and it's not uncommon that the subtypes aren't needed by themselves. This is a little bit neater:

require 'iterators'

local payment = @record{
  kind: enum{card = 1, crypto, cash},
  payment: union{
    card: record{name: string, code: string, cvv: integer, amount: number},
    crypto: record{coin: string, wallet: string, amount: number},
    cash: record{currency: string, amount: number},
  },
}
global payment.kind: type = #[payment.value.fields.kind.type]#

local payments: []payment = {
  {payment.kind.cash, {cash={'USD', 100}}},
  {payment.kind.card, {card={'Some Name', '1234', 123, 100}}},
  {payment.kind.cash, {cash={'USD', 100}}},
  {payment.kind.cash, {cash={'USD', 100}}},
  {payment.kind.crypto, {crypto={'BTC', 'mine', 0.001}}},
}

for _, p in ipairs(payments) do
  switch p.kind do
  case payment.kind.card then print('a card payment with CVV:', p.payment.card.cvv)
  case payment.kind.cash then print('a cash payment in amount:', p.payment.cash.amount)
  else print('unsupported payment method')
  end
end

Personally, I'd stop here. Construction is verbose but unimportant in practice: as constructors are not reused for pattern-matching here, construction is much rarer than in OCaml. Switching on the discriminant could be nicer but just doesn't need to be. Definition of the sumtype is ideal, only needing a .kind alias for convenience.

Let's change just that loop at the end:

for _, p in ipairs(payments) do
  local _ <using> = payment.kind
  switch p.kind do
  case card then 
    local p = p.payment.card
    print('a card payment in amount:', p.amount, 'and cvv:', p.cvv)
  case cash then 
    local p = p.payment.cash
    print('a cash payment in amount:', p.amount, 'of currency:', p.currency)
  case crypto then 
    local p = p.payment.crypto
    print('a crypto payment in amount:', p.amount, 'of coin:', p.coin)
  else panic('unsupported payment method') end
end

A new 'match' control-flow syntax that expanded into this might be nice:

for _, p in ipairs(payments) do
  local _ <using> = payment.kind
  match p do           
  case card then print('a card payment in amount:', p.amount, 'and cvv:', p.cvv) 
  case cash then print('a cash payment in amount:', p.amount, 'of currency:', p.currency) 
  case crypto then print('a crypto payment in amount:', p.amount, 'of coin:', p.coin) 
  else panic('unsupported payment method') end
end

Can that be done without a new syntax? ... I don't think so. The closest I think you can get is the following, which is quite ugly. It requires a consistent name for the payload field: the macro assumes it is called 'payload', where the record above calls it 'payment'.

## local function match(v, e, f)
  if #[v]#.kind == #[e]# then
    local #|v.name|# = #[v]#.payload.#|e.name|#
    ## f()
  end
## end

for _, p in ipairs(payments) do
  local _ <using> = payment.kind
  ## match(p, card, function()
    print('a card payment in amount:', p.amount, 'and cvv:', p.cvv)
    continue
  ## end)
  ## match(p, cash, function()
    print('a cash payment in amount:', p.amount, 'of currency:', p.currency)
    continue
  ## end)
  ## match(p, crypto, function()
    print('a crypto payment in amount:', p.amount, 'of coin:', p.coin)
    continue
  ## end)
  panic('unsupported payment method')
end

d4.nelua - flattened records with shared fields

There are definite style improvements needed here, but you get the idea - this is more similar to Nim, with an explicit discriminant and with every variant needing uniquely named fields:

require 'iterators'

local payment = @record{
  kind: enum{card = 1, crypto, cash},
  amount: number,
  payment: union{
    card: record{name: string, code: string, cvv: integer},
    crypto: record{coin: string, wallet: string},
    cash: record{currency: string},
  },
}
global payment.kind: type = #[payment.value.fields.kind.type]#
##[[
local function alias_field(recsym, name, dotindex, fieldtypesym)
  recsym.value.fields[name] = {name = dotindex, type = fieldtypesym.value}
end
]]
alias_field!(@payment, 'name', 'payment.card.name', @string)
alias_field!(@payment, 'code', 'payment.card.code', @string)
alias_field!(@payment, 'cvv', 'payment.card.cvv', @integer)
alias_field!(@payment, 'coin', 'payment.crypto.coin', @string)
alias_field!(@payment, 'wallet', 'payment.crypto.wallet', @string)
alias_field!(@payment, 'currency', 'payment.cash.currency', @string)

local payments: []payment = {
  {payment.kind.cash, 100, {cash={'USD'}}},
  {payment.kind.card, 100, {card={'Some Name', '1234', 123}}},
  {payment.kind.cash, 100, {cash={'USD'}}},
  {payment.kind.cash, 100, {cash={'USD'}}},
  {payment.kind.crypto, 0.001, {crypto={'BTC', 'mine'}}},
}

for _, p in ipairs(payments) do
  local _ <using> = payment.kind
  switch p.kind do
  case card then print('a card payment in amount:', p.amount, 'and cvv:', p.cvv)
  case cash then print('a cash payment in amount:', p.amount, 'of currency:', p.currency)
  case crypto then print('a crypto payment in amount:', p.amount, 'of coin:', p.coin)
  else panic('unsupported payment method') end
end
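For comparison, C11 gets the same flattened shape from anonymous unions and structs, with no aliasing step. This is only a sketch of the layout, not a translation of the Nelua macro, and `enum kind` and `describe` are made-up names. As in the Nelua version, field names have to be unique across variants.

```c
#include <stddef.h>
#include <stdio.h>

enum kind { CARD = 1, CRYPTO, CASH };

struct payment {
    enum kind kind;
    double amount;                                      /* shared by every variant */
    union {                                             /* anonymous union (C11) */
        struct { const char *name, *code; long cvv; };  /* card */
        struct { const char *coin, *wallet; };          /* crypto */
        struct { const char *currency; };               /* cash */
    };
};

/* Format one report line into buf and return buf.
   Variant fields are reached directly, e.g. p->cvv, not p->payment.card.cvv. */
const char *describe(const struct payment *p, char *buf, size_t n) {
    switch (p->kind) {
    case CARD:
        snprintf(buf, n, "a card payment in amount: %g and cvv: %ld", p->amount, p->cvv);
        break;
    case CASH:
        snprintf(buf, n, "a cash payment in amount: %g of currency: %s", p->amount, p->currency);
        break;
    case CRYPTO:
        snprintf(buf, n, "a crypto payment in amount: %g of coin: %s", p->amount, p->coin);
        break;
    default:
        snprintf(buf, n, "unsupported payment method");
        break;
    }
    return buf;
}
```

Nothing stops code from reading `p->cvv` on a cash payment, so the discriminant still has to be checked by hand, just as in the Nelua version.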

d5.nelua - flattened goblins

A more involved example from an old Nim bug report:

local Alignments = @enum{Lawful = 1, Neutral, Chaotic}
local Goblin = @record{
  name: string,
  align: Alignments,
  payload: union{
    nonchaotic: record{
      actualSpecies: string,
      transformedBy: string,
      spellDuration: integer,
    },
    chaotic: record{
      murders: integer,
      injuries: integer,
      despoils: integer,
      evilness: integer,
    },
  },
}
function Goblin:dump()
  print '---'
  print('name', self.name)
  print('align', self.align)
  if self.align ~= Alignments.Chaotic then
    print('actualSpecies', self.payload.nonchaotic.actualSpecies)
    print('transformedBy', self.payload.nonchaotic.transformedBy)
    print('spellDuration', self.payload.nonchaotic.spellDuration)
  else
    print('murders', self.payload.chaotic.murders)
    print('injuries', self.payload.chaotic.injuries)
    print('despoils', self.payload.chaotic.despoils)
    print('evilness', self.payload.chaotic.evilness)
  end
end
##[[
local function flatten(ns, field)
  local function alias_field(recsym, name, dotindex, fieldtypesym)
    recsym.value.fields[name] = {name = dotindex, type = fieldtypesym}
  end
  for _, name in ipairs(ns.value.fields.payload.type.fields[field].type.fields) do
    alias_field(ns, name.name, 'payload.'..field..'.'..name.name, ns.value.fields.payload.type.fields[field].type.fields[name.name].type)
  end
end
]]
flatten!(Goblin, 'nonchaotic')
flatten!(Goblin, 'chaotic')

local function initGoblin(name: string, align: Alignments): Goblin
  switch align do
  case Alignments.Chaotic then
    local g: Goblin = {name=name, align=align}
    g.evilness=9000
    return g
  case Alignments.Neutral, Alignments.Lawful then
    local g: Goblin = {name=name, align=align}
    g.actualSpecies = 'probably some kind of human'
    g.transformedBy = 'a demon maybe?'
    g.spellDuration = 500
    return g
  end
  panic('invalid alignment')
  return {}
end
initGoblin('Alice', Alignments.Lawful):dump()
initGoblin('Bob', Alignments.Chaotic):dump()

Output:

---
name	Alice
align	1
actualSpecies	probably some kind of human
transformedBy	a demon maybe?
spellDuration	500
---
name	Bob
align	3
murders	0
injuries	0
despoils	0
evilness	9000

NB. this flattening doesn't extend to being able to write (@Goblin){name=name, align=align, actualSpecies='human', ...}.