discriminated unions
ML variants, Rust enums, D sumtypes, Nim object variants, Ada variant records: however you know them, they're a way of modelling data, popularized by FP languages, which
- should be relatively efficient, as all the cases are known at compile time (vs. OOP, where runtime additions are often possible)
- can encourage relatively coherent code, as it's easy to see whether all cases are enumerated, so if a new case is added later (e.g., a payment system now takes credit cards, when previously it only took cash or check payments) it is easy to ensure that every part of the code that cares about the different cases is updated to handle the new one (vs. OOP, where avoiding this kind of friction is generally the whole point)
- can result in relatively clear code, as different cases are explicitly considered (vs. OOP where method dispatch is deliberately implicit)
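C compilers offer a weak form of the second point. A minimal sketch, assuming a hypothetical per-kind fee rule (the kinds follow the cash/check/credit-card example above; the fee amounts are made up): because the switch has no default label, -Wswitch (part of -Wall in GCC and Clang) warns about any enumerator the switch doesn't handle, so adding a new kind to the enum flags every such switch that hasn't been updated.

```c
/* Hypothetical payment kinds: a system that took cash or checks
   and has just started accepting credit cards. */
enum payment_kind { PAY_CASH, PAY_CHECK, PAY_CREDIT_CARD };

/* Hypothetical fee rules, in cents. There is deliberately no 'default'
   label: with -Wswitch (part of -Wall), GCC and Clang warn about any
   enumerator that is not handled, so adding a kind to the enum flags
   every switch like this one that still needs updating. */
long fee_cents(enum payment_kind kind, long amount_cents) {
    switch (kind) {
    case PAY_CASH:        return 0;
    case PAY_CHECK:       return 50;                     /* flat fee */
    case PAY_CREDIT_CARD: return amount_cents * 3 / 100; /* 3% */
    }
    return -1; /* only reached if 'kind' holds a value outside the enum */
}
```

Unlike OCaml's exhaustiveness check, this is only a warning, and it disappears as soon as someone adds a default label.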
When this functionality is implemented in non-FP languages such as C++, the usual examples are extremely poor, focusing on a single variable that can hold one of a couple of primitive types. I'll try to do better here.
Of the different names,
- 'discriminated union'/'tagged union' emphasize an implementation strategy,
- 'variant' emphasizes logical disjunction, and
- 'sumtype' emphasizes logical conflation - the best emphasis, IMO.
The best line from that last link:
A tagged union can be seen as the simplest kind of self-describing data format. The tag of the tagged union can be seen as the simplest kind of metadata.
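In byte form the quote is easy to see: the tag is written first, and a reader needs nothing else to know how to interpret the bytes that follow. A minimal sketch, assuming a made-up encoding (one tag byte, then the payload in host byte order) and hypothetical encode_cash/encode_card/decode helpers; the tag values are spelled out explicitly so the format stays the same from build to build:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Explicit tag values: the wire format doesn't depend on compiler-assigned IDs. */
enum { TAG_CASH = 1, TAG_CARD = 2 };

/* Hypothetical encoding: one tag byte, then that tag's payload
   (host byte order, for brevity). Returns the number of bytes written. */
size_t encode_cash(uint8_t *buf, uint32_t amount_cents) {
    buf[0] = TAG_CASH;
    memcpy(buf + 1, &amount_cents, sizeof amount_cents);
    return 1 + sizeof amount_cents;
}

size_t encode_card(uint8_t *buf, uint16_t cvv) {
    buf[0] = TAG_CARD;
    memcpy(buf + 1, &cvv, sizeof cvv);
    return 1 + sizeof cvv;
}

/* The tag alone tells the reader how to interpret (and how long) the rest is.
   Returns the number of bytes consumed, or 0 for an unknown tag. */
size_t decode(const uint8_t *buf, int *tag, uint32_t *value) {
    *tag = buf[0];
    switch (*tag) {
    case TAG_CASH: { uint32_t a; memcpy(&a, buf + 1, sizeof a); *value = a; return 1 + sizeof a; }
    case TAG_CARD: { uint16_t c; memcpy(&c, buf + 1, sizeof c); *value = c; return 1 + sizeof c; }
    default:       return 0;
    }
}
```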
Anyway,
- d1.nelua - a bare-bones union of structs
- d2.nelua - d1 with tagged_union.nelua
- d3.nelua - slightly nicer d1
- d4.nelua - flattened records with shared fields
- d5.nelua - flattened goblins
d1.nelua - bare-bones union of structs
Consider:
require 'traits'
require 'iterators'
local card = @record{name: string, code: string, cvv: integer, amount: number}
local crypto = @record{coin: string, wallet: string, amount: number}
local cash = @record{currency: string, amount: number}
local payment = @record{
kind: traits.typeid,
payment: union{card: card, crypto: crypto, cash: cash},
}
local payments: []payment = {
{#[cash.value.id]#, {cash={'USD', 100}}},
{#[card.value.id]#, {card={'Some Name', '1234', 123, 100}}},
{#[cash.value.id]#, {cash={'USD', 100}}},
{#[cash.value.id]#, {cash={'USD', 100}}},
{#[crypto.value.id]#, {crypto={'BTC', 'mine', 0.001}}},
}
for _, p in ipairs(payments) do
switch p.kind do
case #[card.value.id]# then print('a card payment with CVV:', p.payment.card.cvv)
case #[cash.value.id]# then print('a cash payment in amount:', p.payment.cash.amount)
else print('unsupported payment method')
end
end
Output:
a cash payment in amount: 100.0
a card payment with CVV: 123
a cash payment in amount: 100.0
a cash payment in amount: 100.0
unsupported payment method
Here we have three different types of payment (card, crypto, and cash) and a 'payment' type that conflates them. 'payment.payment' (the union) holds one of these payments, with all three sharing the same memory, and 'payment.kind' (the discriminant) should hold a number that tells you which payment you actually have.
Later, 'payments' is set to an array of five payments, and at runtime they're looped over and some information about each is printed out.
This is about as simple as it gets, with a single clever point: the discriminant is a 32-bit integer provided by Nelua, unique to each defined type. This is also the worst point of the example: that number can change from build to build, so it won't serialize consistently, unlike an enum, which can easily be given a stable representation.
The generated C is also about as simple as it gets:
union union_UtR8d7anMzyY1ivp {
d1_card card;
d1_crypto crypto;
d1_cash cash;
};
struct d1_payment {
uint32_t kind;
union_UtR8d7anMzyY1ivp payment;
};
...
/* ipairs loop: */
switch(p.kind) {
case 73: {
nelua_print_1(((nlstring){(uint8_t*)"a card payment with CVV:", 24}), p.payment.card.cvv);
break;
}
case 75: {
nelua_print_2(((nlstring){(uint8_t*)"a cash payment in amount:", 25}), p.payment.cash.amount);
break;
}
default: {
nelua_print_3(((nlstring){(uint8_t*)"unsupported payment method", 26}));
break;
}
}
So this example at least achieves the desired level of efficiency.
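For comparison, here's roughly what a hand-written C version of d1 looks like. It's a sketch, not Nelua's output: the discriminant is an enum with explicit values (sidestepping the build-to-build typeid problem), and a hypothetical describe() writes each message into a buffer instead of printing it:

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Explicit values, so the discriminant is stable across builds. */
enum payment_kind { KIND_CARD = 1, KIND_CRYPTO = 2, KIND_CASH = 3 };

struct card   { const char *name; const char *code; int cvv; double amount; };
struct crypto { const char *coin; const char *wallet; double amount; };
struct cash   { const char *currency; double amount; };

struct payment {
    enum payment_kind kind; /* says which union member is valid */
    union {
        struct card card;
        struct crypto crypto;
        struct cash cash;
    } payment;              /* all three share the same storage */
};

/* Writes the message d1 prints into 'out'; returns snprintf's result. */
int describe(const struct payment *p, char *out, size_t n) {
    switch (p->kind) {
    case KIND_CARD: return snprintf(out, n, "a card payment with CVV: %d", p->payment.card.cvv);
    case KIND_CASH: return snprintf(out, n, "a cash payment in amount: %.1f", p->payment.cash.amount);
    default:        return snprintf(out, n, "unsupported payment method");
    }
}
```

Construction uses designated initializers, e.g. { KIND_CASH, { .cash = { "USD", 100 } } }, for which describe() yields "a cash payment in amount: 100.0".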
d2.nelua - d1 with tagged_union.nelua
This is the same program as above, with the same output, but using Andre-La/1bit's tagged_union.nelua.
require 'iterators'
local sumtype = require 'tagged_union'
local payment = @sumtype(union{
card: record{name: string, code: string, cvv: integer, amount: number},
crypto: record{coin: string, wallet: string, amount: number},
cash: record{currency: string, amount: number},
})
local payments: []payment = {
payment.cash{'USD', 100},
payment.card{'Some Name', '1234', 123, 100},
payment.cash{'USD', 100},
payment.cash{'USD', 100},
payment.crypto{'BTC', 'mine', 0.001},
}
for _, p in ipairs(payments) do
if p:is(@payment.card) then print('a card payment with CVV:', p:get(@payment.card).cvv)
elseif p:is(@payment.cash) then print('a cash payment in amount:', p:get(@payment.cash).amount)
else print('unsupported payment method')
end
end
That's nicer, isn't it? Upsides:
- the definition and construction of sumtypes is clearly better, with less repetition and room for error
- :is/:get are safer and always leave you dealing with well-typed data: you can't botch the type punning with code like
if is-cash then treat-as-crypto end
Downside:
- performance is worse because you can't switch on the discriminant, so it's no longer as efficient as hand-written C.
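To make the trade-off concrete, here's a sketch of an :is/:get-style interface hand-written in C. The names (payment_is, payment_get_card, payment_get_cash, payment_amount) are hypothetical, and this is not how tagged_union.nelua is implemented; it only shows the shape of the resulting code: dispatch becomes a chain of independent tag tests, each get re-checks the tag, and a mismatched get returns NULL instead of reinterpreting the wrong union member. Whether an optimizing compiler folds the repeated checks back into something switch-like depends on the compiler and optimization level.

```c
#include <stddef.h> /* NULL */

enum payment_kind { KIND_CARD = 1, KIND_CRYPTO, KIND_CASH };

struct card { int cvv; double amount; };
struct cash { const char *currency; double amount; };

/* Crypto member omitted for brevity; KIND_CRYPTO stands in for an unhandled case. */
struct payment {
    enum payment_kind kind;
    union { struct card card; struct cash cash; } as;
};

/* Hypothetical "is": a plain tag comparison. */
int payment_is(const struct payment *p, enum payment_kind k) { return p->kind == k; }

/* Hypothetical "get": checked access that refuses to hand out the wrong member. */
const struct card *payment_get_card(const struct payment *p) {
    return p->kind == KIND_CARD ? &p->as.card : NULL;
}
const struct cash *payment_get_cash(const struct payment *p) {
    return p->kind == KIND_CASH ? &p->as.cash : NULL;
}

/* An if/else chain of separate tests, each followed by a get that
   checks the tag again, instead of a single switch on the tag. */
double payment_amount(const struct payment *p) {
    if (payment_is(p, KIND_CARD))
        return payment_get_card(p)->amount;
    else if (payment_is(p, KIND_CASH))
        return payment_get_cash(p)->amount;
    return -1.0; /* unsupported payment method */
}
```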
An alternative library with the same upsides and the same downside is linky/pancake-lib's sumtype.nelua. There's a little more Rust influence there, but both libraries are, IMHO, missing an essential OCaml influence: what most deserves emulating is the primacy, efficiency, and exhaustiveness of pattern matching. :is/:get are closer to object casting in an OOP language (here, D):
class Payment {}
class Card : Payment { string name; string code; int cvv; double amount; this(string n, string c, int v, double a) { name=n; code=c; cvv=v; amount=a; } }
class Crypto : Payment { string coin; string wallet; double amount; this(string c, string w, double a) { coin=c; wallet=w; amount=a; } }
class Cash : Payment { string currency; double amount; this(string c, double a) { currency=c; amount=a; } }
void main() {
import std.stdio : writeln;
Payment[] payments = [
new Cash("USD", 100),
new Card("Some Name", "1234", 123, 100),
new Cash("USD", 100),
new Cash("USD", 100),
new Crypto("BTC", "mine", 0.001),
];
foreach (p; payments) {
if (auto card = cast(Card)p) writeln("a card payment with CVV: ", card.cvv);
else if (auto cash = cast(Cash)p) writeln("a cash payment in amount: ", cash.amount);
else writeln("unsupported payment method");
}
}
d3.nelua - slightly nicer d1
A typeid discriminant really was a bad idea, and it's not uncommon that the subtypes aren't needed by themselves. This is a little bit neater:
require 'iterators'
local payment = @record{
kind: enum{card = 1, crypto, cash},
payment: union{
card: record{name: string, code: string, cvv: integer, amount: number},
crypto: record{coin: string, wallet: string, amount: number},
cash: record{currency: string, amount: number},
},
}
global payment.kind: type = #[payment.value.fields.kind.type]#
local payments: []payment = {
{payment.kind.cash, {cash={'USD', 100}}},
{payment.kind.card, {card={'Some Name', '1234', 123, 100}}},
{payment.kind.cash, {cash={'USD', 100}}},
{payment.kind.cash, {cash={'USD', 100}}},
{payment.kind.crypto, {crypto={'BTC', 'mine', 0.001}}},
}
for _, p in ipairs(payments) do
switch p.kind do
case payment.kind.card then print('a card payment with CVV:', p.payment.card.cvv)
case payment.kind.cash then print('a cash payment in amount:', p.payment.cash.amount)
else print('unsupported payment method')
end
end
Personally, I'd stop here. Construction is verbose but unimportant in practice: since constructors aren't reused for pattern-matching here, they appear far less often than in OCaml. Switching on the discriminant could be nicer, but it doesn't need to be. The definition of the sumtype is ideal, needing only a .kind alias for convenience.
Let's change just that loop at the end:
for _, p in ipairs(payments) do
local _ <using> = payment.kind
switch p.kind do
case card then
local p = p.payment.card
print('a card payment in amount:', p.amount, 'and cvv:', p.cvv)
case cash then
local p = p.payment.cash
print('a cash payment in amount:', p.amount, 'of currency:', p.currency)
case crypto then
local p = p.payment.crypto
print('a crypto payment in amount:', p.amount, 'of coin:', p.coin)
else panic('unsupported payment method') end
end
A new 'match' control-flow syntax that expanded into this might be nice:
for _, p in ipairs(payments) do
local _ <using> = payment.kind
match p do
case card then print('a card payment in amount:', p.amount, 'and cvv:', p.cvv)
case cash then print('a cash payment in amount:', p.amount, 'of currency:', p.currency)
case crypto then print('a crypto payment in amount:', p.amount, 'of coin:', p.coin)
else panic('unsupported payment method') end
end
Can that be done without new syntax? ... I don't think so. The closest I think you can get is the following, which is quite ugly. It also requires the union field to have a consistent name (here 'payload', rather than 'payment' as in d3).
## local function match(v, e, f)
if #[v]#.kind == #[e]# then
local #|v.name|# = #[v]#.payload.#|e.name|#
## f()
end
## end
for _, p in ipairs(payments) do
local _ <using> = payment.kind
## match(p, card, function()
print('a card payment in amount:', p.amount, 'and cvv:', p.cvv)
continue
## end)
## match(p, cash, function()
print('a cash payment in amount:', p.amount, 'of currency:', p.currency)
continue
## end)
## match(p, crypto, function()
print('a crypto payment in amount:', p.amount, 'of coin:', p.coin)
continue
## end)
panic('unsupported payment method')
end
d4.nelua - flattened records with shared fields
There are definite style improvements to be made here, but you get the idea: this is closer to Nim, with an explicit discriminant, shared fields (here 'amount') stored directly in the record, and each variant needing unique field names:
require 'iterators'
local payment = @record{
kind: enum{card = 1, crypto, cash},
amount: number,
payment: union{
card: record{name: string, code: string, cvv: integer},
crypto: record{coin: string, wallet: string},
cash: record{currency: string},
},
}
global payment.kind: type = #[payment.value.fields.kind.type]#
##[[
local function alias_field(recsym, name, dotindex, fieldtypesym)
recsym.value.fields[name] = {name = dotindex, type = fieldtypesym.value}
end
]]
alias_field!(@payment, 'name', 'payment.card.name', @string)
alias_field!(@payment, 'code', 'payment.card.code', @string)
alias_field!(@payment, 'cvv', 'payment.card.cvv', @integer)
alias_field!(@payment, 'coin', 'payment.crypto.coin', @string)
alias_field!(@payment, 'wallet', 'payment.crypto.wallet', @string)
alias_field!(@payment, 'currency', 'payment.cash.currency', @string)
local payments: []payment = {
{payment.kind.cash, 100, {cash={'USD'}}},
{payment.kind.card, 100, {card={'Some Name', '1234', 123}}},
{payment.kind.cash, 100, {cash={'USD'}}},
{payment.kind.cash, 100, {cash={'USD'}}},
{payment.kind.crypto, 0.001, {crypto={'BTC', 'mine'}}},
}
for _, p in ipairs(payments) do
local _ <using> = payment.kind
switch p.kind do
case card then print('a card payment in amount:', p.amount, 'and cvv:', p.cvv)
case cash then print('a cash payment in amount:', p.amount, 'of currency:', p.currency)
case crypto then print('a crypto payment in amount:', p.amount, 'of coin:', p.coin)
else panic('unsupported payment method') end
end
d5.nelua - flattened goblins
A more involved example from an old Nim bug report:
local Alignments = @enum{Lawful = 1, Neutral, Chaotic}
local Goblin = @record{
name: string,
align: Alignments,
payload: union{
nonchaotic: record{
actualSpecies: string,
transformedBy: string,
spellDuration: integer,
},
chaotic: record{
murders: integer,
injuries: integer,
despoils: integer,
evilness: integer,
},
},
}
function Goblin:dump()
print '---'
print('name', self.name)
print('align', self.align)
if self.align ~= Alignments.Chaotic then
print('actualSpecies', self.payload.nonchaotic.actualSpecies)
print('transformedBy', self.payload.nonchaotic.transformedBy)
print('spellDuration', self.payload.nonchaotic.spellDuration)
else
print('murders', self.payload.chaotic.murders)
print('injuries', self.payload.chaotic.injuries)
print('despoils', self.payload.chaotic.despoils)
print('evilness', self.payload.chaotic.evilness)
end
end
##[[
local function flatten(ns, field)
local function alias_field(recsym, name, dotindex, fieldtypesym)
recsym.value.fields[name] = {name = dotindex, type = fieldtypesym}
end
for _, name in ipairs(ns.value.fields.payload.type.fields[field].type.fields) do
alias_field(ns, name.name, 'payload.'..field..'.'..name.name, ns.value.fields.payload.type.fields[field].type.fields[name.name].type)
end
end
]]
flatten!(Goblin, 'nonchaotic')
flatten!(Goblin, 'chaotic')
local function initGoblin(name: string, align: Alignments): Goblin
switch align do
case Alignments.Chaotic then
local g: Goblin = {name=name, align=align}
g.evilness=9000
return g
case Alignments.Neutral, Alignments.Lawful then
local g: Goblin = {name=name, align=align}
g.actualSpecies = 'probably some kind of human'
g.transformedBy = 'a demon maybe?'
g.spellDuration = 500
return g
end
panic('invalid alignment')
return {}
end
initGoblin('Alice', Alignments.Lawful):dump()
initGoblin('Bob', Alignments.Chaotic):dump()
Output:
---
name Alice
align 1
actualSpecies probably some kind of human
transformedBy a demon maybe?
spellDuration 500
---
name Bob
align 3
murders 0
injuries 0
despoils 0
evilness 9000
NB: this flattening doesn't extend to construction; you can't write (@Goblin){name=name, align=align, actualSpecies='human', ...}.
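For comparison, C11's anonymous structs and unions provide this flattening natively: their members are treated as members of the enclosing struct, and that extends to designated initializers, which is the part the NB above says the Nelua macro can't do. A sketch, with init_goblin as a hypothetical C port of initGoblin (compile as C11 or later):

```c
enum alignment { LAWFUL = 1, NEUTRAL, CHAOTIC };

/* Anonymous union of anonymous structs: g.evilness works directly,
   with no payload.chaotic prefix. Field names must be unique across variants. */
struct goblin {
    const char *name;
    enum alignment align;
    union {
        struct { const char *actualSpecies; const char *transformedBy; int spellDuration; };
        struct { int murders; int injuries; int despoils; int evilness; };
    };
};

/* Flattened names work in designated initializers too; members that
   aren't named (e.g. murders) are zero-initialized. */
struct goblin init_goblin(const char *name, enum alignment align) {
    if (align == CHAOTIC)
        return (struct goblin){ .name = name, .align = align, .evilness = 9000 };
    return (struct goblin){ .name = name, .align = align,
                            .actualSpecies = "probably some kind of human",
                            .transformedBy = "a demon maybe?",
                            .spellDuration = 500 };
}
```

As in the Nelua version, nothing stops you from reading g.evilness on a lawful goblin; the discriminant is still only a convention.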