Built-in Functions
DaphneDSL offers numerous built-in functions, which can be used in every DaphneDSL script without requiring any imports.
The general syntax for calling a built-in function is func(param1, param2, ...) (see the DaphneDSL Language Reference).
This document provides an overview of the DaphneDSL built-in functions. Note that we are still extending this set of built-in functions. Furthermore, we also plan to create a library of higher-level ML primitives allowing users to productively implement integrated data analysis pipelines at a much higher level of abstraction. Those library functions will internally be implemented using the built-in functions described in this document.
We use the following notation (deviating from the DaphneDSL function syntax):
- square brackets `[]` mean that a parameter is optional
- `...` stands for an arbitrary repetition of the previous parameter (including zero)
- `/` means alternative options, e.g., `matrix/frame` means the parameter could be a matrix or a frame
List of Categories
DaphneDSL's built-in functions can be categorized as follows:
- Data generation
- Matrix/frame meta data
- Elementwise unary
- Elementwise binary
- Outer binary (generalized outer product)
- Aggregation and statistical
- Reorganization
- Matrix decomposition & co
- Deep neural network
- Other matrix operations
- Extended relational algebra
- Conversions and casts
- Input/output
- Data preprocessing
- Measurements
- List operations
 
Data Generation
- `fill(value:scalar, numRows:size, numCols:size)`

  Creates a (`numRows` x `numCols`) matrix and sets all elements to `value`.

- `createFrame(column:matrix, ...[, labels:str, ...])`

  Creates a frame from an arbitrary number of column matrices. Optionally, a label can be specified for each column (the number of provided columns and labels must be equal).

- `diagMatrix(arg:matrix)`

  Creates an (n x n) diagonal matrix by placing the elements of the given (n x 1) column-matrix `arg` on the diagonal of an otherwise empty (zero) square matrix.

- `rand(numRows:size, numCols:size, min:scalar, max:scalar, sparsity:double, seed:si64)`

  Generates a (`numRows` x `numCols`) matrix of random values. The values are drawn uniformly from the range [`min`, `max`] (both inclusive). A value of zero in the `min` parameter will be ignored, as the insertion of zeros in the output is controlled by the `sparsity` parameter. The `sparsity` can be chosen between `0.0` (all zeros) and `1.0` (all non-zeros). The `seed` can be set to `-1` (randomly chooses a seed), or be provided explicitly to enable reproducible random values.

- `sample(range:scalar, size:size, withReplacement:bool, seed:si64)`

  Generates a (`size` x 1) column-matrix of values drawn from the range [0, `range` - 1]. The parameter `withReplacement` determines if a value can be drawn multiple times (`true`) or not (`false`). The `seed` can be set to `-1` (randomly chooses a seed), or be provided explicitly to enable reproducible random values.

- `seq(from:scalar, to:scalar[, inc:scalar])`

  Generates a column matrix containing an arithmetic sequence of values starting at `from`, going through `to`, in increments of `inc`. Note that `from` may be greater than `to`, and `inc` may be negative. The scalar `inc` is an optional argument and defaults to `1`.
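The behavior of `seq()` with positive and negative increments can be illustrated with a small model in plain Python. This is only a sketch of the semantics described above, not DaphneDSL code and not DAPHNE's implementation. It assumes that "going through `to`" means the end point is included whenever the increment hits it exactly. The parameter names `start` and `stop` are stand-ins for `from` and `to`, since `from` is a reserved word in Python.

```python
def seq(start, stop, inc=1):
    """Model of seq(from, to[, inc]): values start, start+inc, ... up to and including stop."""
    if inc == 0:
        raise ValueError("inc must be non-zero")
    values = []
    current = start
    # Ascending sequences end once they pass stop; descending ones once they fall below it.
    while (inc > 0 and current <= stop) or (inc < 0 and current >= stop):
        values.append(current)
        current += inc
    return values

print(seq(1, 5))       # [1, 2, 3, 4, 5]
print(seq(10, 4, -3))  # [10, 7, 4]
```

In DaphneDSL the result is a column matrix; the flat Python list stands for its single column.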
Matrix/Frame Meta Data
The following built-in functions can be used to query the meta data of matrices and frames.

- `typeOf(arg:scalar/matrix/frame)`

  Returns a string with the data type of `arg`, which can be a scalar, matrix, or a frame containing mixed value types. For matrices and frames, this includes their dimensions and value type or value type per column in the case of frames. Value types are named after their C++ representation.

- `nrow(arg:matrix/frame)`

  Returns the number of rows in `arg`.

- `ncol(arg:matrix/frame)`

  Returns the number of columns in `arg`.

- `ncell(arg:matrix/frame)`

  Returns the number of cells in `arg`. This is the product of the number of rows and the number of columns.

- `sparsity(arg:matrix)`

  Returns the DAPHNE compiler's estimate of the argument's sparsity. Note that this value may deviate from the actual sparsity of the data at run-time.

- `isSymmetric(arg:matrix)`

  Returns `true` if and only if the given (n x n) matrix is symmetric, i.e., for any i in {0, 1, ..., n-1} and j in {0, 1, ..., n-1}, the value at position (i, j) is the same as the value at position (j, i). The given matrix must be a square matrix.
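The symmetry condition checked by `isSymmetric()` can be written down directly. The following plain-Python sketch models a matrix as a list of rows; the function name `is_symmetric` is hypothetical and the code only mirrors the definition given above, not DAPHNE's kernel.

```python
def is_symmetric(matrix):
    """Return True iff matrix[i][j] == matrix[j][i] for all positions (i, j)."""
    n = len(matrix)
    if any(len(row) != n for row in matrix):
        raise ValueError("the given matrix must be square")
    # Comparing the upper triangle against the lower triangle covers all pairs.
    return all(matrix[i][j] == matrix[j][i]
               for i in range(n) for j in range(i + 1, n))

print(is_symmetric([[1, 2], [2, 1]]))  # True
print(is_symmetric([[1, 2], [3, 1]]))  # False
```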
Elementwise Unary
The following built-in functions all follow the same scheme:
- `unaryFunc(arg:scalar/matrix)`

  Applies the respective unary function (see table below) to the given scalar `arg` or to each element of the given matrix `arg`.
Arithmetic/General Math
| function | meaning |
|---|---|
| `abs` | absolute value |
| `sign` | signum (1 for positive, 0 for zero, -1 for negative) |
| `exp` | exponentiation (e to the power of `arg`) |
| `ln` | natural logarithm (logarithm of `arg` to the base of e) |
| `sqrt` | square root |
Trigonometric/Hyperbolic
The argument `arg` must be given in radians (conversion: \(x^\circ \cdot \frac{\pi}{180^\circ} = y\) radians).

| function | meaning |
|---|---|
| `sin` | sine |
| `cos` | cosine |
| `tan` | tangent |
| `asin` | arc sine (inverse of sine) |
| `acos` | arc cosine (inverse of cosine) |
| `atan` | arc tangent (inverse of tangent) |
| `sinh` | hyperbolic sine \(\left( \frac{\exp(\text{arg}) - \exp(-\text{arg})}{2} \right)\) |
| `cosh` | hyperbolic cosine \(\left( \frac{\exp(\text{arg}) + \exp(-\text{arg})}{2} \right)\) |
| `tanh` | hyperbolic tangent \(\left( \frac{\sinh(\text{arg})}{\cosh(\text{arg})} \right)\) |
Rounding
| function | meaning |
|---|---|
| `round` | round to nearest |
| `floor` | round down |
| `ceil` | round up |
Comparison
| function | meaning |
|---|---|
| `isNan` | 1 if argument is NaN, 0 otherwise |
Elementwise Binary
DaphneDSL supports various elementwise binary operations.
Some of those can be used through operators in infix notation, e.g., +; and some through built-in functions, e.g., log().
Some operations even support both, e.g., pow(a, b) and a^b have the same semantics.
The built-in functions all follow the same scheme:
- `binaryFunc(lhs:scalar/matrix, rhs:scalar/matrix)`

  Applies the respective binary function (see table below) to the corresponding pairs of a value in the left-hand-side argument `lhs` and the right-hand-side argument `rhs`. Regarding the combinations of scalars and matrices, the same broadcasting semantics apply as for binary operations like `+`, `*`, etc. (see the DaphneDSL Language Reference).
Arithmetic
| function | operator | meaning |
|---|---|---|
| | `+` | addition |
| | `-` | subtraction |
| | `*` | multiplication |
| | `/` | division |
| `pow` | `^` | exponentiation (`lhs` to the power of `rhs`) |
| `log` | | logarithm (logarithm of `lhs` to the base of `rhs`) |
| `mod` | `%` | modulo |
Min/Max
| function | operator | meaning |
|---|---|---|
| `min` | | minimum |
| `max` | | maximum |
Logical
| function | operator | meaning |
|---|---|---|
| | `&&` | logical conjunction |
| | `\|\|` | logical disjunction |
Strings
| function | operator | meaning |
|---|---|---|
| `concat` | `+` | string concatenation |
Comparison
| function | operator | meaning |
|---|---|---|
| | `==` | equal |
| | `!=` | not equal |
| | `<` | less than |
| | `<=` | less or equal |
| | `>` | greater than |
| | `>=` | greater or equal |
Outer Binary (Generalized Outer Product)
The following built-in functions all follow the same scheme:
- `outerBinaryFunc(lhs:matrix, rhs:matrix)`

  The argument `lhs` is expected to be a column (m x 1) matrix, and the argument `rhs` is expected to be a row (1 x n) matrix. The result is a (m x n) matrix, whereby the element at position (i, j) is calculated by applying the respective binary function (see the table below) to the i-th element in `lhs` and the j-th element in `rhs`. Schematically, where ∘ is some binary operation: res[i, j] = `lhs[i, 0]` ∘ `rhs[0, j]`.
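The generalized outer product can be modeled in a few lines of plain Python. This is a sketch of the semantics only, with matrices represented as lists of rows; the helper name `outer_binary` is hypothetical, and the binary operation is passed in as a Python function instead of being selected by the built-in's name.

```python
import operator

def outer_binary(lhs, rhs, op):
    """Apply op to every pair (lhs[i][0], rhs[0][j]) and return the (m x n) result."""
    if any(len(row) != 1 for row in lhs):
        raise ValueError("lhs must be a column (m x 1) matrix")
    if len(rhs) != 1:
        raise ValueError("rhs must be a row (1 x n) matrix")
    return [[op(l[0], r) for r in rhs[0]] for l in lhs]

lhs = [[1], [2], [3]]  # (3 x 1) column matrix
rhs = [[10, 20]]       # (1 x 2) row matrix
print(outer_binary(lhs, rhs, operator.add))  # [[11, 21], [12, 22], [13, 23]]
print(outer_binary(lhs, rhs, operator.mul))  # [[10, 20], [20, 40], [30, 60]]
```

Passing `operator.add` corresponds to what `outerAdd` is described to compute, and `operator.mul` to `outerMul`.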
Arithmetic
| function | meaning |
|---|---|
| `outerAdd` | addition |
| `outerSub` | subtraction |
| `outerMul` | multiplication (the well-known outer product) |
| `outerDiv` | division |
| `outerPow` | exponentiation (`lhs` to the power of `rhs`) |
| `outerLog` | logarithm (logarithm of `lhs` to the base of `rhs`) |
| `outerMod` | modulo |
Min/Max
| function | meaning |
|---|---|
| `outerMin` | minimum |
| `outerMax` | maximum |
Logical
| function | meaning |
|---|---|
| `outerAnd` | logical conjunction |
| `outerOr` | logical disjunction |
| `outerXor` | logical exclusive disjunction (not supported yet) |
Strings
| function | meaning |
|---|---|
| `outerConcat` | string concatenation (not supported yet) |
Comparison
| function | meaning |
|---|---|
| `outerEq` | equal |
| `outerNeq` | not equal |
| `outerLt` | less than |
| `outerLe` | less or equal |
| `outerGt` | greater than |
| `outerGe` | greater or equal |
Aggregation and Statistical
Full/Row/Column Aggregation
The following built-in functions all follow the same scheme:
- `agg(arg:matrix)`

  Full aggregation over all elements of the matrix `arg` using aggregation function `agg` (see table below). Returns a scalar.

- `agg(arg:matrix, axis:si64)`

  Row or column aggregation over a (n x m) matrix `arg` using aggregation function `agg` (see table below).

  - `axis` == 0: calculate one aggregate per row; the result is a (n x 1) (column) matrix
  - `axis` == 1: calculate one aggregate per column; the result is a (1 x m) (row) matrix
 
| function | meaning |
|---|---|
| `sum` | summation |
| `aggMin` | minimum |
| `aggMax` | maximum |
| `mean` | arithmetic mean |
| `var` | variance |
| `stddev` | standard deviation |
| `idxMin` | argmin (the index of the minimum value, only for row/column-wise aggregation) |
| `idxMax` | argmax (the index of the maximum value, only for row/column-wise aggregation) |
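Because the meaning of the `axis` parameter is easy to mix up, the following plain-Python sketch models the three call variants described above (full, per-row, and per-column aggregation). The helper `aggregate` is hypothetical; it takes the aggregation as a Python function and represents matrices as lists of rows.

```python
def aggregate(matrix, agg, axis=None):
    """Model of agg(arg) and agg(arg, axis) for an (n x m) matrix given as a list of rows."""
    if axis is None:
        # Full aggregation over all elements: returns a scalar.
        return agg([value for row in matrix for value in row])
    if axis == 0:
        # One aggregate per row: the result is an (n x 1) column matrix.
        return [[agg(row)] for row in matrix]
    if axis == 1:
        # One aggregate per column: the result is a (1 x m) row matrix.
        return [[agg(column) for column in zip(*matrix)]]
    raise ValueError("axis must be 0 or 1")

X = [[1, 2, 3],
     [4, 5, 6]]
print(aggregate(X, sum))          # 21
print(aggregate(X, sum, axis=0))  # [[6], [15]]
print(aggregate(X, max, axis=1))  # [[4, 5, 6]]
```

Note that this convention differs from NumPy, where `axis=0` reduces over the rows and therefore yields one value per column.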
Cumulative Aggregation
The following built-in functions all follow the same scheme:
- `cumAgg(arg:matrix)`

  Cumulative aggregation over each column of the matrix `arg`. Returns a matrix of the same shape as `arg`.
| function | meaning |
|---|---|
| `cumSum` | cumulative sum |
| `cumProd` | cumulative product |
| `cumMin` | cumulative minimum |
| `cumMax` | cumulative maximum |
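Cumulative aggregation runs down each column independently and keeps the shape of the input. A minimal plain-Python model, using the hypothetical helper `cum_agg` and `itertools.accumulate` from the standard library, could look as follows. It only illustrates the described semantics.

```python
from itertools import accumulate
import operator

def cum_agg(matrix, op):
    """Cumulatively apply op down each column; the result has the same shape as matrix."""
    columns = [list(accumulate(column, op)) for column in zip(*matrix)]
    # Turn the list of result columns back into a list of rows.
    return [list(row) for row in zip(*columns)]

X = [[1, 4],
     [2, 5],
     [3, 6]]
print(cum_agg(X, operator.add))  # [[1, 4], [3, 9], [6, 15]]
print(cum_agg(X, operator.mul))  # [[1, 4], [2, 20], [6, 120]]
```

Here `operator.add` plays the role of `cumSum` and `operator.mul` that of `cumProd`; passing `min` or `max` models `cumMin` and `cumMax`.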
Reorganization
- `reshape(arg:matrix, numRows:size, numCols:size)`

  Changes the shape of `arg` to (`numRows` x `numCols`). Note that the number of cells must be retained, i.e., the product of `numRows` and `numCols` must be equal to the product of the number of rows in `arg` and the number of columns in `arg`.

- `transpose/t(arg:matrix)`

  Transposes the given matrix `arg`.

- `cbind(lhs:matrix/frame, rhs:matrix/frame)`

  Concatenates two matrices or two frames horizontally. The two inputs must have the same number of rows.

- `rbind(lhs:matrix/frame, rhs:matrix/frame)`

  Concatenates two matrices or two frames vertically. The two inputs must have the same number of columns.

- `reverse(arg:matrix)`

  Reverses the rows in the given matrix `arg`.

- `order(arg:matrix/frame, colIdxs:size, ..., ascs:bool, ..., returnIndexes:bool)`

  Sorts the given matrix or frame by an arbitrary number of columns. The columns are specified in terms of their indexes (counting starts at zero). Each column can be sorted either in ascending (`true`) or descending (`false`) order (as determined by parameter `ascs`). The provided number of columns and sort orders must match. The parameter `returnIndexes` determines whether to return the sorted data (`false`) or a column-matrix of positions representing the permutation applied by the sorting (`true`).
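The interplay of the sort columns, the sort orders, and `returnIndexes` in `order()` can be sketched in plain Python. The model below is an illustration under assumptions: it treats the first given column as the most significant sort key and keeps rows with equal keys in their original order, neither of which is stated explicitly above. The Python signature takes the columns and orders as two lists instead of a flat variadic argument list.

```python
def order(rows, col_idxs, ascs, return_indexes):
    """Sort rows by several columns; return the data or the applied permutation."""
    if len(col_idxs) != len(ascs):
        raise ValueError("the number of columns and sort orders must match")
    positions = list(range(len(rows)))
    # Stable sorts applied from the least to the most significant column
    # yield a multi-column sort.
    for col, asc in reversed(list(zip(col_idxs, ascs))):
        positions.sort(key=lambda i: rows[i][col], reverse=not asc)
    if return_indexes:
        return [[i] for i in positions]  # column matrix of row positions
    return [rows[i] for i in positions]

X = [[2, 1.0], [1, 3.0], [2, 0.5], [1, 2.0]]
# Column 0 ascending, then column 1 descending.
print(order(X, [0, 1], [True, False], False))
# [[1, 3.0], [1, 2.0], [2, 1.0], [2, 0.5]]
print(order(X, [0, 1], [True, False], True))
# [[1], [3], [0], [2]]
```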
Matrix Decomposition & Co
- `eigen(arg:matrix)`

  Calculates the eigenvalues and eigenvectors of the given matrix. This built-in function has two results: (1) the eigenvalues as a column matrix, and (2) the eigenvectors as a matrix, where each column is an eigenvector.
 
We plan to support additional matrix decompositions like lu, qr, and svd in the future.
Deep Neural Network
Note that most of these operations only have a cuDNN-based kernel for GPU execution at the moment.

- `affine(inputData:matrix, weightData:matrix, biasData:matrix)`

- `avg_pool2d(inputData:matrix, numImages:size, numChannels:size, imgHeight:size, imgWidth:size, poolHeight:size, poolWidth:size, strideHeight:size, strideWidth:size, paddingHeight:size, paddingWidth:size)`

  Performs an average pooling operation.

- `max_pool2d(inputData:matrix, numImages:size, numChannels:size, imgHeight:size, imgWidth:size, poolHeight:size, poolWidth:size, strideHeight:size, strideWidth:size, paddingHeight:size, paddingWidth:size)`

  Performs a max pooling operation.

- `batch_norm2d(inputData:matrix, gamma, beta, emaMean, emaVar, eps)`

  Performs a batch normalization operation.

- `biasAdd(input:matrix, bias:matrix)`

  Adds the (1 x `numChannels`) row-matrix `bias` to the `input` with the given number of channels.

- `conv2d(input:matrix, filter:matrix, numImages:size, numChannels:size, imgHeight:size, imgWidth:size, filterHeight:size, filterWidth:size, strideHeight:size, strideWidth:size, paddingHeight:size, paddingWidth:size)`

  2D convolution.

- `relu(inputData:matrix)`

- `softmax(inputData:matrix)`
Other Matrix Operations
- `diagVector(arg:matrix)`

  Extracts the diagonal of the given (n x n) matrix `arg` as a (n x 1) column-matrix.

- `lowerTri(arg:matrix, diag:bool, values:bool)`

  Extracts the lower triangle of the given square matrix `arg` by setting all elements in the upper triangle to zero. If `diag` is `true`, the elements on the diagonal are retained; otherwise, they are set to zero, too. If `values` is `true`, the non-zero elements in the lower triangle are retained; otherwise, they are set to one.

- `upperTri(arg:matrix, diag:bool, values:bool)`

  Extracts the upper triangle of the given square matrix `arg` by setting all elements in the lower triangle to zero. If `diag` is `true`, the elements on the diagonal are retained; otherwise, they are set to zero, too. If `values` is `true`, the non-zero elements in the upper triangle are retained; otherwise, they are set to one.

- `solve(A:matrix, b:matrix)`

  Solves the system of linear equations given by the (n x n) matrix `A` and the (n x 1) column-matrix `b` and returns the result as a (n x 1) column-matrix.

- `replace(arg:matrix, pattern:scalar, replacement:scalar)`

  Replaces all occurrences of the element `pattern` in the matrix `arg` by the element `replacement`.

- `ctable(ys:matrix, xs:matrix[, weight:scalar][, numRows:int, numCols:int])`

  Returns the contingency table of two (n x 1) column-matrices `ys` and `xs`. The resulting matrix `res` consists of `max(ys) + 1` rows and `max(xs) + 1` columns. More precisely, \(\text{res}[y, x] = \left| \{ k \bigm| \text{ys}[k, 0] = y \wedge \text{xs}[k, 0] = x, \; 0 \leq k \leq n-1 \} \right| * \text{weight} \quad \forall x \in \text{xs}, y \in \text{ys}\).

  In other words, starting with an all-zero result matrix, all pairs of values \(\{ (\text{ys}[k, 0],\text{xs}[k, 0]) \mid 0 \leq k \leq n-1 \}\) are used to index the result matrix and increase the corresponding value by `weight`.

  Note that `ys` and `xs` must not contain negative numbers.

  The scalar `weight` is an optional argument and defaults to `1.0`. The `weight` also determines the value type of the result.

  Moreover, optionally, the result shape in terms of the number of rows and columns can be specified. If omitted, it defaults to the smallest numbers required to accommodate all given `y`/`x`-coordinates, as expressed above. If specified, the result can be either cropped or padded with zeros to the desired shape. If a value less than zero is provided as the number of rows/columns, the respective dimension will also be determined from the input data.

  This built-in function can be called with 2, 3, 4, or 5 arguments, depending on which optional arguments are given.

- `syrk(A:matrix)`

  Calculates `t(A) @ A` by symmetric rank-k update operations.

- `gemv(A:matrix, x:matrix)`

  Calculates `t(A) @ x` for the given (n x m) matrix `A` and (n x 1) column-matrix `x`.
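Of the operations in this list, `ctable()` has the most involved semantics. The following plain-Python sketch models it, including the optional weight and result shape. It assumes that values from `ys` select the row and values from `xs` select the column, as implied by the stated result shape of `max(ys) + 1` rows and `max(xs) + 1` columns. The column matrices are modeled as flat lists of non-negative integers.

```python
def ctable(ys, xs, weight=1.0, num_rows=-1, num_cols=-1):
    """Model of ctable(): count (y, x) pairs, scaled by weight, in a num_rows x num_cols table."""
    if len(ys) != len(xs):
        raise ValueError("ys and xs must have the same length")
    if any(v < 0 for v in ys) or any(v < 0 for v in xs):
        raise ValueError("ys and xs must not contain negative numbers")
    # A negative size means: determine this dimension from the input data.
    if num_rows < 0:
        num_rows = max(ys, default=-1) + 1
    if num_cols < 0:
        num_cols = max(xs, default=-1) + 1
    zero = type(weight)(0)  # the weight determines the value type of the result
    res = [[zero] * num_cols for _ in range(num_rows)]
    for y, x in zip(ys, xs):
        # Pairs that fall outside an explicitly given shape are cropped.
        if y < num_rows and x < num_cols:
            res[y][x] += weight
    return res

print(ctable([0, 1, 1, 2], [1, 0, 0, 1]))
# [[0.0, 1.0], [2.0, 0.0], [0.0, 1.0]]
print(ctable([0, 1, 1, 2], [1, 0, 0, 1], 2, 2, 3))
# [[0, 2, 0], [4, 0, 0]]
```

The second call crops the result to two rows (dropping the pair with `y == 2`) and pads it to three columns with zeros.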
Extended Relational Algebra
DaphneDSL supports relational algebra on frames in two ways: On the one hand, entire SQL queries can be executed over previously registered views. This aspect is described in detail in a separate tutorial. On the other hand, built-in functions for individual operations of extended relational algebra can be used on frames in DaphneDSL.
Entire SQL Query
- `registerView(viewName:str, arg:frame)`

  Registers the frame `arg` to be accessible to SQL queries by the name `viewName`.

- `sql(query:str)`

  Executes the SQL query `query` on the frames previously registered with `registerView()` and returns the result as a frame.
Set Operations
We will support set operations such as intersect, merge, and except.
Cartesian Product and Joins
- `cartesian(lhs:frame, rhs:frame)`

  Calculates the cartesian product of the two input frames.

- `innerJoin(lhs:frame, rhs:frame, lhsOn:str, rhsOn:str[, numRowRes:si64])`

  Performs an inner join of the two input frames on `lhs.lhsOn == rhs.rhsOn`.

  The parameter `numRowRes` is an optional hint for an upper bound of the number of result rows. If specified, it determines the number of rows that will be allocated for the result, whereby `-1` stands for an automatically chosen size. If omitted, it defaults to `-1`.

- `semiJoin(lhs:frame, rhs:frame, lhsOn:str, rhsOn:str[, numRowRes:si64])`

  Performs a semi join of the two input frames on `lhs.lhsOn == rhs.rhsOn`. Returns only the columns belonging to `lhs`.

  The parameter `numRowRes` is an optional hint for an upper bound of the number of result rows. If specified, it determines the number of rows that will be allocated for the result, whereby `-1` stands for an automatically chosen size. If omitted, it defaults to `-1`.

- `groupJoin(lhs:frame, rhs:frame, lhsOn:str, rhsOn:str, rhsAgg:str)`

  Group-join of `lhs` and `rhs` on `lhs.lhsOn == rhs.rhsOn` with summation of `rhs.rhsAgg`.
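The difference between the inner join and the semi join described above can be shown with a small plain-Python model. It follows the textbook definitions of the two operators and is not derived from DAPHNE's kernels. Frames are modeled as lists of dictionaries (one dictionary per row, keyed by column label), the example data is made up, and the column labels of the two inputs are assumed to be distinct. The `numRowRes` hint is omitted because it only concerns memory allocation.

```python
def inner_join(lhs, rhs, lhs_on, rhs_on):
    """Return one combined row for every pair of rows with lhs[lhs_on] == rhs[rhs_on]."""
    index = {}
    for r in rhs:
        index.setdefault(r[rhs_on], []).append(r)
    return [{**l, **r} for l in lhs for r in index.get(l[lhs_on], [])]

def semi_join(lhs, rhs, lhs_on, rhs_on):
    """Return the lhs rows that have at least one join partner; only lhs columns are kept."""
    keys = {r[rhs_on] for r in rhs}
    return [dict(l) for l in lhs if l[lhs_on] in keys]

customers = [{"cid": 1, "name": "a"}, {"cid": 2, "name": "b"}, {"cid": 3, "name": "c"}]
orders = [{"oid": 10, "customer": 1}, {"oid": 11, "customer": 1}, {"oid": 12, "customer": 3}]

print(len(inner_join(customers, orders, "cid", "customer")))  # 3
print(semi_join(customers, orders, "cid", "customer"))
# [{'cid': 1, 'name': 'a'}, {'cid': 3, 'name': 'c'}]
```

Customer 1 appears twice in the inner join result (once per matching order) but only once in the semi join result.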
We will support more variants of joins, including (left/right) outer joins, theta joins, anti-joins, etc.
Grouping and Aggregation
- `groupSum(arg:frame, grpColNames:str[, grpColNames, ...], sumColName:str)`

  Groups the rows in the given frame `arg` by the specified columns `grpColNames` (at least one column) and calculates the per-group sum of the column denoted by `sumColName`.

  This built-in function is currently limited in terms of functionality (aggregation only on a single column, sum as the only aggregation function). It will be extended in the future. Meanwhile, consider using DAPHNE's `sql()` built-in function for more comprehensive grouping and aggregation support.
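The grouping semantics of `groupSum()` correspond to a dictionary-based aggregation. The sketch below, in plain Python, models the frame as a list of dictionaries and returns a mapping from group key to sum. How DAPHNE labels and orders the columns of the actual result frame is not covered here, and the example data is made up.

```python
def group_sum(rows, grp_col_names, sum_col_name):
    """Sum sum_col_name per distinct combination of values in grp_col_names."""
    if not grp_col_names:
        raise ValueError("at least one grouping column is required")
    sums = {}
    for row in rows:
        key = tuple(row[name] for name in grp_col_names)
        sums[key] = sums.get(key, 0) + row[sum_col_name]
    return sums

rows = [
    {"dept": "a", "year": 1, "pay": 10},
    {"dept": "a", "year": 1, "pay": 5},
    {"dept": "b", "year": 1, "pay": 7},
]
print(group_sum(rows, ["dept", "year"], "pay"))  # {('a', 1): 15, ('b', 1): 7}
```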
Frame Label Manipulation
- `setColLabels(arg:frame, labels:str, ...)`

  Sets the column labels of the given frame `arg` to the given `labels`. There must be as many `labels` as columns in `arg`.

- `setColLabelsPrefix(arg:frame, prefix:str, ...)`

  Prepends the given `prefix` to the labels of all columns in `arg`.
Conversions and Casts
Note that DaphneDSL offers casts in the form of the `as.()` expression.
See the DaphneDSL Language Reference for details.

- `quantize(arg:matrix<f32>, min:f32, max:f32)`

  Performs a `min`/`max` quantization of the values in `arg`. The result matrix is of value type `ui8`.
Input/Output
DAPHNE supports local file I/O for various file formats. The format is determined by the specified file name extension. Currently, the following formats are supported:
- ".csv": comma-separated values
 - ".mtx": matrix market
 - ".parquet": Apache Parquet format
 - ".dbdf": DAPHNE's binary data format
 
For both reading and writing, file names can be specified as absolute or relative paths.
For most formats, DAPHNE requires additional information on the data and value types as well as dimensions, when reading files.
These must be provided in a separate .meta-file.
- `print(arg:scalar/matrix/frame[, newline:bool[, toStderr:bool]])`

  Prints the given scalar, matrix, or frame `arg` to `stdout` or `stderr`. The parameter `newline` is optional; `true` (the default) means a new line is started after `arg`, `false` means no new line is started. The parameter `toStderr` is optional; `true` means the text is printed to `stderr`, `false` (the default) means it is printed to `stdout`.

- `readFrame(filename:str)`

  Reads the file `filename` into a frame. Assumes that a `.meta`-file is present for the specified `filename`.

- `readMatrix(filename:str)`

  Reads the file `filename` into a matrix. Assumes that a `.meta`-file is present for the specified `filename`.

- `write/writeFrame/writeMatrix(arg:matrix/frame, filename:str)`

  Writes the given matrix or frame `arg` into the specified file `filename`. Note that the type of `arg` determines how to store the data; thus, it suffices to call `write()` (but `writeFrame()` and `writeMatrix()` can be used synonymously for consistency with reading). At the same time, this creates a `.meta`-file for the written file, so that it can be read again using `readMatrix()`/`readFrame()`.

- `stop([message:str])`

  Terminates the DaphneDSL script execution with the given optional message.
 
Data Preprocessing
- `oneHot(arg:matrix, info:matrix<si64>)`

  Applies one-hot-encoding to the given (n x m) matrix `arg`. The (1 x m) row-matrix `info` specifies the details (in the following, d[j] is short for `info[0, j]`):

  - If d[j] == -1, then the j-th column of `arg` will remain as it is.
  - If d[j] == 0, then the j-th column of `arg` will be omitted in the output.
  - If d[j] > 0, then the j-th column of `arg` will be encoded to a vector of length d[j].

    More precisely, if d[j] > 0 the j-th column of `arg` must contain only integral values in the range [0, d[j] - 1], and will be replaced by d[j] columns containing only zeros and ones. For each row i in `arg`, the value in the `as.scalar(arg[i, j])`-th of those columns is set to 1, while all others are set to 0.

- `recode(arg:matrix, orderPreserving:bool)`

  Applies dictionary encoding to the given (n x 1) matrix `arg`, i.e., assigns an integral code to each distinct value in `arg`. The codes start at 0. If the parameter `orderPreserving` is `true`, then the distinct values are sorted in ascending order before assigning codes. That way, range predicates are possible on the encoded output. Otherwise, new codes are assigned to distinct values as they are encountered. That way, only point predicates are possible on the encoded output.

  There are two results:

  - The first result is the encoded data, a (n x 1) matrix of the codes for each element in the input `arg`.
  - The second result is the decoding dictionary, a (#distinct(`arg`) x 1) matrix of the distinct values in `arg`. The value at position i is the value that is mapped to the code i.

  The encoded data can be decoded by right indexing on the dictionary using the codes as positions.

- `bin(arg:matrix, numBins:size[, min:scalar, max:scalar])`

  Applies binning to the given matrix `arg`, i.e., each value is mapped to the number of its bin (counting starts at zero). The bin boundaries are determined by splitting the interval [`min`, `max`] into `numBins` equi-width bins. The number of bins `numBins` is required. Specifying `min` and `max` is optional; if they are omitted, they are automatically calculated, whereby NaNs in `arg` are not taken into account.
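The three preprocessing functions above can be modeled compactly in plain Python. The sketch below follows the textual descriptions, with column matrices as flat lists and general matrices as lists of rows. Two points are assumptions rather than documented behavior: in `bin_values`, a value equal to the upper bound is put into the last bin and all values are expected to lie within the interval, and NaN handling is left out entirely. All function names are hypothetical.

```python
def one_hot(matrix, info):
    """Encode each column j according to info[j]: -1 keeps it, 0 drops it, d > 0 expands it to d columns."""
    result = []
    for row in matrix:
        out = []
        for value, d in zip(row, info):
            if d == -1:
                out.append(value)
            elif d > 0:
                if value != int(value) or not 0 <= value < d:
                    raise ValueError("value must be integral and in [0, d - 1]")
                encoded = [0] * d
                encoded[int(value)] = 1
                out.extend(encoded)
            # d == 0: the column is omitted from the output
        result.append(out)
    return result

def recode(values, order_preserving):
    """Return (codes, dictionary); codes start at 0 and index into the dictionary."""
    if order_preserving:
        dictionary = sorted(set(values))
    else:
        dictionary = list(dict.fromkeys(values))  # distinct values in order of first occurrence
    code_of = {value: code for code, value in enumerate(dictionary)}
    return [code_of[value] for value in values], dictionary

def bin_values(values, num_bins, lo=None, hi=None):
    """Map each value to the index of its equi-width bin within [lo, hi]."""
    lo = min(values) if lo is None else lo
    hi = max(values) if hi is None else hi
    if hi <= lo:
        raise ValueError("the interval [lo, hi] must not be empty")
    width = (hi - lo) / num_bins
    # Assumption: a value equal to hi falls into the last bin.
    return [min(int((v - lo) / width), num_bins - 1) for v in values]

print(one_hot([[1, 7.5, 0], [2, 8.5, 1]], [3, -1, 0]))  # [[0, 1, 0, 7.5], [0, 0, 1, 8.5]]
codes, dictionary = recode([5.0, 4.0, 5.0, 3.0], True)
print(codes, dictionary)                                # [2, 1, 2, 0] [3.0, 4.0, 5.0]
print([dictionary[c] for c in codes])                   # [5.0, 4.0, 5.0, 3.0]
print(bin_values([10, 20, 30, 40, 50, 60, 70], 3))      # [0, 0, 1, 1, 2, 2, 2]
```

The third `print` shows the decoding step: indexing the dictionary with the codes restores the original values.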
 
Measurements
- `now()`

  Returns the current time since the epoch in nanoseconds.
 
List Operations
- `createList(elm:matrix, ...)`

  Creates and returns a new list from the given elements `elm`. At least one element must be specified.

- `length(lst:list)`

  Returns the number of elements in the given list `lst`.

- `append(lst:list, elm:matrix)`

  Appends the given matrix `elm` to the given list `lst`. Returns the result as a new list (the argument list stays unchanged).

- `remove(lst:list, idx:size)`

  Removes the element at position `idx` (counting starts at zero) from the given list `lst`. Returns (1) the result as a new list (the argument list stays unchanged), and (2) the removed element.
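The key property of `append()` and `remove()` is that they return a new list and leave their argument unchanged. A plain-Python sketch of this value semantics, with strings standing in for the matrix elements, might look as follows. It models the described behavior only.

```python
def append(lst, elm):
    """Return a new list with elm added at the end; lst itself is not modified."""
    return lst + [elm]

def remove(lst, idx):
    """Return (new list without position idx, removed element); lst itself is not modified."""
    if not 0 <= idx < len(lst):
        raise IndexError("idx out of range")
    return lst[:idx] + lst[idx + 1:], lst[idx]

original = ["X", "Y"]
longer = append(original, "Z")
shorter, removed = remove(longer, 0)
print(original)          # ['X', 'Y']
print(longer)            # ['X', 'Y', 'Z']
print(shorter, removed)  # ['Y', 'Z'] X
```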