API Docs
load
The main function for loading a dataset is load, which returns a train and test fold for a RelationalDataset type.
RelationalDatasets.load — Function
load(name::String, version::Union{String, Nothing} = nothing; fold::Int64 = 1)
Load the training and test folds for a dataset.
Convert propositional->relational
Many standard machine learning tasks are built around predicting a vector of outcomes $y$ from a data matrix $X$.
Here we include methods for converting data like these into an Inductive Logic Programming or relational representation.
from_vector
from_vector infers the machine learning task from the element type of $y$: if $y$ is a vector of integers, the task is treated as classification; if $y$ is a vector of floating-point numbers, it is treated as regression.
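To make the type-based rule concrete, here is a minimal sketch of how Julia's multiple dispatch can select a task from the element type of $y$. The helper infer_task is hypothetical and is not part of RelationalDatasets; it only mirrors the two argument types (Vector{Int} and Vector{Float64}) that appear in the documented from_vector signatures.

```julia
# Hypothetical helper (not a RelationalDatasets function): one method per
# element type, so Julia's dispatch chooses the task from the type of y.
infer_task(y::Vector{Int}) = :classification
infer_task(y::Vector{Float64}) = :regression

class_task = infer_task([0, 0, 1])        # integer labels
regr_task = infer_task([1.1, 1.2, 1.3])   # floating-point targets
float_labels = infer_task([0.0, 1.0])     # float-typed labels also match the Float64 method
```

Under this kind of dispatch, class labels stored as floats, such as [0.0, 1.0], would select the regression method, so classification labels should be kept as integers.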
RelationalDatasets.from_vector — Function
from_vector(X::Matrix{Int}, y::Vector{Int}, names::Union{Vector{String}, Nothing} = nothing)
Convert a classification dataset to an ILP representation.
from_vector(X::Matrix{Int}, y::Vector{Float64}, names::Union{Vector{String}, Nothing} = nothing)
Convert a regression dataset to an ILP representation.
Demo for converting a classification problem:
data, modes = RelationalDatasets.from_vector(
[[0, 1, 1] [1, 0, 2] [2, 2, 0]],
[0, 0, 1],
)
data.pos
1-element Vector{String}:
"v4(id3)."
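The matrix literal in these demos may be unfamiliar: in Julia, writing vectors side by side inside brackets concatenates them horizontally, so each inner vector becomes a column. The short check below uses only base Julia. Judging from the printed outputs in this page, rows appear to correspond to examples (id1, id2, ...) and columns to variables, but that reading is an inference from the output rather than a documented guarantee.

```julia
# Space-separated vectors inside brackets are concatenated horizontally,
# so each inner vector becomes one column of the resulting matrix.
X = [[0, 1, 1] [1, 0, 2] [2, 2, 0]]

dims = size(X)          # 3 rows and 3 columns
first_column = X[:, 1]  # the first inner vector
first_row = X[1, :]     # first entry of every inner vector
```

Here first_column equals [0, 1, 1] and first_row equals [0, 1, 2].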
Regression is similar:
data, modes = RelationalDatasets.from_vector(
[[0, 1, 1] [1, 0, 2] [2, 2, 0]],
[1.1, 1.2, 1.3],
)
data.pos
3-element Vector{String}:
"regressionExample(v4(id1),1.1)."
"regressionExample(v4(id2),1.2)."
"regressionExample(v4(id3),1.3)."
Custom names can also be passed to make variables more interpretable. Below is a small example based on the Boston Housing dataset.
The first two names are covariates and the last ("medv") is the dependent variable:
data, modes = RelationalDatasets.from_vector(
[[1, 1] [1, 2] [2, 1]],
[33.2, 27.5, 18.9],
["age", "dis", "medv"],
)
data.facts
6-element Vector{String}:
"age(id1,age_1)."
"age(id2,age_1)."
"dis(id1,dis_1)."
"dis(id2,dis_2)."
"medv(id1,medv_2)."
"medv(id2,medv_1)."
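As an illustration of the propositional-to-relational idea, the following sketch rebuilds the fact strings printed above from the same matrix and names. The function sketch_facts is hypothetical: it was written to match the displayed output format (one fact per row and named column, with the value appended to the name) and is not the package's implementation, which may differ.

```julia
# Hypothetical re-creation of the fact format shown above; this is NOT
# the RelationalDatasets implementation, only a sketch of the output shape.
function sketch_facts(X::Matrix{Int}, names::Vector{String})
    facts = String[]
    # Outer loop over named columns, inner loop over rows (examples).
    for (j, name) in enumerate(names), i in axes(X, 1)
        # Each cell becomes name(id<row>,name_<value>).
        push!(facts, string(name, "(id", i, ",", name, "_", X[i, j], ")."))
    end
    return facts
end

facts = sketch_facts([[1, 1] [1, 2] [2, 1]], ["age", "dis", "medv"])
```

Each matrix cell becomes one ground fact linking an example identifier to a discretized value, which is the kind of representation ILP systems consume.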
Constants
DATASETS
RelationalDatasets.DATASETS — Constant
Available datasets from the srlearn/datasets repository.
RelationalDatasets.DATASETS
12-element Vector{String}:
"toy_cancer"
"toy_father"
"citeseer"
"cora"
"uwcse"
"webkb"
"financial_nlp_small"
"nell_sports"
"icml"
"boston_housing"
"drug_interactions"
"toy_machines"
LATEST_VERSION
RelationalDatasets.LATEST_VERSION — Constant
Default download version.
If a "version" parameter is not passed to load, a dataset of this version is downloaded by default.
RelationalDatasets.LATEST_VERSION
"v0.0.5"