Inductive Logic Programming (ILP) Grammar and Linter

This project defines a simple grammar (cmd/ILPLang.g4) and a command-line tool that lints datasets for formatting problems.

Overview

The target is a linter binary to help point out issues when tokenizing or parsing a dataset.

Example 1: No Errors

When the dataset is well-formatted, the linter produces no output:

smokes(person1).
friends(person1,person2).
friends(person2,person1).

./linter -tokens -file=examples/pos/pos1.txt
./linter -file=examples/pos/pos1.txt
# (No output for either case)

Example 2: Bad Data

When the data contains something that cannot be recognized, the problems are written to stderr:

friends(person1,person2).
Bad Data.

./linter -tokens -file=examples/neg/neg1.txt
line 2:0 token recognition error at: 'B'
line 2:3 token recognition error at: ' '
line 2:4 token recognition error at: 'D'

./linter -file=examples/neg/neg1.txt
line 2:0 token recognition error at: 'B'
line 2:3 token recognition error at: ' '
line 2:4 token recognition error at: 'D'
line 2:5 missing '(' at 'ata'
line 2:8 mismatched input '.' expecting {')', ','}
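
Each message in this output has the shape "line ROW:COLUMN MESSAGE", so it can be collected by another program, for example to annotate a file in an editor or a CI job. The following Go sketch shows one way to do that. It is not part of the linter: the Diagnostic type and the ParseDiagnostics function are hypothetical names, and the code assumes every problem is reported on a single line in the shape shown in this example.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// Diagnostic is a hypothetical container for one problem reported by the linter.
type Diagnostic struct {
	Line    int
	Column  int
	Message string
}

// diagPattern matches lines shaped like "line 2:0 token recognition error at: 'B'".
// Assumption: every problem uses this single-line shape.
var diagPattern = regexp.MustCompile(`^line (\d+):(\d+) (.*)$`)

// ParseDiagnostics collects every line of output that matches diagPattern
// and skips lines that do not.
func ParseDiagnostics(output string) []Diagnostic {
	var diags []Diagnostic
	scanner := bufio.NewScanner(strings.NewReader(output))
	for scanner.Scan() {
		m := diagPattern.FindStringSubmatch(scanner.Text())
		if m == nil {
			continue
		}
		line, err := strconv.Atoi(m[1])
		if err != nil {
			continue // number too large to be a real line number
		}
		col, err := strconv.Atoi(m[2])
		if err != nil {
			continue
		}
		diags = append(diags, Diagnostic{Line: line, Column: col, Message: m[3]})
	}
	return diags
}

func main() {
	// Two of the messages from the example above, as they would appear on stderr.
	stderr := "line 2:0 token recognition error at: 'B'\n" +
		"line 2:5 missing '(' at 'ata'\n"
	for _, d := range ParseDiagnostics(stderr) {
		fmt.Printf("row=%d col=%d msg=%s\n", d.Line, d.Column, d.Message)
	}
}
```

In practice the input string would come from the captured stderr of a ./linter run rather than from a literal.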

Example 3: Regression Examples

The parser also recognizes regressionExample values, which are used in regression datasets.

The parser does not check whether an entire dataset is correct (regressionExample values labeled as positive examples, empty negative examples, and facts), but this could be accomplished fairly easily elsewhere.

regressionExample(medv(id100),33.2).
regressionExample(medv(id101),27.5).
regressionExample(medv(id10),18.9).
regressionExample(medv(id102),26.5).
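
As a rough illustration of the dataset-level check that the parser leaves out, the Go sketch below takes the positive and negative examples as two strings and reports two kinds of problems: a positive example that is not a regressionExample, and a non-empty set of negative examples. CheckRegressionDataset and nonBlankLines are hypothetical names, the prefix test is a simplification rather than a real parse, and facts are not inspected at all.

```go
package main

import (
	"fmt"
	"strings"
)

// nonBlankLines returns the trimmed, non-empty lines of text.
func nonBlankLines(text string) []string {
	var out []string
	for _, line := range strings.Split(text, "\n") {
		if trimmed := strings.TrimSpace(line); trimmed != "" {
			out = append(out, trimmed)
		}
	}
	return out
}

// CheckRegressionDataset is a hypothetical dataset-level check. It assumes the
// positive and negative examples are supplied as separate blocks of text, one
// example per line, and returns a description of each problem it finds.
func CheckRegressionDataset(pos, neg string) []string {
	var problems []string
	for i, line := range nonBlankLines(pos) {
		// Simplification: a prefix test, not a parse of the whole line.
		if !strings.HasPrefix(line, "regressionExample(") {
			problems = append(problems,
				fmt.Sprintf("positive example %d is not a regressionExample: %s", i+1, line))
		}
	}
	if n := len(nonBlankLines(neg)); n > 0 {
		problems = append(problems,
			fmt.Sprintf("expected no negative examples, found %d", n))
	}
	return problems
}

func main() {
	pos := "regressionExample(medv(id100),33.2).\n" +
		"regressionExample(medv(id101),27.5).\n"
	neg := ""
	for _, p := range CheckRegressionDataset(pos, neg) {
		fmt.Println(p)
	}
}
```

A check like this would complement the linter rather than replace it: the linter validates each line, and the extra pass validates how the lines are distributed across the dataset.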

Usage

Download a Binary

Precompiled binaries are listed on the GitHub Releases page, and the latest version can be downloaded with these links:

Platform Link
Linux/amd64 Download
macOS/amd64 Download
Windows/amd64 Download

Build from Source

Building requires a Go compiler.

cd cmd
go build

A copy of the generated ANTLR parser files is committed to the repository. Rebuilding them requires the ANTLR parser generator.

make clean
make linter

Limitations

The grammar is currently extremely conservative: the only tokens allowed are lowercase letters, integers, and underscores.

a(x_1,y_1).
b(x_1).
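
To give a feel for how narrow this restriction is, the Go sketch below approximates it with a regular expression for simple facts such as the two above. It is an assumption-laden stand-in, not a translation of cmd/ILPLang.g4: the pattern and the LooksLikeSimpleFact name are hypothetical, whitespace is rejected outright, and regressionExample lines are not covered.

```go
package main

import (
	"fmt"
	"regexp"
)

// simpleFact approximates the stated restriction: a name and one or more
// comma-separated arguments, each built only from lowercase letters, digits,
// and underscores, followed by a closing period.
// Assumption: this is a rough stand-in and is not derived from the grammar file.
var simpleFact = regexp.MustCompile(`^[a-z0-9_]+\([a-z0-9_]+(,[a-z0-9_]+)*\)\.$`)

// LooksLikeSimpleFact reports whether line fits the approximation above.
func LooksLikeSimpleFact(line string) bool {
	return simpleFact.MatchString(line)
}

func main() {
	for _, line := range []string{"a(x_1,y_1).", "b(x_1).", "Bad Data."} {
		fmt.Printf("%q accepted: %v\n", line, LooksLikeSimpleFact(line))
	}
}
```

A quick filter like this can flag uppercase letters or stray spaces before the linter runs, but the linter remains the authority on what the grammar accepts.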

Contributions

Some ideas were taken from the FOPC_MLN_ILP_Parser developed by Jude Shavlik and Trevor Walker (and possibly contributed to by many others who went unnamed in the source code). A few versions of their tokenizers (StreamTokenizerJWS and StreamTokenizerTAW) and parser are currently used in other projects.


Last update: August 5, 2021