Accepted papers


Oral presentations

5. James Cussens. Approximate Bayesian computation for the parameters of PRISM programs - Supplementary materials
Abstract: Probabilistic logic programming formalisms permit the definition of potentially very complex probability distributions. This complexity can often make learning hard, even when structure is fixed and learning reduces to parameter estimation. In this paper an approximate Bayesian computation (ABC) method is presented which computes approximations to the posterior distribution over PRISM parameters. The key to ABC approaches is that the likelihood function need not be computed; instead, a 'distance' between the observed data and synthetic data generated by candidate parameter values is used to drive the learning. This makes ABC highly appropriate for PRISM programs, which can have an intractable likelihood function but from which synthetic data can be readily generated. The algorithm is experimentally shown to work well on an easy problem, but further work is required to produce acceptable results on harder ones.
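The generic ABC scheme this abstract builds on can be sketched in a few lines. The sketch below is illustrative only, with a placeholder prior, simulator, distance and tolerance rather than anything specific to PRISM programs:

```python
import random

def abc_rejection(prior_sample, simulate, distance, observed, n_accept, tolerance):
    """Generic ABC rejection sampling: keep parameter draws whose synthetic
    data lies within `tolerance` of the observed data (illustrative sketch)."""
    accepted = []
    while len(accepted) < n_accept:
        theta = prior_sample()          # draw candidate parameters from the prior
        synthetic = simulate(theta)     # generate synthetic data; no likelihood is computed
        if distance(synthetic, observed) <= tolerance:
            accepted.append(theta)      # accepted draws approximate the posterior
    return accepted

# Toy usage: estimate a coin's bias from 100 observed heads in 200 flips.
posterior_draws = abc_rejection(
    prior_sample=lambda: random.random(),                             # Uniform(0,1) prior
    simulate=lambda p: sum(random.random() < p for _ in range(200)),  # synthetic head count
    distance=lambda synth, obs: abs(synth - obs),
    observed=100,
    n_accept=500,
    tolerance=5,
)
```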
7. Sriraam Natarajan, Tushar Khot, Kristian Kersting, Bernd Gutmann and Jude Shavlik. Boosting Relational Dependency Networks
Abstract: Relational Dependency Networks (RDNs) are graphical models that extend dependency networks to relational domains, where the joint probability distribution over the variables is approximated as a set of conditional distributions. Current learning algorithms for RDNs use pseudolikelihood techniques to learn a probability tree at each node to represent the conditional distribution. In this work, we propose the use of gradient tree boosting, as applied by Dietterich et al. (2004), to learn a set of regression trees for each predicate. These regression trees aim to optimize the conditional likelihood P(Y|X) of each predicate, and the use of several regression trees rather than a single probability tree results in a more expressive model. Our experiments show that this training method results in efficient learning of RDNs compared to the traditional methods.
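Functional gradient tree boosting of a conditional distribution, of the kind this abstract refers to, can be illustrated with the following propositional sketch; it uses scikit-learn regression trees as a stand-in for the relational regression trees of the paper and is not the authors' implementation:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def boost_conditional(X, y, n_trees=20, depth=2):
    """Fit P(y=1|x) as a sum of regression trees, each fitted to the
    functional gradient (residual) of the log-likelihood."""
    trees, psi = [], np.zeros(len(y))           # psi(x) is the current potential
    for _ in range(n_trees):
        prob = 1.0 / (1.0 + np.exp(-psi))       # sigmoid of the summed trees
        gradient = y - prob                     # pointwise functional gradient
        tree = DecisionTreeRegressor(max_depth=depth).fit(X, gradient)
        trees.append(tree)
        psi += tree.predict(X)                  # add the new tree to the model
    return trees

def predict_proba(trees, X):
    psi = sum(tree.predict(X) for tree in trees)
    return 1.0 / (1.0 + np.exp(-psi))
```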
10. Ashwin Srinivasan, Tanveer Faruquie and Sachindra Joshi. Exact Data Parallel Computation for Very Large ILP Datasets - Supplementary materials
Abstract: The emergence of very large machine-generated datasets raises a question of some importance for ILP, namely: can an ILP system construct models efficiently using datasets whose sizes are too large to fit in random access memory? One attempt to address this is simply to reduce the data either by statistical sampling, or by data-partitioning. Both result in inexact values for parameters (like the accuracy of a clause on the data provided). A few implementations have attempted to retain exactness by accessing data directly from external mass storage, either using a relational database, or conventional files distributed over several machines. These have usually required a substantial understanding of either the indexing mechanisms of the relational database management system, or of the software enabling inter-processor communication.
Here we examine the applicability to ILP of a popular distributed computing approach that, in principle, allows the analysis of petabytes of data without any special understanding of the underlying hardware or software involved. Specifically, we show how the MapReduce programming model can be used to perform the coverage test that is at the heart of many ILP systems. Our findings, with synthetic datasets containing from a few thousand to millions of examples, are these: (a) There are overheads associated with each invocation of a MapReduce implementation. These can be rendered negligible (but not eliminated entirely) either by testing several clauses at once, or if the dataset is large; (b) Ignoring overheads, the time to obtain the exact coverage for a clause using a MapReduce implementation grows in proportion to the size of the dataset; and (c) If a MapReduce implementation is used as part of an ILP search, then benefits can only be expected above some minimal dataset size. While the precise value of this size will be problem-dependent, from our experiments we conjecture that it may be in the range of 300 to 500 megabytes. We also examine the performance of a MapReduce-enabled ILP system on real problems with data sizes varying from tens to thousands of megabytes, and find the results to be consistent with the conjecture in (c) above.
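The coverage test maps naturally onto the MapReduce model: mappers count covered examples within their data chunk and a reducer sums the partial counts. The sketch below is a plain-Python illustration of that decomposition, not the authors' implementation; covers(clause, example) stands in for the usual ILP coverage test on a single example:

```python
from functools import reduce

def map_phase(clause, example_chunk, covers):
    """Mapper: emit a partial coverage count for one chunk of examples."""
    return sum(1 for example in example_chunk if covers(clause, example))

def reduce_phase(partial_counts):
    """Reducer: sum the partial counts into the exact coverage."""
    return reduce(lambda a, b: a + b, partial_counts, 0)

def exact_coverage(clause, chunks, covers):
    # In a real deployment each chunk would live on a separate node;
    # here the map phase is simply applied chunk by chunk.
    return reduce_phase(map_phase(clause, chunk, covers) for chunk in chunks)
```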
11. Yoshitaka Yamamoto, Katsumi Inoue and Koji Iwanuma. From inverse entailment to inverse subsumption - Supplementary materials
Abstract: Modern explanatory ILP methods like Progol, the Residue procedure, CF-induction, HAIL and Imparo use the principle of Inverse Entailment (IE). It is based on the fact that the negation of a hypothesis is derivable from the prior background theory and the negation of the examples. IE-based methods commonly compute a hypothesis in two steps: first by constructing an intermediate theory and then by generalizing its negation into the hypothesis using the inverse relation of entailment. In this paper, we focus on the sequence of intermediate theories that constitutes a derivation from the background theory and the negation of the examples to the negation of the hypothesis. We then show that the negations of the theories in this sequence are related by inverse subsumption. Using this result, inverse entailment can be reduced to inverse subsumption while preserving completeness for finding hypotheses.
12. Bernd Gutmann, Manfred Jaeger and Luc De Raedt. Extending ProbLog with Continuous Distributions
Abstract: ProbLog is a recently introduced probabilistic extension of Prolog. The key contribution of this paper is that we extend ProbLog with abilities to specify continuous distributions and that we show how ProbLog's exact inference mechanism can be extended to cope with such distributions. The resulting inference engine combines an interval calculus with a dynamic discretization algorithm into an effective solver.
14. Michelangelo Diligenti, Marco Gori, Marco Maggini and Leonardo Rigutini. Multitask Kernel-based Learning with First-Order Logic Constraints
Abstract: In this paper we propose a general framework to integrate supervised and unsupervised examples with background knowledge expressed by a collection of first-order logic clauses into kernel machines. In particular, we consider a multi-task learning scheme where multiple predicates defined on a set of objects are to be jointly learned from examples, enforcing a set of FOL constraints on the admissible configurations of their values. The predicates are defined on the feature spaces in which the input objects are represented, and can be either known a priori or approximated by an appropriate kernel-based learner. A general approach is presented to convert the FOL clauses into a continuous implementation that can deal with the outputs computed by the kernel-based predicates. The learning problem is formulated as a semi-supervised task that requires the optimization in the primal of a loss function that combines a fitting loss measure on the supervised examples, a regularization term, and a penalty term that enforces the constraints on both the supervised and unsupervised examples. Unfortunately, the penalty term is not convex and it can hinder the optimization process. However, it is possible to avoid poor solutions by using a two-stage learning scheme, in which the supervised examples are learned first and then the constraints are enforced.
16. Fabrizio Riguzzi and Nicola Di Mauro. Applying the Information Bottleneck Approach to SRL: Learning LPAD Parameters
Abstract: In this paper, we propose to apply the Information Bottleneck (IB) approach to a sub-class of Statistical Relational Learning (SRL) languages. Learning parameters in SRL for domains that involve hidden variables requires a technique for learning from incomplete data, such as the expectation maximization (EM) algorithm. Recently, it has been shown that the IB approach overcomes well-known problems of the EM algorithm. Here we show that learning in all the SRL languages reducible to Bayesian networks can be carried out by applying the IB approach. In particular, our focus in this paper is on the problem of learning the parameters of Logic Programs with Annotated Disjunctions (LPADs). We adopt a reductionist approach in which an acyclic LPAD is translated into a Bayesian network (such that the LPAD theory parameters appear in the network's CPDs). The reduction process introduces some hidden variables into the network, thus naturally requiring the use of the IB approach. The paper illustrates the adaptation of the IB approach to the problem of interest and shows some experimental results on natural and artificial datasets.
18. Radomir Cernoch and Filip Zelezny. Speeding up Planning through Minimal Generalizations of Partially Ordered Plans
Abstract: We present a novel strategy that exploits existing plans in solving new, similar planning tasks by finding a common generalized core of the existing plans. For this purpose, we develop an operator yielding a minimal joint generalization of two partially ordered plans. In three planning domains, we show a substantial speed-up of planning when the planner starts its search-space exploration from the learned common generalized core rather than from scratch.
24. Trevor Walker, Sriraam Natarajan, Gautam Kunapuli, Jude Shavlik and David Page. Automation of ILP Setup and Search via User Provided Relevance and Type Information
Abstract: Inductive Logic Programming (ILP) provides an effective method of learning logical theories given a set of positive examples, a set of negative examples, a corpus of background knowledge, and a specification of a search space (e.g., by mode definitions) from which to compose the theories. While specifying positive and negative examples is relatively straightforward, composing effective background knowledge and a search-space definition requires a detailed understanding of many aspects of the ILP process and limits the usability of ILP. This paper introduces a number of techniques to automate the use of ILP. These techniques include automatic generation of background knowledge from user-supplied information in the form of a simple relevance language, utilization of type hierarchies to constrain search, automatic generation of negative examples, and an iterative-deepening-style search process. We provide an example domain where we utilize these techniques to completely automate a variety of ILP tasks.
25. Stephen Muggleton, Jianzhong Chen, Hiroaki Watanabe, Stuart Dunbar, Charles Baxter, Richard Currie, Jose Domingo Salazar, Jan Taubert and Michael Sternberg. Variation of background knowledge in an industrial application of ILP
Abstract: In several recent papers, ILP has been applied to Systems Biology problems, in which it has been used to fill gaps in the descriptions of biological networks. In the present paper we describe two new applications of this type in the area of plant biology. These applications are of particular interest to the agrochemical industry, in which improvements in plant strains can have benefits for modelling crop development. The background knowledge in these applications is extensive and is derived from public databases in a Prolog format using the new ONDEX system (developed at Rothamsted Research). In this paper we explore the question of how much of this background knowledge it is beneficial to include, weighing accuracy increases against increases in learning time. The results indicate that relatively shallow background knowledge is needed to achieve maximum accuracy.
26. David Vaz, Vitor Santos Costa and Michel Ferreira. Fire! Firing Inductive Rules from Economic Geography for Fire Risk Detection
Abstract: Wildfires can significantly affect the ecology and economy of large regions of the world. Effective prevention techniques are fundamental to mitigating their consequences. The design of such preemptive methods requires a deep understanding of the factors that increase the risk of fire, particularly where we can intervene on these factors. This is the case for the maintenance of ecological balances in the landscape that minimize the occurrence of wildfires. We use an inductive logic programming approach over detailed spatial datasets: one describing the landscape mosaic and characterizing it in terms of its use, and another describing polygonal areas where wildfires took place over several years. Our inductive process operates over a logic-term representation of vectorial geographic data and uses spatial predicates to explore the search space, leveraging the Spatial-Yap framework and its multi-dimensional indexing and tabling extensions. We show that the coupling of a logic-based spatial database with an inductive logic programming engine provides an elegant and powerful approach to spatial data mining.
28. Alireza Tamaddoni-Nezhad and Stephen Muggleton. Stochastic Refinement
Abstract: Most ILP systems are traditionally based on clause refinement through a lattice defined by a generality order (e.g. subsumption). However, there is a long-standing and increasing interest in stochastic search methods in ILP. The research presented in this paper is motivated by the following question. How can the generality order of clauses and the relevant concepts such as refinement be adapted to be used in a stochastic search? To address this question we introduce the concept of stochastic refinement operators and adapt a framework, called stochastic refinement search. We give an analysis of the stochastic refinement operators within this framework and also use this framework to characterise a special case where a stochastic search is used to explore a refinement graph bounded by a most specific (bottom) clause.
29. Yi Huang, Volker Tresp , Markus Bundschus and Achim Rettinger. Multivariate Structured Prediction for Learning on Semantic Web - Supplementary materials
Abstract: One of the main characteristics of Semantic Web (SW) data is that it is notoriously incomplete: in the same domain a great deal might be known for some entities and almost nothing might be known for others.  A popular example is the well known friend-of-a-friend data set where, for privacy concerns and other reasons, some members document exhaustive private and social information whereas almost nothing is known for other members. Although deductive reasoning can be used to complement factual knowledge based on the ontological background, still a tremendous number of potential statements remain to be uncovered. The paper is focused on the prediction of potential relationships and attributes by exploiting regularities in the data using statistical relational learning algorithms.  We define an extension of the Semantic Web query language SPARQL, which allows the integration of the learned probabilistic statements into querying. Statements that can be inferred via logical reasoning can readily be integrated into learning and querying. We argue that multivariate structured prediction approaches are most suitable for dealing with the resulting high-dimensional sparse data matrix. Within the statistical framework, the approach scales up to large domains and is able to deal with highly sparse relationship data.  A major goal of the presented work is to formulate an inductive learning approach that can be used by people with little machine learning background. We present experimental results using a friend-of-a-friend data set.
31. Luc De Raedt and Ingo Thon. Probabilistic Rule Learning
Abstract: Traditionally, rule learners have learned deterministic rules from deterministic data, that is, the rules have been expressed as logical statements and the examples and their classification have also been purely logical. We introduce a novel probabilistic setting for rule learning in which both the examples themselves and their classification can be probabilistic. This is a natural extension of traditional rule learning, and it is shown how many well-known concepts from the rule learning literature can be upgraded towards the use of probabilities. The setting is incorporated in the probabilistic rule learner ProbFOIL, which combines the principles of the relational rule learner FOIL with the probabilistic Prolog ProbLog. Finally, we report on some experiments that demonstrate the utility of the approach.
41. Matthieu Lopez, Lionel Martin and Christel Vrain. Learning discriminant rules as a minimal saturation search - Supplementary materials
Abstract: It is well known that for certain relational learning problems, traditional top-down search degenerates into blind search. Recent work in Inductive Logic Programming on phase transitions and plateau crossing shows that no general solution can cope with all of these difficulties. In this context, we introduce the notion of "minimal saturation" to build non-blind refinements of hypotheses in a bottom-up approach. We present experimental results of this approach on some benchmarks inspired by constraint satisfaction problems. These problems can be specified in first-order logic, but most existing ILP systems fail to learn a correct definition, especially because they fall into blind search.
43. Katsumi Inoue, Andrei Doncescu and Hidetomo Nabeshima. Hypothesizing about Networks in Meta-level Abduction
Abstract: Meta-level abduction has been proposed to discover missing links and unknown nodes in incomplete network data in order to account for observations. In this work, we extend the applicability of meta-level abduction to networks containing both positive and negative causal effects. Such networks are often used in many domains, including biology, where inhibitory effects are important in signaling and metabolic pathways. We show that meta-level abduction can consistently produce both positive and negative causal relations as well as invented nodes. As a case study, we show an application of meta-level abduction to a p53 signal network, abducing causal rules to explain how a tumor suppressor works.

Posters

4. Jose Santos and Stephen Muggleton. When does it pay off to use sophisticated entailment engines in ILP?
Abstract: Entailment is an important problem in computational logic particularly relevant to the Inductive Logic Programming (ILP) community as it is at the core of the hypothesis coverage test which is often the bottleneck of an ILP system. Despite recent developments in subsumption engines, most ILP systems still use SLD-resolution to perform the hypothesis coverage test. In this paper we present three alternative entailment engines, fully integrated in the ILP system ProGolem, and compare their performance on a representative set of ILP problems.
9. Laura-Andreea Antanas, Martijn van Otterlo, Jose Oramas M., Tinne Tuytelaars and Luc De Raedt. Not Far Away from Home: a relational distance-based approach to understand images of houses - Supplementary materials
Abstract: Recently there has been a growing interest in augmenting vision systems to incorporate high-level knowledge and reasoning to a greater extent. The goal is to improve lower-level vision processes, such as object detection, but also to be able to deal with richer and more structured video information. In this paper we tackle the problem of delimiting conceptual elements of street views based on spatial relations between lower-level components, e.g. the element house is composed of windows and a door in a specific spatial arrangement. We propose a hierarchical approach, grouping elements at different levels in the image into gradually higher-level concepts. The context is that of structured data: each concept can be seen as a graph defining spatial relations between its components, e.g. left, up, close, or a spatial embedding. We employ distances between logical interpretations to match parts of images with known examples and to frame the solution at each level.
13. Christophe Rodrigues, Pierre Gerard and Celine Rouveirol. Incremental learning of relational action models in noisy environments
Abstract: In the Relational Reinforcement Learning framework, we propose an algorithm that learns an action model (an approximation of the transition function), making it possible to anticipate the resulting state of an action in a given situation. The algorithm incrementally learns a set of first-order rules in a noisy environment following a data-driven loop. Each time a new example is presented that contradicts the current action model, the model is revised (generalization and/or specialization). As opposed to a previous version of our algorithm that operates in a noise-free context, we introduce here a number of indicators attached to each rule that allow us to evaluate whether a revision should take place immediately or be delayed. We provide an empirical evaluation on the usual RRL benchmarks.
17. Stefano Bragaglia and Fabrizio Riguzzi. Approximate Inference for Logic Programs with Annotated Disjunctions - Supplementary materials
Abstract: Logic Programs with Annotated Disjunctions (LPADs) are particularly interesting for Probabilistic Inductive Logic Programming because they have a sound semantics and an intuitive syntax, and they make it possible to exploit many of the techniques developed in Logic Programming for probabilistic reasoning. Recently, various works have started to investigate the problem of learning such a language, with good experimental results. In order to develop efficient learning systems, it is fundamental to have high-performing inference algorithms. Two approaches have been proposed for exact reasoning on LPADs: cplint uses BDDs, as ProbLog does, and CVE converts the LPAD to an equivalent Bayesian network. In this paper we adapt to LPADs the approximate inference approaches that have been developed for ProbLog, namely k-Best and Monte Carlo. k-Best finds a lower bound by identifying the k most probable explanations, while Monte Carlo computes probabilities by smartly sampling the space of programs. The two techniques have been implemented in the cplint suite and have been tested on real and artificial datasets representing graphs. The results show that both algorithms are able to solve larger problems, often in less time than exact inference.
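The Monte Carlo idea mentioned in this abstract, sampling concrete programs and checking the query in each sample, can be sketched as follows. This is a generic illustration for independent probabilistic facts, not the cplint implementation; query_holds stands in for the underlying logical inference:

```python
import random

def monte_carlo_probability(prob_facts, query_holds, n_samples=10000):
    """Estimate the probability of a query by sampling worlds: each
    probabilistic fact is included independently with its probability."""
    successes = 0
    for _ in range(n_samples):
        world = {fact for fact, p in prob_facts.items() if random.random() < p}
        if query_holds(world):
            successes += 1
    return successes / n_samples

# Toy usage: a two-edge path where each edge holds with probability 0.6.
prob_facts = {"edge(a,b)": 0.6, "edge(b,c)": 0.6}
estimate = monte_carlo_probability(
    prob_facts,
    lambda world: {"edge(a,b)", "edge(b,c)"} <= world,  # query: both edges present
)
```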
21. Beatriz Garcia Jimenez, Agapito Ledezma and Araceli Sanchis. MMRF for Proteome Annotation Applied to Human Protein Disease Prediction - Supplementary materials
Abstract: Knowing the biological tasks in which each gene and protein participates is essential knowledge for designing disease treatments. Nowadays, these annotations are still unknown for many genes and proteins. Since making annotations from in-vivo experiments is costly, computational predictors are needed for different kinds of annotation, such as metabolic pathway, interaction network, protein family, tissue, disease and so on. Biological data, including genes and proteins, has an intrinsic relational structure and can be grouped by many criteria. This hinders the possibility of finding good hypotheses when an attribute-value representation is used. Hence, we propose the generic Modular Multi-Relational Framework (MMRF) to predict different kinds of gene and protein annotation using Relational Data Mining (RDM). The specific MMRF application to annotating human proteins with diseases verifies that group knowledge (mainly protein-protein interaction pairs) improves the prediction, almost doubling precision-recall results.
22. Andrej Oblak and Ivan Bratko. Learning from noisy data with a non-covering ILP algorithm
Abstract: In this paper we describe the non-covering inductive logic programming program HYPER/N, concentrating mainly on noise handling as well as some other mechanisms that improve learning. We perform some experiments with HYPER/N on synthetic weather data with artificially added noise, and on real weather data to learn to predict the movement of rain from radar rain images and synoptic data.
30. Yusuke Nakano and Nobuhiro Inuzuka. Multi-Relational Pattern Mining Based-on Combination of Properties with Preserving Their Structure in Examples - Supplementary materials
Abstract: We propose a new algorithm for the multi-relational pattern mining problem established by WARMR. To overcome the combinatorial explosion of the large pattern space, the MAPIX algorithm restricts patterns to combinations of basic patterns, called properties. A property is defined as a set of literals appearing in examples and is an extended form of the attribute-value representation. MAPIX enumerates patterns made from conjunctions of properties. Although the range of patterns is clear and MAPIX enumerates them efficiently, a large part of the pattern space falls outside this range. The advantage of MAPIX is that it builds patterns from pattern fragments that occur in examples; many patterns that do not appear in examples are never tested. The proposed algorithm keeps this advantage and extends the way properties are combined: it combines properties as they appear in examples, which we call structure-preserving combination. We give a simple mining procedure that uses the MAPIX algorithm twice. The first MAPIX pass enumerates frequent properties and their frequent conjunctions. Before the second pass, structure-preserving combinations are built from the outputs of the first step and registered as molecular patterns. The second MAPIX pass then produces all frequent conjunctions of the molecular patterns. The patterns produced contain no duplicates in the sense of logical equivalence.
33. Srihari Kalgi, Chirag Gosar, Prasad Gawde, Ganesh Ramakrishnan, Chander Iyer, Kiran T V S, Kekin Gada and Ashwin Srinivasan. BET: An ILP workbench - Supplementary materials
Abstract: There have been several ILP system implementations. However, each has its own input and output specifications, and different systems do not necessarily agree on their inputs and outputs. This can often make comparisons across implementations tricky, owing either to a difference in the names or semantics of parameters and/or a difference in the choice of programming language. This paper discusses a Workbench for ILP (BET) in Java which standardizes the specification of input and output (using XML). BET stands for Background + Examples = Theories. It provides a common framework for building as well as integrating several ILP systems. The provision for implementing algorithms in a common language (Java) improves the feasibility of comparing algorithms on their computational speed, while the input/output standardization enables sound comparison of accuracies (or related measures).
There are several other motivations for developing BET, a principal one being the reduction of the learning curve for both end-users and programmers in the area of relational learning. For example, with BET, the end-user needs to understand only the standardized input parameters for BET. This reduces the time overheads involved in comparing different algorithmic implementations as well as in experimenting with individual implementations, especially in application settings. Further, we standardized the APIs (through abstract Java classes and interfaces), and this makes development of algorithms within BET much easier.
Finally, we have implemented a suite of wrappers around the theorem provers of YAP, SWI-PROLOG and PROGOL in the basic BET package. Several evaluation functions and operator primitives such as LGG, upward cover and downward cover are also part of the basic package. These features jointly enable the rapid development of new relational learning systems. At the time of writing this abstract, FOIL, GOLEM and TILDE have been implemented within BET, while existing implementations of FOIL, GOLEM and PRISM have been integrated into it.
34. Naveen Nair, Chander Jayaraman, Kiran TVS and Ganesh Ramakrishnan. Pruning Search Space for Weighted First Order Horn Clause Satisfiability - Supplementary materials
Abstract: Many SRL models pose logical inference as weighted satisfiability solving. Performing logical inference after completely grounding clauses with all possible constants is computationally expensive, and approaches such as LazySAT exploit the sparseness of the domain to deal with this. Here, we investigate the efficiency of restricting the knowledge base to a set of first-order Horn clauses. We propose an algorithm that prunes the search space for satisfiability in Horn clauses and prove that the optimal solution is guaranteed to exist in the pruned space. The approach finds a model, if one exists, in polynomial time; otherwise it finds the interpretation that is most likely given the weights. We provide experimental evidence that our approach reduces the size of the search space substantially.
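The polynomial-time behaviour on Horn clauses rests on a classical property: a set of definite clauses has a least model that forward chaining reaches in polynomial time. The propositional sketch below illustrates that property only; it is not the authors' weighted-satisfiability algorithm:

```python
def least_model(clauses, facts):
    """Forward chaining for definite clauses, given as (body, head) pairs:
    repeatedly fire rules whose bodies are satisfied until a fixpoint."""
    model = set(facts)
    changed = True
    while changed:
        changed = False
        for body, head in clauses:
            if head not in model and set(body) <= model:
                model.add(head)     # the head is forced to be true
                changed = True
    return model

# Toy usage: smokes(b) follows from friends(a,b) and smokes(a).
clauses = [(["friends(a,b)", "smokes(a)"], "smokes(b)")]
model = least_model(clauses, {"friends(a,b)", "smokes(a)"})  # contains "smokes(b)"
```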
36. Niels Pahlavi and Stephen Muggleton. Can HOLL outperform FOLL? - Supplementary materials
Abstract: Learning first-order recursive theories remains a difficult task in a normal Inductive Logic Programming (ILP) setting, although numerous approaches have addressed it; using Higher-order Logic (HOL) avoids having to learn recursive clauses for such a task. It is one of the areas where Higher-order Logic Learning (HOLL), which exploits the expressive power of HOL, can be expected to improve the learnability of a problem compared to First-order Logic Learning (FOLL). We present a first working implementation of λProgol, a HOLL system adapting the ILP system Progol and the HOL formalism λProlog, which was introduced in a poster last year. We demonstrate that λProgol outperforms standard Progol when learning first-order recursive theories, significantly improving predictive accuracy on several worked examples, especially when the learning examples are small with respect to the size of the data.
37. Alberto Illobre, Jorge Gonzalez, Ramon Otero and Jose Santos. Learning action descriptions of opponent behaviour in the Robocup 2D simulation environment.
Abstract: The Robocup 2D simulation competition proposes a dynamic environment where two opposing teams confront each other in a simplified soccer game. As of 2009, all major teams use a fixed algorithm to control their players. An unexpected opponent strategy, not previously considered by the developers, might result in winning all matches. The inability to adapt to new strategies is a recurring problem in competitive games. We solve this problem by using ILP to learn action descriptions of opponent players. These descriptions can then be used to plan for desired states under suitable non-monotonic formalisms. We use a simplified scenario where we learn the behaviour of a goalkeeper based on the actions of a shooter player. The resulting description is used to plan for states where the probability of scoring a goal is maximized. This result can be directly extended to a multiplayer environment. For learning in dynamic domains, we have to deal with the frame problem. We use the sound and complete method described in [7] to learn efficiently under these conditions. The results show that this method can find detailed action descriptions of the behaviour of opponent players. These descriptions can be used to dynamically modify the behaviour of the controlled team according to the opponent's strategies.
39. Erick Alphonse, Tobias Girschick, Fabian Buchwald and Stefan Kramer. A Numerical Refinement Operator based on Multi-Instance Learning
Abstract: We present a numerical refinement operator based on multi-instance learning. In this approach, the task of handling numerical variables in a clause is delegated to statistical multi-instance learning schemes. Each clause has an associated multi-instance classification model with the numerical variables of the clause as input. Clauses are built in a greedy manner, where each refinement adds new numerical variables which are used in addition to the numerical variables already known to the multi-instance model. In our experiments, we tested this approach with multi-instance support vector machines (MI-SVMs). Refinement stops if the margin of the MI-SVM no longer improves. The approach is evaluated on the problem of hexose binding site prediction and two pharmacological applications. In all three applications, the task is to find configurations of points with certain properties in 3D space that characterize either a binding site or drug activity: the logical part of the clause constitutes the points with their properties, whereas the multi-instance model considers the distances among the points. In summary, the new numerical refinement operator is interesting both theoretically, as a new synthesis of logical and statistical learning, and practically, as a new method for characterizing binding sites and pharmacophores in biochemical applications.
40. Tarek Abudawood and Peter Flach. Learning Multi-class Theories in ILP - Supplementary materials
Abstract: In this paper we investigate the lack of reliability and consistency of those binary rule learners in ILP that employ the one-vs-rest binarisation technique when dealing with multi-class domains. We show that we can learn a simple, consistent and reliable multi-class theory by combining the rules of the multiple one-vs-rest theories into one rule list or set. We experimentally show that our proposed methods produce coherent and accurate rule models from the rules learned by Aleph.
46. Ondrej Kuzelka and Filip Zelezny. Seeing the World through Homomorphism: An Experimental Study on Reducibility of Examples
Abstract: We study the reducibility of examples in several typical inductive logic programming benchmarks. The notion of reducibility that we use is related to theta-reduction, commonly used to reduce hypotheses in ILP. Whereas examples are usually not reducible on their own, they often become implicitly reducible when the language for constructing hypotheses is fixed. We show that the number of ground facts in a dataset can be almost halved for some real-world molecular datasets. Furthermore, we study the impact this has on the popular ILP system Aleph.
55. Max Pereira, Nuno A. Fonseca, Vitor Santos Costa and Rui Camacho. Interactive Discriminative Mining of Chemical Fragments
Abstract: Structural activity prediction is one of the most important tasks in chemoinformatics. The goal is to predict a property of interest given structural data on a set of small compounds or drugs. Ideally, systems that address this task should not just be accurate; they should be able to identify an interpretable discriminative structure which describes the most discriminant structural elements with respect to some target.
This paper presents the application of ILP in software for the interactive discriminative mining of chemical fragments. In particular, it describes the coupling of an ILP system with molecule visualisation software that allows a chemist to graphically control the search for interesting patterns in chemical fragments. Furthermore, we show how structured information, such as rings and functional groups like carboxyl, amine, methyl, ester, etc., is integrated and exploited in the search.