Here we introduce some of the datasets that we used to evaluate our work. Find the SQL dump of datasets [here]. Find the db file of datasets [here]
University Database
We manually created a small dataset based on the following schema. It serves as a toy example for testing purposes. The entity tables contain 38 students, 10 courses, and 6 professors. The Registered table has 92 rows and the RA table has 25 rows.
Using the Functor Bayes Nets package, we found the correlations between attributes in this dataset.
MovieLens Database
The MovieLens dataset is from the UC Irvine Machine Learning Repository. It contains two entity tables, User with 941 tuples and Item with 1,682 tuples, and one relationship table, Rated, with 80,000 ratings. The User table has three descriptive attributes: age, gender, and occupation. We discretized the attribute age into three bins with equal frequency. The Item table represents information about the movies. It has 17 Boolean attributes that indicate the genres of a given movie. We performed a preliminary data analysis and omitted genres that have only weak correlations with the rating or user attributes, leaving a total of three genres.
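Equal-frequency discretization, as applied to the age attribute above, can be sketched as follows. This is a minimal illustration rather than the preprocessing code actually used: the function names and the sample ages are hypothetical, and cut points are taken directly from the sorted values so that equal ages always fall into the same bin.

```python
def equal_frequency_cut_points(values, n_bins=3):
    """Return n_bins - 1 thresholds that split values into bins of roughly equal size."""
    ordered = sorted(values)
    n = len(ordered)
    # The k-th cut point is the value found k/n_bins of the way through the sorted list.
    return [ordered[(k * n) // n_bins] for k in range(1, n_bins)]


def assign_bin(value, cut_points):
    """Bin index = number of cut points that the value has reached or passed."""
    return sum(value >= c for c in cut_points)


# Hypothetical ages, not taken from the MovieLens data.
ages = [18, 22, 25, 31, 35, 40, 47, 52, 60]
cuts = equal_frequency_cut_points(ages)
age_bins = [assign_bin(a, cuts) for a in ages]
print(cuts, age_bins)
```

With many repeated values the bins can only be approximately equal in size, because all occurrences of one value must land in the same bin.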
Using the Functor Bayes Nets package, we found the correlations between attributes in this database.
Mutagenesis Database
This dataset is widely used in ILP research. Mutagenesis has two entity tables: Atom, with 3 descriptive attributes, and Mole, with 5 descriptive attributes, including two (logp and lumo) that are discretized into ten values each. It features two relationships: MoleAtom, which indicates which atoms are part of which molecules, and Bond, which relates two atoms and has 1 descriptive attribute.
Using the Functor Bayes Nets package, we found the correlations between attributes in this database.
Hepatitis Database
This dataset is a modified version of the PKDD02 Discovery Challenge database; the modifications include removing tests with null values. It contains data on the laboratory examinations of patients infected with hepatitis B and C. The examinations were conducted between 1982 and 2001 on 771 patients. The data are organized in 7 tables (4 entity tables and 3 relationship tables) with 16 descriptive attributes. They contain basic information about the patients, results of biopsy, information on interferon therapy, and results of out-hospital and in-hospital examinations.
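Removing tests with null values, as mentioned above, could look like the following sketch. It uses an in-memory SQLite database; the table name lab_test, its columns, and the sample rows are hypothetical and do not reflect the actual Hepatitis schema or the exact cleaning procedure that was applied.

```python
import sqlite3

# In-memory database with a hypothetical examination table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE lab_test (patient_id INTEGER, test_name TEXT, result REAL)"
)
conn.executemany(
    "INSERT INTO lab_test VALUES (?, ?, ?)",
    [
        (1, "GOT", 35.0),
        (1, "GPT", None),   # missing result
        (2, "GOT", 41.0),
        (2, None, 12.0),    # missing test name
    ],
)

# Drop every test record that has a null in any descriptive column.
conn.execute("DELETE FROM lab_test WHERE test_name IS NULL OR result IS NULL")

remaining = conn.execute(
    "SELECT patient_id, test_name, result FROM lab_test ORDER BY patient_id"
).fetchall()
print(remaining)
```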
Using the Functor Bayes Nets package, we found the correlations between attributes in this database.