Data Mining Server, Random Rules documentation

General information about the Random Rules server

The main task of this server is to induce a predictive model from a given set of training examples. The input is a data file with training examples, and the output is a report on the complexity of the constructed model (the number of generated rules) and on the expected accuracy of the model on unseen examples.

Optionally, you may also prepare a data file with test examples for which you want to obtain predictions or on which you want to test the quality of the constructed model. The number of attributes and their order must be identical in the training and test data files. When both training and test files are uploaded, the report will also include the actual accuracy measured on the test set, together with a link to a file listing the given and predicted classes for all test examples. Even if the real classes of the test examples are not known, the test data file must still contain the column with class names.
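Because the training and test files must have the same number of attributes, a quick local check before uploading can catch mistakes such as a test file that lacks the class column. The sketch below assumes a comma-separated file with one example per row and no header row; that format, the function name `check_train_test_compatible`, and its `delimiter` parameter are illustrative assumptions, not the server's documented input format.

```python
import csv
from io import StringIO


def check_train_test_compatible(train_text, test_text, delimiter=","):
    """Return a list of problems; an empty list means the files look compatible.

    Hypothetical helper. Assumes delimiter-separated text, one example per row,
    no header row (the server's exact file format is not described here).
    """
    problems = []
    widths = {}
    for label, text in (("training", train_text), ("test", test_text)):
        # Skip blank lines, which csv.reader returns as empty lists.
        rows = [row for row in csv.reader(StringIO(text), delimiter=delimiter) if row]
        if not rows:
            problems.append(f"{label} file is empty")
            continue
        row_widths = {len(row) for row in rows}
        if len(row_widths) > 1:
            problems.append(f"{label} file has rows of different lengths: {sorted(row_widths)}")
        widths[label] = max(row_widths)
    # Both files must have the same number of columns, class column included.
    if len(widths) == 2 and widths["training"] != widths["test"]:
        problems.append(
            f"training file has {widths['training']} columns but test file has {widths['test']}"
        )
    return problems
```

A test file without the class column shows up as a column-count mismatch. The check cannot verify that the attributes appear in the same order; that still has to be confirmed by hand.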

Optionally, you may also prepare and upload a file with attribute names. The number of attribute names should be identical to the number of columns in the prepared training and test files. If no attribute file is provided, the class of each example must be in the last column. To put the class in any other column, start the name of the corresponding attribute in the attribute file with '!'. Additionally, you may exclude any attribute from the induction by starting its name with '?'. An example of an attribute file using these options: meningitis-attributes_msp.
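The marker rules can be summarized as a small parser. This is a hypothetical sketch: it assumes the attribute names have already been read into a list with one name per column, and it assumes two behaviours the text does not spell out, namely that with an attribute file but no '!' marker the last column is the class, and that more than one '!' is an error. The attribute names in the example are made up.

```python
def parse_attribute_names(names, n_columns=None):
    """Apply the '!' (class column) and '?' (excluded attribute) markers.

    Hypothetical helper. `names` is a list with one attribute name per column,
    in column order, or None when no attribute file is uploaded.
    Returns (class_index, used_indices, clean_names).
    """
    if names is None:
        # No attribute file: the class must be in the last column.
        if n_columns is None:
            raise ValueError("n_columns is required when no attribute names are given")
        return n_columns - 1, list(range(n_columns - 1)), None
    if n_columns is not None and len(names) != n_columns:
        raise ValueError(f"{len(names)} attribute names for {n_columns} columns")
    marked = [i for i, name in enumerate(names) if name.startswith("!")]
    if len(marked) > 1:
        # Assumption: only one class column is allowed.
        raise ValueError("more than one attribute is marked as the class with '!'")
    # Assumption: without a '!' marker the last column is the class.
    class_index = marked[0] if marked else len(names) - 1
    # Excluded ('?') attributes and the class column are not used for induction.
    used = [i for i, name in enumerate(names)
            if i != class_index and not name.startswith("?")]
    # Strip a single leading marker character to get the plain attribute name.
    clean = [name[1:] if name[:1] in ("!", "?") else name for name in names]
    return class_index, used, clean


# Hypothetical attribute names, one per column.
print(parse_attribute_names(["age", "?patient_id", "!diagnosis", "fever"]))
# → (2, [0, 3], ['age', 'patient_id', 'diagnosis', 'fever'])
```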

Optionally, the report can be extended with information about the most relevant attributes, potential outliers in the training set, and one prototype example per class. Read more about this option.

Preparing the data files in the appropriate form is the most critical part of using this service, so please read the instructions very carefully. For practical reasons, the server accepts training and test sets with at most 5000 examples, 500 attributes, and 20 different classes.
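A local pre-check against these limits can catch an oversized data set before it is submitted. The sketch below assumes the data is already parsed into a list of equally long rows and that the 500-attribute limit counts only the non-class columns; the text does not say whether the class column is included. The function name `check_server_limits` is hypothetical.

```python
# Limits stated for the Random Rules server.
MAX_EXAMPLES = 5000
MAX_ATTRIBUTES = 500
MAX_CLASSES = 20


def check_server_limits(rows, class_index=-1):
    """Return a list of limit violations for a parsed data set (empty if none).

    Hypothetical helper. `rows` is a list of examples of equal length, each a
    list of column values, with the class at `class_index` (last by default).
    Assumption: the attribute limit counts only the non-class columns.
    """
    problems = []
    if len(rows) > MAX_EXAMPLES:
        problems.append(f"{len(rows)} examples exceed the limit of {MAX_EXAMPLES}")
    # Width of the first row minus the class column.
    n_attributes = max(len(rows[0]) - 1, 0) if rows else 0
    if n_attributes > MAX_ATTRIBUTES:
        problems.append(f"{n_attributes} attributes exceed the limit of {MAX_ATTRIBUTES}")
    classes = {row[class_index] for row in rows}
    if len(classes) > MAX_CLASSES:
        problems.append(f"{len(classes)} classes exceed the limit of {MAX_CLASSES}")
    return problems
```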

You may choose any name for the data file on your computer. During execution, the server saves the data in a file with an internally generated name. Immediately after rule generation, all files connected with this data are removed from the server, so if another experiment with the same data is needed, the data file must be uploaded again.

Security information

The system does not record any user data, but it also offers no special security features. In principle, users have no guarantee that their data will not be read and stored by the system, or even by other users of the server. When this may be a problem, it is the user's responsibility to encode the learning examples so that the important private data cannot be reconstructed. Generally, this is not a difficult task.
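One simple way to do this, offered as an assumption rather than a procedure prescribed by the server, is to replace every value of each nominal column by an opaque code and keep the translation key on your own computer. The hypothetical function `encode_columns` below applies one shared mapping to the training and test rows, so both files stay consistent, and the hypothetical `decode_value` turns a predicted class code back into its original name.

```python
def encode_columns(train_rows, test_rows, skip=()):
    """Replace the values of each column by opaque codes such as 'v0', 'v1', ...

    Hypothetical helper. One mapping per column is shared by the training and
    test rows. Columns whose index is in `skip` (for example numeric ones) are
    copied unchanged. Returns (encoded_train, encoded_test, key).
    """
    key = {}  # column index -> {original value: code}

    def encode(rows):
        out = []
        for row in rows:
            new_row = []
            for i, value in enumerate(row):
                if i in skip:
                    new_row.append(value)
                    continue
                mapping = key.setdefault(i, {})
                # A new value gets the next free code; a known value keeps its code.
                new_row.append(mapping.setdefault(value, f"v{len(mapping)}"))
            out.append(new_row)
        return out

    return encode(train_rows), encode(test_rows), key


def decode_value(key, column, code):
    """Translate a code (e.g. a predicted class) back to the original value."""
    inverse = {c: v for v, c in key[column].items()}
    return inverse[code]


train = [["red", "yes"], ["blue", "no"], ["red", "no"]]
test = [["green", "yes"]]
enc_train, enc_test, key = encode_columns(train, test)
print(enc_train)  # → [['v0', 'v0'], ['v1', 'v1'], ['v0', 'v1']]
print(decode_value(key, 1, "v1"))  # → no
```

Attribute names in the attribute file can be replaced in the same way, keeping any leading '!' or '?'. This hides the values themselves but not their frequencies, and columns listed in `skip` are left as they are.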