Data preparation for Random Rules induction
Examples are described by a set of attribute values and a class value.
Each example is presented in one row of the data file.
By default, the class value is the last value in each row.
To use a different attribute as the class, prepare a file with the names of the
attributes, in which the name starting with '!' denotes the target attribute.
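The rule for choosing the target attribute can be sketched in Python as follows. This is only an illustration and not the server's code: the function name target_index and the attribute names in the usage lines are hypothetical, and rejecting a list with more than one '!' name is an assumption, since the text does not say how the server handles that case.

```python
def target_index(attribute_names):
    """Return the position of the class attribute in a list of attribute names."""
    marked = [i for i, name in enumerate(attribute_names) if name.startswith("!")]
    if len(marked) > 1:
        # Assumption: more than one marked name is treated as an error.
        raise ValueError("more than one attribute name starts with '!'")
    # No marked name: fall back to the default, the last attribute.
    return marked[0] if marked else len(attribute_names) - 1


# Hypothetical attribute names, used only for illustration.
print(target_index(["sepal_length", "!species", "petal_width"]))
print(target_index(["sepal_length", "petal_width", "species"]))
```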
Attribute values may be nominal or numerical. Examples of valid nominal values are: A, val,
and ha_12. Examples of valid numerical values are: 7, -3.1, and -3333.22.
Class values and attribute names must be nominal. The maximum length of a
nominal value is 20 characters.
Attribute values may be unknown, but an unknown value must be stated explicitly by a
string whose first character is '?'. Both numerical and nominal attributes may
include unknown values.
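The three kinds of values described above (nominal, numerical, and unknown) can be told apart with a short check like the one below. It is a minimal sketch under stated assumptions, not the server's parser: the function name classify_value is hypothetical, and it assumes that anything Python's float() accepts counts as numerical, which may be more permissive than the server.

```python
MAX_NOMINAL_LENGTH = 20  # limit for nominal values stated in the text


def classify_value(token):
    """Classify one value as 'unknown', 'numerical', or 'nominal'."""
    if token.startswith("?"):
        return "unknown"
    try:
        # Assumption: float() decides what is numerical. It also accepts
        # strings such as "1e5" or "inf", which the server might reject.
        float(token)
        return "numerical"
    except ValueError:
        pass
    if len(token) > MAX_NOMINAL_LENGTH:
        raise ValueError(f"nominal value longer than {MAX_NOMINAL_LENGTH} characters: {token!r}")
    return "nominal"


for token in ["A", "ha_12", "7", "-3.1", "-3333.22", "?", "?unknown"]:
    print(token, classify_value(token))
```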
All examples must have the same number of values, which is defined by the number of
values in the first row of the data file. A formally correct data file therefore
contains N rows of A values each, where N is the number of examples
and A is the number of attributes, including the class attribute. On this server
the maximum value of N is 10000 and the maximum value of A is 1000. The maximum size of
training and test data files is 2 MB.
A test data file must contain the same attributes in the same order as the
corresponding training data file.
All values of an attribute must be of the same type, either nominal or numerical.
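Before uploading a file, the structural rules above (equal row length, the limits on N and A, and one type per attribute) can be checked locally. The sketch below is an illustration only: the names check_rows and value_type are hypothetical, the rows are assumed to be already split into values, and it collects all problems in a list, whereas the server stops at the first error.

```python
MAX_EXAMPLES = 10000    # limit on N stated in the text
MAX_ATTRIBUTES = 1000   # limit on A stated in the text


def value_type(token):
    """Return 'numerical', 'nominal', or None for an unknown value."""
    if token.startswith("?"):
        return None  # unknown values are allowed in both kinds of attribute
    try:
        float(token)  # assumption: float() decides what is numerical
        return "numerical"
    except ValueError:
        return "nominal"


def check_rows(rows):
    """Return a list of problems found in rows (each row is a list of strings)."""
    if not rows:
        return ["no examples"]
    problems = []
    expected = len(rows[0])  # the first row defines the number of values
    if len(rows) > MAX_EXAMPLES:
        problems.append(f"{len(rows)} examples, maximum is {MAX_EXAMPLES}")
    if expected > MAX_ATTRIBUTES:
        problems.append(f"{expected} attributes, maximum is {MAX_ATTRIBUTES}")
    column_types = [None] * expected
    for row_number, row in enumerate(rows, start=1):
        if len(row) != expected:
            problems.append(f"row {row_number}: {len(row)} values, expected {expected}")
            continue
        for column, token in enumerate(row):
            kind = value_type(token)
            if kind is None:
                continue
            if column_types[column] is None:
                column_types[column] = kind  # first known value fixes the type
            elif column_types[column] != kind:
                problems.append(
                    f"row {row_number}, column {column + 1}: "
                    f"{kind} value in a {column_types[column]} column"
                )
    return problems
```

The same function can be run on a training file and on its test file. Comparing the length of the first row in each is a quick way to catch a test file whose attributes do not match.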
Attribute names and values must be separated by delimiters. Valid delimiters are
the comma, the semicolon, and one or more spaces. These characters (',', ';', space, and TAB) may not be
used within attribute names or values. If, for example, an input value consists
of two strings separated by a space, the server will interpret it as two nominal
values and the row will have more values than expected. In this situation the server
immediately stops processing the data and reports an error.
This is the most common cause of problems with this server.
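The effect of a space inside a value can be reproduced with a few lines of Python. This is a rough sketch, not the server's tokenizer: the name split_row and the sample rows are illustrative, and it assumes that a comma or semicolon with optional spaces or TABs around it, or a run of spaces or TABs on its own, acts as a single delimiter.

```python
import re

# Assumption: a comma or semicolon with optional spaces/TABs around it,
# or a run of spaces/TABs on its own, counts as a single delimiter.
DELIMITER = re.compile(r"[ \t]*[,;][ \t]*|[ \t]+")


def split_row(line):
    """Split one row of a data file into its values."""
    return DELIMITER.split(line.strip())


# Illustrative rows: the second one has a space inside the class value.
good = split_row("5.1,3.5,1.4,0.2,Iris-setosa")
bad = split_row("5.1,3.5,1.4,0.2,Iris setosa")
print(len(good), good)
print(len(bad), bad)
```

The second row yields one value more than the first, which is the mismatch the server reports as an error. Replacing the space with '-' or '_' before uploading avoids it.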
We have prepared several example data files that show what
correct input data files should look like. You may also download these files to your computer
and then send them to the server to test its functionality.
iris_space_delimited
iris_comma_delimited
iris-attributes
meningitis_space_delimited
meningitis-attributes
© 2013 LIS - Rudjer Boskovic Institute
Last modified: October 15 2015 13:13:50.