Command Line Arguments

Various arguments can be specified at the command line to customize how ML-Flex executes. Two of these arguments are mandatory; the rest are optional. When ML-Flex is run, an -Xmx argument can also be passed to Java. This argument increases the maximum amount of memory available to Java, which can be crucial when processing large data sets.
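To see what an -Xmx value actually gives the Java virtual machine, the standard Runtime.maxMemory() call can be queried. The following is a minimal sketch, not part of ML-Flex; the class name HeapCheck and its helper method are hypothetical names chosen for illustration.

```java
// Hypothetical helper, not part of ML-Flex: reports the memory ceiling that -Xmx controls.
final class HeapCheck {
    /** Converts a byte count to whole mebibytes (rounding down). */
    static long toMebibytes(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // Runtime.maxMemory() returns the maximum amount of memory the JVM will attempt to use.
        long maxBytes = Runtime.getRuntime().maxMemory();
        System.out.println("Maximum memory available to Java: " + toMebibytes(maxBytes) + " MiB");
    }
}
```

Running this class with different -Xmx values (for example -Xmx512m and then -Xmx2g) should show the reported ceiling change accordingly.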

Below is a list of all command-line arguments.

Mandatory command-line arguments

EXPERIMENT_FILE: This setting requires the user to specify the path to an experiment file. The name of this file is used as the experiment name, and the file should contain the settings for an ML-Flex experiment (see Creating an Experiment File).
ACTION: This setting requires the user to specify one of the following actions for ML-Flex to perform:
  • Process = The specified experiment will be executed. This action can be executed across multiple computers in parallel, as long as they all have access to the same file store. ML-Flex will automatically split the work across any computers currently executing an experiment.
  • Reset = All files that were previously generated by ML-Flex for the specified experiment will be deleted. This task can execute in parallel on a single computer, which speeds up the process of deleting files substantially. Obviously, it does not make sense to perform a Reset on one computer while the Process action is executing on another computer for the same experiment.
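The check on the two mandatory arguments can be sketched as follows. This is an illustration only, not ML-Flex source code: it assumes arguments are passed as KEY=VALUE tokens, and the class name ArgumentSketch and the path Experiments/Example.txt are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch only; ArgumentSketch is a hypothetical class, not ML-Flex source code.
final class ArgumentSketch {
    /** Splits each token at its first '=' (the KEY=VALUE syntax is an assumption). */
    static Map<String, String> parse(String[] args) {
        Map<String, String> settings = new LinkedHashMap<>();
        for (String arg : args) {
            int eq = arg.indexOf('=');
            if (eq <= 0) {
                throw new IllegalArgumentException("Expected KEY=VALUE but found: " + arg);
            }
            settings.put(arg.substring(0, eq), arg.substring(eq + 1));
        }
        return settings;
    }

    /** Requires EXPERIMENT_FILE and an ACTION that is either Process or Reset. */
    static void validate(Map<String, String> settings) {
        if (!settings.containsKey("EXPERIMENT_FILE")) {
            throw new IllegalArgumentException("EXPERIMENT_FILE is mandatory");
        }
        String action = settings.get("ACTION");
        if (!"Process".equals(action) && !"Reset".equals(action)) {
            throw new IllegalArgumentException("ACTION must be Process or Reset");
        }
    }

    public static void main(String[] args) {
        // Hypothetical example values; a real experiment file path would go here.
        String[] example = {"EXPERIMENT_FILE=Experiments/Example.txt", "ACTION=Process"};
        Map<String, String> settings = parse(example);
        validate(settings);
        System.out.println(settings);
    }
}
```

Rejecting a missing or unrecognized ACTION up front avoids starting a long run, or a Reset that deletes files, with an incomplete command line.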

Optional command-line arguments

DEBUG: As ML-Flex executes, it outputs logging information to standard out and to a Log.txt file. When DEBUG is set to true, additional logging information is output, which can aid troubleshooting after an error has occurred. By default, debugging is turned off to avoid computational and storage overhead. Default: false
NUM_THREADS: ML-Flex uses the Java threading capability to execute computing tasks in parallel. This setting specifies the maximum number of threads that ML-Flex can use per computing node. If ML-Flex seems to be running slowly on large data sets, this value may be too high. Default: the number of processors available to the Java virtual machine on the computer on which ML-Flex is executed
THREAD_TIMEOUT_MINUTES: ML-Flex uses the Java threading capability to execute computing tasks in parallel. For a variety of reasons, a thread may "hang" and not return a result. Thus it may be desirable to specify a timeout period after which ML-Flex will abandon a thread and retry executing the task. It is recommended that this setting be longer than the longest time that any given feature selection or classification task is expected to take. Default: 60
PAUSE_SECONDS: When ML-Flex attempts to execute tasks across multiple computing nodes, it may identify a situation where a processing task remains to be performed and it appears that another thread is currently executing that task. In most cases, this is truly because the task is being executed, so the current thread will pause for a short time and wait to see if the other thread has completed processing. If so, the current thread will move on to the next set of tasks. Otherwise, the current thread will pause again, and this process will repeat until the thread timeout has occurred (after which the corresponding lock file will be deleted and the current thread will attempt to execute the task). The PAUSE_SECONDS configuration value specifies the number of seconds that each pause will last. Default: 60
EXPORT_DATA: This setting accepts either "true" or "false" as a value. When set to true, the data that have been processed by ML-Flex will be exported to multiple formats (currently, tab-delimited and ARFF) to enable the user to perform downstream analyses if desired. Default: false
MAIN_DIRECTORY: It is possible to store the ML-Flex executable files in one location and the data files in a different location. This setting allows you to specify where the data files are stored. This setting will likely be used rarely. Default: the same location as the executable files
LEARNER_TEMPLATES_FILE: This file stores information about how ML-Flex can interface with third-party machine-learning packages. By default, this file is located at Config/Learner_Templates.txt. However, an alternative file can be specified using this parameter. Default: Config/Learner_Templates.txt
CLASSIFICATION_ALGORITHMS_FILE: By default, classification algorithms are configured in the Config/Classification_Algorithms.txt file. However, an alternative file can be specified using this parameter. Default: Config/Classification_Algorithms.txt
FEATURE_SELECTION_ALGORITHMS_FILE: By default, feature-selection algorithms are configured in the Config/Feature_Selection_Algorithms.txt file. However, an alternative file can be specified using this parameter. Default: Config/Feature_Selection_Algorithms.txt
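The NUM_THREADS default and the abandon-and-retry behavior described for THREAD_TIMEOUT_MINUTES can be illustrated with standard java.util.concurrent classes. This is a sketch under stated assumptions, not ML-Flex's actual implementation: the class OptionalSettingsSketch, its method names, and the maxAttempts parameter are hypothetical, and only the default values (processor count and 60 minutes) come from the list above.

```java
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Illustrative sketch only; not ML-Flex source code.
final class OptionalSettingsSketch {
    /** Uses NUM_THREADS if given, else the number of processors available to the JVM. */
    static int numThreads(Map<String, String> settings) {
        String value = settings.get("NUM_THREADS");
        return value == null ? Runtime.getRuntime().availableProcessors() : Integer.parseInt(value);
    }

    /** Uses THREAD_TIMEOUT_MINUTES if given, else the documented default of 60. */
    static long threadTimeoutMinutes(Map<String, String> settings) {
        return Long.parseLong(settings.getOrDefault("THREAD_TIMEOUT_MINUTES", "60"));
    }

    /** Runs the task, abandoning and retrying any attempt that exceeds the timeout. */
    static <T> T runWithTimeout(Callable<T> task, long timeout, TimeUnit unit, int maxAttempts)
            throws Exception {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            ExecutorService executor = Executors.newSingleThreadExecutor();
            Future<T> future = executor.submit(task);
            try {
                return future.get(timeout, unit);
            } catch (TimeoutException e) {
                // The attempt appears to hang: interrupt it and fall through to the next attempt.
                future.cancel(true);
            } finally {
                executor.shutdownNow();
            }
        }
        throw new TimeoutException("Task did not finish after " + maxAttempts + " attempts");
    }
}
```

A call such as runWithTimeout(task, threadTimeoutMinutes(settings), TimeUnit.MINUTES, 3) would then give each attempt the configured number of minutes. Interruption only stops tasks that respond to it, which is one reason to set the timeout longer than the slowest expected task.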
