Introduction
A.I. Solver Studio is a unique pattern recognition application that deals with finding optimal solutions to classification problems and uses several powerful and proven artificial intelligence techniques including neural networks, genetic programming and genetic algorithms. No special knowledge is required of users as A.I. Solver Studio manages all the complexities of the problem solving internally.
Before you start
Before you start using A.I. Solver Studio for problem solving, it is highly recommended that you look at the settings in Options -> Performance and change them if required to reflect your computer configuration.
This is especially important for those with dual- or quad-core CPUs, as each extra CPU core drastically speeds up training if A.I. Solver Studio is configured properly.
Included with the program in the sub-directory “Samples” are a few sample training and testing data files as well as some unclassified data. It is strongly recommended that you view the contents of these files to gain an understanding of the data format used in A.I. Solver Studio. These files include comments which should answer a lot of questions. Creating projects in A.I. Solver Studio using these data files is also very helpful to get to know how the program works and is highly recommended as well.
Formulating a problem
As previously stated, A.I. Solver Studio deals with solving classification problems. If you are unsure of what classification problems are, the text below should help.
Classification problems are problems where you want to determine some result and all possible results are known in advance. Nothing forbids one of the results from being “unknown”, however; such a result can serve as a catch-all for every outcome that is not known in advance. To simplify things, we will assume all results or outcomes in your problem are known. Our goal in using A.I. Solver Studio is to create a solution that is capable of classifying items with acceptable accuracy based on some data, which we present as any number of numerical values.
Let’s take an example and look at one of the more interesting ways to use A.I. Solver Studio, stock market prediction:
We want to create a solution that can tell us at the end of each business day if the stock price of ACME Inc. will go down, go up, or hold tomorrow.
So up, down and hold are our possible results. Whatever happens tomorrow will always fit with one and only one of these results. Next we have to decide what data we want to base our prediction on. For simplicity we will pick just three widely known and understood values: Opening price today, closing price today and volume today. Dozens of web sites offer historical data of this kind for free so we can easily move forward and start creating our data files.
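As a rough sketch of how such items could be prepared, the Python snippet below pairs each day's three inputs with the result observed on the following day. The price history is made up, and deciding up, down or hold by comparing closing prices (with an optional tolerance) is an assumption made for illustration; the right definition depends on your own problem.

```python
# Hypothetical daily history for ACME Inc.: (open, close, volume), oldest first.
history = [
    (10.00, 10.20, 15000),
    (10.20, 10.20, 12000),
    (10.20, 9.90, 18000),
    (9.90, 10.40, 21000),
]

def label_next_day(close_today, close_tomorrow, tolerance=0.0):
    """Return 'up', 'down' or 'hold' for tomorrow's move relative to today.

    Comparing closing prices is an assumption of this sketch.
    """
    change = close_tomorrow - close_today
    if change > tolerance:
        return "up"
    if change < -tolerance:
        return "down"
    return "hold"

# Each row pairs today's inputs with tomorrow's result. The last day has no
# known tomorrow, so it produces no item.
rows = []
for today, tomorrow in zip(history, history[1:]):
    result = label_next_day(today[1], tomorrow[1])
    rows.append((result, *today))

for row in rows:
    print(row)
```

Every item ends up with exactly one of the three results, which is what a classification problem requires.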
Creating Data Files
To create our solution we will need two data files: one for training, one for testing. The training data file is where A.I. Solver Studio will search for useful patterns to use for the classification. The testing data file is what we’ll use to find out how well our solution will perform in the real world. Testing data has no effect on training. The solution never “sees” this data, so it might as well be data from the future as far as our solution is concerned. Why is testing necessary? Well, it’s not, in the sense that A.I. Solver Studio will certainly allow you to train a solution without testing data. However, what sometimes happens in this kind of problem-solving scenario is that a solution fails to learn general patterns from the training data and instead reads too much into patterns that are local to that data and do not occur outside it in the real world. The testing data lets us see immediately if this is going on, as it is usually characterized by excellent training progress and poor performance on the testing data.
A.I. Solver Studio can read virtually any delimited text format. By looking at the sample files you can get a very clear idea of what your data files should look like. Training and testing data files look identical in structure, but the testing data has to contain items that are not part of the training data, and vice versa.
I recommend using about 10-20 percent of the total data you have available for testing data and the rest for training. It does not matter if you take the testing data from the front or the back of your data collection. You could even extract your test data at random locations from the training data (except when using the HindSight feature, more on that later). What is important is that each item occurs either in the training data or in the testing data, not both. Spreadsheet programs such as Microsoft Excel can be extremely helpful in creating data files and can save your data directly to a format that A.I. Solver Studio can read (.CSV for example).
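If you prefer to script the split rather than do it in a spreadsheet, a minimal Python sketch could look like the one below. The function name and the 20 percent default are illustrative choices, not part of A.I. Solver Studio. Taking the test items from the back keeps the remaining items in their original order.

```python
def split_items(items, test_fraction=0.2):
    """Split items into (training, testing) so that no item is in both."""
    test_count = max(1, round(len(items) * test_fraction))
    split_at = len(items) - test_count
    return items[:split_at], items[split_at:]

# Ten placeholder items standing in for data rows, oldest first.
items = [f"item{i}" for i in range(10)]
training, testing = split_items(items)

print(len(training), "training items,", len(testing), "testing items")
```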
To describe the format very briefly:
- All files must have a header row, with headers delimited just like regular data.
- After the header, any number of data lines can occur.
- Each data line has one column for the result of that data item; this column is furthest to the left.
- This is followed by one or more columns, each containing an input value we want to base our classification on.
- The file should end immediately after the last data row.
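The rules above can be sketched with Python's standard csv module. The column names and values here are hypothetical, the semicolon is used because it is the program's default delimiter, and stripping the final line break is only my reading of "end immediately after the last data row". The included sample files remain the authoritative reference.

```python
import csv
import io

header = ["Result", "Open", "Close", "Volume"]  # hypothetical column names
data = [
    ["hold", 10.00, 10.20, 15000],
    ["down", 10.20, 10.20, 12000],
]

# Write to an in-memory buffer; a real script would write to a file instead.
buffer = io.StringIO()
writer = csv.writer(buffer, delimiter=";", lineterminator="\n")
writer.writerow(header)   # header row, delimited like regular data
writer.writerows(data)    # result column first, then the input values

# Drop the final line break so nothing follows the last data row.
text = buffer.getvalue().rstrip("\n")
print(text)
```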
A slightly different format is used for unclassified data, more on this later.
Creating a project
Once your data files are in place you are ready to create your A.I. Solver Studio Project. This section will take you through the process of creating your project step by step. If you find this more complicated than expected, rest assured that this is by far the most demanding operation a user can perform in the program. There are also fairly detailed instructions in the program for each step.
If you don’t feel like reading this section right now, you can consider these settings defaults and use them as a short cut:
Step #1 – Neural network project
Step #2 – [Specify your training data file]
Step #3 – All possible mistakes are equally wrong or serious
Step #4 – Leave HindSight disabled. Set solution complexity to Trivial or Simple.
Step #5 – Leave Over-Fitting Prevention disabled.
Step #1 – What kind of project
In this step you select the type of project you want to create. The type of the project refers to what kind of method you want to use to solve the problem. The available types are:
- Neural network project
- Genetic programming project
- Combo project (uses both neural networks and genetic programming)
I recommend starting with a neural network project as this project type requires the shortest amount of time to train. You can later experiment with the other types to find the best one for your problem. We will not go into any details on how these different techniques work under the hood as you will never have to deal with any details relating to them in this program.
Step #2 – Load your training data file
In this step you simply specify the file containing your training data. If your file uses a delimiter other than the default (a semicolon), you need to specify this as well.
If your training data file is loaded successfully you can continue to the next step. Otherwise, you will be notified that there was a problem and information will be provided to identify the problem.
Step #3 – Error severity
In this step you will be presented with two options:
- All possible mistakes are equally wrong or serious
- Some mistakes are more wrong or serious than others
Your answer here should depend on your problem. For beginners, I recommend selecting the first option. The second option is for problems where some mistakes in classification have more serious consequences than others. Consider a problem of classifying what will happen to a particular company on the stock market tomorrow when there are three possible results: up, down and hold. It’s easy to see that classifying a DOWN day as UP can be more serious than, say, classifying a DOWN day as HOLD. This assumes that an UP classification would prompt you to buy the stock, which then goes down instead of up, with financial consequences for you. Hopefully this example clarifies this feature.
If you select the latter option (Some mistakes are more wrong…) you will be taken to the second part of this step where you can specify the severity for each type of mistake. Mistakes can have the following severities: Don’t care, Minor, Major and Critical. The selections you make here will guide the training process so that mistakes with high severity are less likely to occur on real-world data.
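To make the idea of weighted mistakes concrete, here is a small Python sketch that scores a set of classifications by severity. The numeric weights and the severity assigned to each mistake are invented for illustration. How A.I. Solver Studio uses severities internally is not described in this guide.

```python
# Hypothetical numeric weights for the four severities.
SEVERITY_WEIGHT = {"Don't care": 0, "Minor": 1, "Major": 3, "Critical": 10}

# (actual result, classified as) -> severity. Mistakes not listed here are
# treated as "Minor" in this sketch.
severity = {
    ("down", "up"): "Critical",
    ("up", "down"): "Major",
    ("hold", "up"): "Minor",
}

def total_cost(pairs):
    """Sum the severity weights of all mistakes in (actual, predicted) pairs."""
    cost = 0
    for actual, predicted in pairs:
        if actual == predicted:
            continue  # a correct classification costs nothing
        cost += SEVERITY_WEIGHT[severity.get((actual, predicted), "Minor")]
    return cost

outcomes = [("up", "up"), ("down", "up"), ("hold", "up"), ("down", "hold")]
print(total_cost(outcomes))
```

Under a scheme like this, two solutions with the same number of mistakes can receive very different scores, depending on which mistakes they make.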
Step #4 – HindSight and Complexity
This step has two parts. A.I. Solver Studio has a special feature called HindSight which can be used for data with chronological meaning, such as time series or any events occurring in time. To use this feature, your training data and test data must be sorted in ascending order of time, i.e. oldest first. This feature essentially allows your solution to look at past events to help make better classifications. How many events back it will go depends on your selected HindSight value; the default is 1. Before making large changes to this value, be aware that it will adversely affect the training time of your project (how long training takes).
The second part of this step is to select the complexity of your problem. A.I. Solver Studio does not expect you to know this in advance. These are settings you should experiment with, since each problem is unique and no general guidelines are available. I recommend starting with Trivial and then moving up as needed. Be aware that higher complexity settings will increase the training time.
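One common way to let a model "look back" is to append the inputs of earlier rows to each row. The Python sketch below shows that general idea only. It is an assumption for illustration and not a description of how HindSight is implemented inside A.I. Solver Studio.

```python
def with_hindsight(rows, hindsight=1):
    """Append the inputs of the previous `hindsight` rows to each row.

    Rows must be sorted oldest first. The first `hindsight` rows are dropped
    because they do not have enough history behind them.
    """
    widened = []
    for i in range(hindsight, len(rows)):
        combined = list(rows[i])
        for back in range(1, hindsight + 1):
            combined.extend(rows[i - back])
        widened.append(combined)
    return widened

inputs = [[1, 2], [3, 4], [5, 6], [7, 8]]  # made-up inputs, oldest first
widened = with_hindsight(inputs, hindsight=2)
print(widened)
```

In a scheme like this, each extra step back adds another full set of inputs to every item, which gives some intuition for why larger values make training slower.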
Step #5 – Over-Fitting Prevention
Over-Fitting Prevention is another feature of this program which can be used to eliminate a problem sometimes associated with the learning techniques used by A.I. Solver Studio. This problem is essentially that a solution will sometimes pay too much attention to patterns that are local to the training data and are not present in real-world data. A clear sign of this happening in a project is when excellent training results are obtained while performance with testing data is poor. I recommend you only use this option if you need it.
If you need to enable this feature, you can specify the percentage of your data you want to use for Over-Fitting Prevention. The best value here depends on the size of your training data (number of data items). I would recommend about 15-20 percent for training data with more than 1500 data items and less otherwise.
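The warning sign described above, excellent training results combined with poor testing results, can be expressed as a simple comparison. In this Python sketch the 15-point gap threshold and the example classifications are made up. Treat it as a rule of thumb, not as the check the program performs.

```python
def accuracy(actual, predicted):
    """Fraction of items whose predicted result matches the actual result."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)

def looks_overfitted(train_accuracy, test_accuracy, max_gap=0.15):
    """Flag a solution whose training accuracy far exceeds its test accuracy.

    The default gap of 0.15 is a hypothetical threshold.
    """
    return (train_accuracy - test_accuracy) > max_gap

train_actual = ["up", "down", "hold", "up"]
train_predicted = ["up", "down", "hold", "up"]
test_actual = ["up", "down", "hold", "up"]
test_predicted = ["down", "down", "up", "up"]

train_acc = accuracy(train_actual, train_predicted)
test_acc = accuracy(test_actual, test_predicted)
print(train_acc, test_acc, looks_overfitted(train_acc, test_acc))
```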
This is the final step in creating a project. You are now ready to start training.
Loading test data
Once you have created your project, it is a good idea to load your testing data right away. This can be done by selecting Data -> Load testing data. Note that the structure of your training data and testing data should be exactly the same and the same delimiter must be used in both files.
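Before loading, you may want to check the two files against each other yourself. The Python sketch below compares two small in-memory examples. Treating "exactly the same structure" as "identical header rows and column counts" is my assumption, and the file contents are invented.

```python
import csv
import io

def read_rows(text, delimiter=";"):
    """Parse delimited text into a list of rows."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))

def check_files(training_text, testing_text, delimiter=";"):
    """Return a list of problems found when comparing the two files."""
    train = read_rows(training_text, delimiter)
    test = read_rows(testing_text, delimiter)
    problems = []
    if train[0] != test[0]:
        problems.append("header rows differ")
    if any(len(row) != len(train[0]) for row in train[1:] + test[1:]):
        problems.append("a data row has the wrong number of columns")
    # An item should appear in the training data or the testing data, not both.
    shared = {tuple(r) for r in train[1:]} & {tuple(r) for r in test[1:]}
    if shared:
        problems.append(f"{len(shared)} item(s) appear in both files")
    return problems

training_text = "Result;Open;Close\nup;1;2\ndown;2;1"
testing_text = "Result;Open;Close\nup;1;2\nhold;3;3"
print(check_files(training_text, testing_text))
```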
If A.I. Solver Studio fails to load your testing data, it will show you an error message with information to help identify the problem.
Once loaded successfully, you can start training and watch the test data results update in real-time.
I recommend you always use test data when creating a solution for real-world scenarios. Feel free to skip the testing data if you just want to examine whether your training data contains useful patterns, but be aware that those patterns, if present, might not exist outside your training data.
The training process
Training your solution is a simple matter of pressing the Start Training button. Training is done in iterations. The type of project you have created, the size of your training data and the complexity setting you chose for the project all affect the time it takes for each iteration to finish. After training has been started you can leave the program running unattended. It will continue until it is manually stopped or until it has fully solved your classification problem, whichever comes first.
The quality of the solution can be easily seen at any time in the user interface. If you have loaded testing data, information on how your solution performs on testing data will be updated after each iteration (unless you have disabled this option).
There are four areas of the program window which are most interesting during training.
- Results for current solution in the Training section shows you how well the solution has learned to identify the training data.
- The Visual Progress Indicator shows you how fast your solution is improving. This is updated after each training iteration. An irregular line sloping up to the right is an indication that the solution is improving. If no improvements have been made for a while the message FLAT LINING will appear.
- Results from running solution on test data shows you how well the solution is performing on test data.
- Statistics on mistakes made with test data show you what kind of mistakes (if any) are being made on the test data in each iteration and how common each type of mistake is.
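To illustrate what statistics on mistakes look like, the following Python sketch counts each kind of mistake as an (actual, predicted) pair. The results are made up, and the program's own display may organize this information differently.

```python
from collections import Counter

def mistake_stats(actual, predicted):
    """Count each kind of mistake as an (actual, predicted) pair."""
    return Counter((a, p) for a, p in zip(actual, predicted) if a != p)

actual = ["up", "down", "hold", "down", "up"]
predicted = ["up", "up", "hold", "up", "hold"]
stats = mistake_stats(actual, predicted)

# List the mistakes from most to least common.
for (was, classified_as), count in stats.most_common():
    print(f"{was} classified as {classified_as}: {count}")
```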
Then what?
One feature of A.I. Solver Studio that is potentially very useful has not been mentioned yet and you will find it under Data -> Process unclassified data.
This allows you to run the solution you have created on unclassified data, i.e. data for which the result is not known or at least not included in the file (as is the case with training and testing data). Essentially this feature allows you to put your solution to work for you in real world scenarios.
The structure of a data file containing unclassified data is identical to the structure of training and testing data files, with the exception that the result column is omitted and can optionally be replaced with label columns. A label column is used for labeling each data item so you can clearly identify in A.I. Solver Studio what result belongs to what data item when you process unclassified data. The included sample files contain files of this kind; I recommend studying them if this is unclear to you.
Processing unclassified data manually through A.I. Solver Studio has its limitations and can be labour-intensive if done regularly. Feel free to contact us if you want to examine integrating your solution with any of your business software or processes. We can provide several methods for this, including hosted web services, local web services and DLL libraries.
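As an illustration, the Python sketch below turns classified rows into unclassified ones by putting a label where the result used to be. Placing a single label column in the leftmost position is my reading of the description above, and the column names and dates are hypothetical. Check the included sample files for the exact layout.

```python
def to_unclassified(header, rows, labels):
    """Replace the leftmost (result) column with a label column."""
    new_header = ["Label"] + header[1:]
    new_rows = [[label] + row[1:] for label, row in zip(labels, rows)]
    return new_header, new_rows

header = ["Result", "Open", "Close", "Volume"]   # hypothetical column names
rows = [
    ["up", 10.0, 10.2, 15000],
    ["down", 10.2, 9.9, 18000],
]
labels = ["2024-01-02", "2024-01-03"]            # e.g. the date of each item

new_header, new_rows = to_unclassified(header, rows, labels)
print(new_header)
print(new_rows)
```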
If you haven’t been able to reach success with your solutions or problems, there are two factors to consider.
In order for A.I. Solver Studio to create a good solution, your problem must be at least partly solvable. It can be, and often is, impossible to know in advance whether this is the case, but the program can provide you with strong clues as to the answer.
You must also consider that A.I. Solver Studio is always trying to solve your problem based on information you decided to feed it with. Perhaps the most important factor in creating a good solution for complex problems is picking the right input data.
Settings
In this version of A.I. Solver Studio there are very few settings.
The most important ones have to do with performance and can be accessed via Options -> Performance. If you have a multi-core CPU you need to adjust these settings to reflect that in order to take full advantage of all cores. You can also adjust the priority of the training threads, although this is a fairly technical issue. If you don’t mind your computer being less responsive while running A.I. Solver Studio, set the thread priority to High. If you want your computer to be as responsive as possible while running A.I. Solver Studio, set the thread priority to Low. Otherwise, leave it at its default setting, which is Medium.
Finally, be aware that if you change the delimiter from the default when creating your project, the delimiter you used will become your default delimiter. I would definitely recommend sticking with the same delimiter for all problems and data.
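If you are not sure how many cores your machine has, a short Python snippet can give you a hint. Note that os.cpu_count() reports logical processors, which can be higher than the number of physical cores on CPUs with simultaneous multithreading, so treat the number as a starting point for the setting and not as an exact value.

```python
import os

# os.cpu_count() may return None on some platforms, so fall back to 1.
logical_processors = os.cpu_count() or 1
print(f"This machine reports {logical_processors} logical processor(s).")
```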