Erez Katz, CEO and Co-founder of Lucena Research
How a Genetic Algorithm (GA) Can Benefit Feature Selection
Our goal at Lucena is to democratize some of the best-kept secrets in the financial industry and refute the “black-box” image often associated with Machine Learning. In that spirit, I want to share an important process by which features are designated as most relevant to a particular asset universe and investment strategy.
Feature Selection for Investment Strategies
As we’ve incorporated multiple big data sources into our investment research platform QuantDesk, our master feature database has ballooned to almost 1,000 indicators. Indicators are data elements that describe a security at a point in time. Examples of indicators can be found in the chart below.
A Factor, also called a Feature, is a quantitative attribute that describes a security at a given point in time.
With such a rich array of data points, we often struggle with deciding which indicators / models are most relevant at any particular time. Not all indicators are created equal, nor are they designed to be predictive at all times.
An effective Machine Learning algorithm “knows” how to adjust dynamically to environmental or idiosyncratic changes. A Genetic Algorithm (GA) offers a systematic, repeatable approach to feature selection that helps distinguish predictive signals from noise.
What is a Genetic Algorithm In the Context of Big Data and Machine Learning?
A genetic algorithm (GA) is a problem-solving method that mimics the process of natural selection. When utilizing Machine Learning for investment decisions, the factors most relevant to your needs can be filtered from a wide list of indicators by replicating the process of natural evolution. The only difference is that rather than dealing with DNA and chromosomes, we are dealing with indicators and multi-factor models.
Survival of the Fittest: How to Form the Best Feature Selection
The goal is to identify “nuggets.” A nugget is a multi-factor model composed of multiple indicators and their respective min/max values that together form a filter geared to identify the securities most prone to move predictably in the future. Here is an example of a multi-factor model.
We can easily apply a fitness function (an event study, for example) to assess how predictive these conditions were historically. For example, let’s travel back in time (say, 1/1/2011 to 12/31/2011) and assess the average price move 20 days after certain stocks met the following conditions:
- Gross margins are between 45% and 85%
- PE ratio is between 15 and 25
- Beta is between 0.75 and 1.5
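The three conditions above can be sketched as a simple range filter. This is only an illustration: the dictionary layout, the `matches` helper, and the sample snapshots are assumptions made for this example, not QuantDesk's actual data model or API.

```python
# Hypothetical representation: a nugget maps each indicator name
# to the (min, max) range a security must fall inside.
NUGGET = {
    "gross_margin": (0.45, 0.85),  # 45% to 85%
    "pe_ratio": (15.0, 25.0),
    "beta": (0.75, 1.5),
}


def matches(snapshot, nugget):
    """Return True if every indicator in the nugget is present and in range."""
    return all(
        name in snapshot and low <= snapshot[name] <= high
        for name, (low, high) in nugget.items()
    )


# Made-up snapshots for illustration only, not real market data.
snapshots = [
    {"ticker": "AAA", "gross_margin": 0.60, "pe_ratio": 18.0, "beta": 1.1},
    {"ticker": "BBB", "gross_margin": 0.30, "pe_ratio": 20.0, "beta": 1.0},
    {"ticker": "CCC", "gross_margin": 0.50, "pe_ratio": 40.0, "beta": 0.9},
]

# Each match on a given date would count as one "event" for the event study.
events = [s["ticker"] for s in snapshots if matches(s, NUGGET)]
print(events)  # → ['AAA']
```

Only the first snapshot satisfies all three ranges; the other two fail on gross margin and PE ratio respectively.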
Using the Event Analyzer, the event date represents the date on which certain securities satisfied the multi-factor (nugget) criteria. The cone represents the standard deviation of the price action of the universe of matching stocks after the “event” took place.
The bold line is the price prediction based on the mean. A fitness function would normally reward a more defined (biased) mean line combined with a narrower cone (smaller variance, as measured by the standard deviation).
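One way to turn "a defined mean with a narrow cone" into a single number is the ratio of the absolute mean forward return to its standard deviation. The sketch below assumes that scoring rule for illustration; it is not necessarily the fitness function the Event Analyzer uses, and the function names and the 20-day default are hypothetical.

```python
from statistics import mean, stdev


def forward_return(prices, event_index, horizon=20):
    """Fractional price change from the event date to `horizon` bars later."""
    start = prices[event_index]
    end = prices[event_index + horizon]
    return end / start - 1.0


def fitness(forward_returns):
    """Assumed score: |mean| / standard deviation of post-event returns.

    A mean further from zero and a narrower cone both raise the score.
    """
    if len(forward_returns) < 2:
        return 0.0  # too few events to estimate a spread
    spread = stdev(forward_returns)
    if spread == 0:
        return 0.0  # degenerate sample; treated as uninformative in this sketch
    return abs(mean(forward_returns)) / spread


# Two made-up sets of 20-day returns with the same mean (5%) but different spread.
tight = [0.04, 0.05, 0.06]
wide = [-0.10, 0.05, 0.20]
print(fitness(tight) > fitness(wide))  # → True
```

Both samples share the same mean line, so the narrower cone alone accounts for the higher score.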
Now let's dive into the GA process.
What does the Genetic Algorithm process do? Two things:
- Identifies which indicators to combine into a nugget.
- Identifies the min/max values to use for each of those indicators.
Here is the Genetic Algorithm process step-by-step:
Step 1: Generate random population. (Indicators are represented by letters.)
Step 2: Evaluate each nugget based on a fitness function.
Step 3: Sort the nuggets based on their fitness score.
Step 4: The best two nuggets survive to participate in the next evolution.
Step 5: Form the next generation of nuggets by selecting nuggets randomly. This time, however, we favor the indicators that scored higher in the previous evolution’s fitness evaluation.
Repeat the evaluation, sorting, selection, and regeneration steps above for each new generation until a single nugget consistently remains in first place. You can now identify the “lone survivor” ready for further analysis and refinement before moving into production.
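The steps above can be sketched as a short loop. This is a simplified illustration under several assumptions: nuggets are reduced to sets of indicator letters (no min/max ranges), `toy_fitness` is a made-up stand-in for a real event study, and the population size, nugget size, and `patience` stopping rule are arbitrary choices for the example.

```python
import random

INDICATORS = list("ABCDEFGHIJ")  # indicators represented by letters
NUGGET_SIZE = 3
POPULATION_SIZE = 12

# Stand-in fitness data: a made-up hidden value per indicator (A=0 ... J=9).
# In practice this would be replaced by an event study or similar test.
HIDDEN_VALUE = {name: i for i, name in enumerate(INDICATORS)}


def toy_fitness(nugget):
    return sum(HIDDEN_VALUE[name] for name in nugget)


def weighted_nugget(rng, weights):
    """Draw NUGGET_SIZE distinct indicators, favoring higher-weighted ones."""
    pool = dict(weights)
    chosen = []
    for _ in range(NUGGET_SIZE):
        names = list(pool)
        pick = rng.choices(names, weights=[pool[n] for n in names])[0]
        chosen.append(pick)
        del pool[pick]  # no repeats within a nugget
    return frozenset(chosen)


def evolve(seed=0, patience=5, max_generations=200):
    rng = random.Random(seed)
    # Step 1: random population (every indicator equally likely).
    uniform = {name: 1.0 for name in INDICATORS}
    population = [weighted_nugget(rng, uniform) for _ in range(POPULATION_SIZE)]
    leader, streak = None, 0
    for _ in range(max_generations):
        # Steps 2 and 3: evaluate each nugget and sort by fitness score.
        ranked = sorted(population, key=toy_fitness, reverse=True)
        # Stop once the same nugget has led for `patience` generations.
        if ranked[0] == leader:
            streak += 1
        else:
            leader, streak = ranked[0], 1
        if streak >= patience:
            break
        # Step 4: the best two nuggets survive unchanged.
        survivors = ranked[:2]
        # Step 5: credit each indicator with the fitness of the nuggets that
        # contained it, then sample new nuggets in proportion to that credit.
        weights = {name: 1.0 for name in INDICATORS}  # floor keeps all in play
        for nugget in ranked:
            for name in nugget:
                weights[name] += toy_fitness(nugget)
        population = survivors + [
            weighted_nugget(rng, weights) for _ in range(POPULATION_SIZE - 2)
        ]
    return leader, toy_fitness(leader)


leader, score = evolve(seed=0)
print(sorted(leader), score)
```

Because the two best nuggets always carry over, the leading score can never decrease from one generation to the next. A production version would also need to evolve the min/max ranges and guard against overfitting to the historical window.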
Why test AI for forming investment strategies?
The above process was greatly simplified for illustration, but you can see how vast the opportunities are to apply GAs when forming investment strategies. You can read more about How Dynamic Models Prolong Your Investment Strategy.
The GA process covers an important step in machine learning research: Feature Selection. Selecting the features most suitable for a strategy is a dynamic process that must adjust to changes in market conditions, which makes it highly relevant for our current market regime.