Glossary

Deep Blue

The process of using domain knowledge to create features that make machine learning algorithms work effectively.

Definition

Feature engineering is a crucial step in the process of developing and improving machine learning models. It involves transforming raw data into a format that is better suited for machine learning algorithms to process and learn from. This process can include creating new features from existing data, selecting the most relevant features, and transforming features to enhance the model's performance.

The aim is to highlight the important characteristics of the data that will help the algorithm to understand the underlying patterns and make accurate predictions or decisions. Feature engineering requires a combination of domain expertise, creativity, and analytical skills to identify the most informative attributes that contribute to the predictive power of a machine learning model.

Examples / Use Cases

In a machine learning model predicting house prices, raw data might include attributes like square footage, number of bedrooms, and zip code. Feature engineering might involve creating new features such as 'price per square foot', 'distance from city center', or categorizing zip codes into 'high-income', 'medium-income', and 'low-income' areas. These engineered features can provide more nuanced information that helps the model to capture variations in house prices more accurately.

For instance, 'price per square foot' could highlight the value differences in houses of similar sizes but different locations or conditions. Similarly, categorizing zip codes can help the model to learn the general economic conditions of different neighborhoods, which significantly affect house prices.