Data for Machine Learning
Alberta Machine Intelligence Institute
Do you agree that skill in manipulating data is more important than building fancy models? I believe so. Raw data is almost never ready to use, and it is valuable only if it is extracted and transformed properly: “Garbage in, garbage out”. Moreover, live data expose…
Tag: Data for Machine Learning
Bad Data in Machine Learning
There are many ways that data can go wrong, sometimes through no fault of its own. Imbalanced Data: a dataset with skewed class proportions, where the vast majority of your examples come from one class, is called an imbalanced dataset. Not surprisingly, having imbalanced classes in your training data affects the resulting model. You…
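As a minimal sketch of the definition above, class proportions can be measured directly from the labels. The function names, the sample labels, and the 0.8 cutoff are illustrative assumptions, not values taken from the post; what counts as "imbalanced" depends on the problem.

```python
from collections import Counter


def class_proportions(labels):
    """Return each class's share of the dataset as a fraction between 0 and 1."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


def is_imbalanced(labels, threshold=0.8):
    """Flag a dataset whose largest class exceeds `threshold`.

    The 0.8 default is an assumed, illustrative cutoff, not a standard value.
    """
    proportions = class_proportions(labels)
    if not proportions:
        return False  # an empty dataset has no majority class
    return max(proportions.values()) > threshold


# Hypothetical labels: 95 negatives and 5 positives.
labels = ["negative"] * 95 + ["positive"] * 5
proportions = class_proportions(labels)
```

Checking proportions like this before training is a cheap way to notice that one class dominates the examples.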
Building Good Features for Machine Learning
Having a deep understanding of data is an essential prerequisite for doing EDA (Exploratory Data Analysis) as well as feature engineering. Very often, only certain feature engineering techniques are valid for a given type of data. Why do we need feature engineering? In order to visualize the data better and to use it as features…
Prepare Your Data for Machine Learning Success
Data never arrives in exactly the form you want, so you need a data pipeline to prepare it. A data pipeline typically has three stages: data extraction, data transformation, and data loading, collectively known as ETL. What tools and processes you use in each of these stages, and overall, is…
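The three ETL stages named above can be sketched with the Python standard library alone. This is a toy illustration under stated assumptions: the inline CSV text, the `people` table, and the cleaning rules are all hypothetical, and a real pipeline would use whatever tools suit its data.

```python
import csv
import io
import sqlite3

# Hypothetical raw input with stray whitespace and a missing age.
RAW_CSV = """name,age
Alice, 34
bob,
Carol,29
"""


def extract(text):
    """Extraction: read raw rows from a CSV source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transformation: tidy names, parse ages, drop rows with a missing age."""
    cleaned = []
    for row in rows:
        age = row["age"].strip()
        if not age:
            continue  # assumed rule: skip incomplete records
        cleaned.append((row["name"].strip().title(), int(age)))
    return cleaned


def load(records, connection):
    """Loading: write the cleaned records into a database table."""
    connection.execute("CREATE TABLE IF NOT EXISTS people (name TEXT, age INTEGER)")
    connection.executemany("INSERT INTO people VALUES (?, ?)", records)
    connection.commit()


connection = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), connection)
rows = connection.execute("SELECT name, age FROM people ORDER BY name").fetchall()
```

Keeping each stage in its own function makes it possible to swap the source, the cleaning rules, or the destination independently.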
Understanding Your Machine Learning Problems and Data
Recall that a previous course introduced the Machine Learning Process Lifecycle (MLPL). Simply put, it has four stages: Business Understanding and Problem Discovery; Data Acquisition and Understanding; ML Modeling and Evaluation; Delivery and Acceptance. These phases are iterative, and you can’t skip ahead. A good, clear problem definition is important. Business Understanding and Problem…