KZHU.ai 🚀

Into the Unknown

MATLAB: Exploratory Data Analysis

Posted on June 7, 2021 (updated October 19, 2022) by keslerzhu

Table of Contents

  • Data Science Workflow
  • Importing Data
  • Visualizing Data
  • Computations
  • Categorical data
  • Live Scripts
  • My Certificate

Data Science Workflow

An important first step is to ask good questions about the data. This helps turn data into actionable information. The first question can be “What happened?” The second might be “Why did this happen?” At this point, you can go even further and ask “What will happen next?” Finally, you may ask “What should be done about it?”

Analyzing data manually does not scale very well. You don’t need to become a seasoned programmer to become a data scientist, either. MATLAB is both an environment for interacting with data and a programming language. You will use live scripts to analyze data.

A typical data science project comprises 3 stages: Data Analysis, Machine Learning and Results. The goal of Data Analysis is to learn more about your data before trying to learn from it. Machine Learning is the process of using algorithms to model the relationship between variables and observations. Finally, you start working with Results.



Importing Data

Once you have identified the data source, the next step is to access the data and import it into MATLAB. Preparing data for analysis can be a major challenge in Data Science. In MATLAB, data is organized into rows and columns. By default, each variable is assumed to be a column vector, with each observation in its own row. When necessary, a third dimension (sheets) is used. There are a few commonly used data types, e.g. double, string, categorical, datetime. A table is itself a data type called ‘table’. You can have MATLAB generate the import code automatically, which saves time and makes the import easy to share with others.
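
As a minimal sketch of the import step, the snippet below writes a tiny CSV file and reads it back with readtable. The file name, column names and values are made up for illustration; they are not from the course.

```matlab
% Build a tiny CSV so the example is self-contained (hypothetical data).
csvFile = fullfile(tempdir, 'eda_demo.csv');
fid = fopen(csvFile, 'w');
fprintf(fid, 'Date,City,Temp\n');
fprintf(fid, '2021-06-01,Boston,21.5\n');
fprintf(fid, '2021-06-02,Austin,30.1\n');
fprintf(fid, '2021-06-03,Boston,19.8\n');
fclose(fid);

% Import as a table: one variable per column, one observation per row.
T = readtable(csvFile);

% Text with a small set of repeated values is a natural fit for categorical.
T.City = categorical(T.City);

% Inspect the data types that were chosen for each variable.
disp(class(T));        % the container itself is a 'table'
disp(class(T.Temp));   % numeric columns are imported as double
disp(class(T.City));   % categorical after the conversion above

% Clean up the temporary file.
delete(csvFile);
```

Depending on the MATLAB release, the Date column is usually detected as datetime automatically; check class(T.Date) on your own installation rather than relying on it.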

Visualizing Data

MATLAB offers tools to visualize, select, and modify data. You can capture the code generated by MATLAB and add it to a Live Script. Creating visualizations is a great way to gain insight into what the data contains. Capturing the generated code is important because you will typically try a variety of approaches and need to retrace your steps.

Computations

In MATLAB, you can create a vector by entering a sequence of values inside square brackets, separated by commas. You can create a uniformly spaced vector by using the colon operator. Element-wise operators include .* ./ and .^. When you add or subtract a scalar, the scalar is automatically expanded to match the size of the vector before the addition or subtraction is performed.
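
A short sketch of these operations, using arbitrary example values:

```matlab
% Row vector from a comma-separated list in square brackets.
v = [1, 2, 3, 4];

% Uniformly spaced vector with the colon operator: start:step:stop.
u = 0:0.5:2;          % 0  0.5  1  1.5  2

% Element-wise operators act on matching elements of the operands.
w = v .* v;           % 1  4  9  16
q = v ./ 2;           % 0.5  1  1.5  2
p = v .^ 2;           % 1  4  9  16

% Scalar expansion: 10 is added to every element of v.
s = v + 10;           % 11  12  13  14
```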

Descriptive statistics provide a convenient way to summarize data sets that may contain millions of values. You can use the summary function to get a quick overview of each variable in a table. The mode function returns the most frequent value in an array. The ‘omitnan’ option of the mean and median functions excludes NaN values from the calculation.
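
The following sketch shows the effect of ‘omitnan’ on a small made-up vector that contains one missing value:

```matlab
% Example data with one missing value (hypothetical numbers).
x = [2 4 4 NaN 7 4 9];

% Without 'omitnan', a single NaN makes the mean NaN.
mRaw = mean(x);                 % NaN

% With 'omitnan', NaN values are left out of the calculation.
m  = mean(x, 'omitnan');        % (2+4+4+7+4+9)/6 = 5
md = median(x, 'omitnan');      % middle of 2 4 4 4 7 9 is 4

% mode returns the most frequent value; NaN values are ignored.
mo = mode(x);                   % 4

% summary prints an overview of each variable in a table.
T = table(x', 'VariableNames', {'Value'});
summary(T);
```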

The Pearson correlation coefficient describes the linear relationship between 2 variables. Use the corr function in MATLAB. The magnitude of the correlation coefficient is NOT related to the slope of a linear relationship. It only measures how close the data points are to falling on a line. Furthermore, a small correlation coefficient only indicates a weak linear relationship. A strong non-linear relationship may still result in a coefficient close to zero. When you need to select elements in a vector or matrix, use logical conditions.
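
To illustrate these points, here is a minimal sketch with synthetic data. It uses corrcoef, which is part of base MATLAB, so that it runs without the Statistics and Machine Learning Toolbox that corr requires; for two vectors, the off-diagonal entry of the returned matrix is the Pearson coefficient.

```matlab
% Synthetic data (made up for illustration).
x = (-5:5)';
yShallow = 0.1 * x;     % perfect line with a small slope
ySteep   = 10 * x;      % perfect line with a large slope
yQuad    = x .^ 2;      % strong, but non-linear, relationship

% corrcoef returns a 2-by-2 matrix; element (1,2) is the coefficient.
R = corrcoef(x, yShallow);  rShallow = R(1, 2);   % 1, despite the small slope
R = corrcoef(x, ySteep);    rSteep   = R(1, 2);   % 1, same as above
R = corrcoef(x, yQuad);     rQuad    = R(1, 2);   % about 0: no linear trend

% Logical indexing: keep only the elements that satisfy a condition.
bigX = x(x > 2);            % 3; 4; 5
```

Both straight lines give a coefficient of 1 regardless of slope, while the parabola gives a coefficient near zero even though y is fully determined by x.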

Categorical data

A categorical variable always has a finite set of discrete categories. To reorder categories, use the reordercats function with a vector of strings that contains the names of the categories in the desired order. The removecats function removes unnecessary categories. Use addcats to add categories, renamecats to rename them, and mergecats to merge existing categories into a new category. Calling the groupsummary function with a table and a grouping variable returns a table that lists the categories and the number of elements in each one. If you want to calculate other statistics, you need to specify the method and the variables to which the computation applies.
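
A small sketch of these functions on made-up data follows. The variable names (sz, Size, Weight) and the category names are hypothetical.

```matlab
% Hypothetical categorical data; categories start in alphabetical order.
sz = categorical({'small'; 'large'; 'medium'; 'small'; 'tiny'; 'large'});

% Put the categories in a meaningful order.
sz = reordercats(sz, {'tiny', 'small', 'medium', 'large'});

% Merge 'tiny' into 'small' (the merged category takes the first name listed).
sz = mergecats(sz, {'small', 'tiny'});

% Rename a category, add an unused one, then remove unused categories again.
sz = renamecats(sz, 'large', 'big');
sz = addcats(sz, 'huge');
sz = removecats(sz);            % drops 'huge' because no element uses it

% Group a table by the categorical variable.
T = table(sz, [1; 5; 3; 2; 1; 6], 'VariableNames', {'Size', 'Weight'});

% Default: one row per category with the number of elements (GroupCount).
G = groupsummary(T, 'Size');

% Other statistics: name the method and the variable to apply it to.
G2 = groupsummary(T, 'Size', 'mean', 'Weight');
disp(G2);
```

Here G2 gains a column named mean_Weight in addition to GroupCount.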

When visualizing data, use hold on and hold off to draw multiple plots in the same axes.
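
For example, a minimal sketch that overlays two curves (the sine and cosine data are arbitrary):

```matlab
% Arbitrary example data.
x = linspace(0, 2*pi, 50);

% Create the figure invisibly so the sketch also runs without a display.
fig = figure('Visible', 'off');

plot(x, sin(x));        % first plot creates the axes
hold on                 % keep existing plots when adding new ones
plot(x, cos(x), '--');  % second plot is added to the same axes
hold off                % restore the default: the next plot replaces the axes content
legend('sin', 'cos');

% Count the line objects that ended up in the axes.
numLines = numel(findobj(gca, 'Type', 'line'));

close(fig);
```

Without hold on, the second call to plot would replace the first curve instead of adding to it.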

Live Scripts

Documentation is important in any data science project. First, it allows you to reuse your work. Second, it helps others understand your work. Third, it helps you present the results. More importantly, presenting to a non-technical audience is an essential skill for any data scientist. Adding interactive controls to a live script makes it easy to identify which variables to modify, which makes analysis quicker and easier. Controls also help with selecting appropriate values.



My Certificate

For more on exploratory data analysis with MATLAB, please refer to the wonderful course here https://www.coursera.org/learn/exploratory-data-analysis-matlab

My #58 course certificate from Coursera

I am Kesler Zhu, thank you for visiting. Check out all of my course reviews at https://KZHU.ai
