Can flower samples be assigned to their proper sub-family purely on the basis of quantitative observation?
Linear discriminant classification
high-quality, annotated dataset
easily the most cited dataset in the ML literature
Describing Fischer’s Iris dataset
Using python to analyse the Iris dataset
Histograms and scatter plots
Wiki image
Wiki image
Wiki image
Instance:
150 samples/datapoints, each having over 4 numerical dimensions \(\mathcal{D_1,} \dots \mathcal{D_{4}}\)
an expert classification function over 3 categories
Solution:
a linear combination \(\mathcal{D_1} \times \dots \mathcal{D_{4}}\rightarrow \mathcal{D_5}\)
that respects the given classification.
Measure: agreement with the given classification.
n=150 samples manually assigned by Fisher.
d=5 dimensions, four measurements and the classification
k=3 classes: Setosa, Versicolour and Virginica, 50 instances each, all available from
the iris.csv file from our class repo.
All measurements are in cm.
The`iris.csv
file is a comma separated file. To load the dataset into the memory we follow the following steps:
A graph consisting of rectangles whose area is proportional to the frequency of a feature and whose width is equal to the class interval denoted as bins
A scatter plot is a type of plot or mathematical diagram using Cartesian coordinates to display values for typically two variables for a set of data.
A 2D visualization of datapoints against Cartesian coordinates.
Normally, the measured variable is on the x-axis.
if time is available then it is always on the x-axis.
Self-study this interesting package ucimlrepo
Open the exercise files, read the code then run it.
See the Matplotlib tutorial
See the Matplotlib tutorial