Can flower samples be assigned to their proper sub-family purely on the basis of quantitative observation?
Linear discriminant classification
high-quality, annotated dataset
easily the most cited dataset in the ML literature
Describing Fischer’s Iris dataset
Using python to analyse the Iris dataset
Histograms and scatter plots
Instance:
150 samples/datapoints, each having over 4 numerical dimensions \(\mathcal{D_1,} \dots \mathcal{D_{4}}\)
an expert classification function over 3 categories
Solution:
a linear combination \(\mathcal{D_1} \times \dots \mathcal{D_{4}}\rightarrow \mathcal{D_5}\)
that respects the given classification.
Measure: agreement with the given classification.
n=150 samples manually assigned by Fisher.
d=5 dimensions, four measurements and the classification
k=3 classes: Setosa, Versicolour and Virginica, 50 instances each, all available from
the iris.csv file from our class repo.
All measurements are in cm.
The`iris.csv
file is a comma separated file. To load the dataset into the memory we follow the following steps:
A graph consisting of rectangles whose area is proportional to the frequency of a feature and whose width is equal to the class interval denoted as bins
A scatter plot is a type of plot or mathematical diagram using Cartesian coordinates to display values for typically two variables for a set of data.
A 2D visualization of datapoints against Cartesian coordinates.
Normally, the measured variable is on the x-axis.
if time is available then it is always on the x-axis.
Self-study this interesting package ucimlrepo
Open the exercise files, read the code then run it.
See the Matplotlib tutorial
See the Matplotlib tutorial