Unsupervised Learning
- Data points do not have any associated outcomes; the target is unknown.
- We are interested in the structure of the data or the patterns within the data.
-
Types:
-
Clustering:
Algorithms like (a short K-Means sketch follows this list):
- K-Means
- Hierarchical Agglomerative Clustering
- DBSCAN
- Mean shift
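A minimal K-Means sketch with scikit-learn, using a small synthetic dataset from make_blobs; the data and parameter values here are illustrative, not from these notes:

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# illustrative synthetic data: 300 points around 3 centres
X, _ = make_blobs(n_samples=300, centers=3, random_state=42)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)       # cluster index assigned to each point
centers = kmeans.cluster_centers_    # coordinates of the 3 learned centroids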
-
Dimensionality Reduction:
Algorithms like:
- PCA
- Non-negative matrix factorization
- They are important because of the curse of dimensionality (a small numpy illustration follows).
- .. which means that as the number of features increases, performance gets worse and the cost, i.e. the number of training examples required, increases.
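A small numpy sketch of the curse of dimensionality, illustrative only and using random data: as the number of features grows, the nearest and the farthest point end up almost equally far away, which is one reason distance-based methods degrade.

import numpy as np

rng = np.random.default_rng(0)
for d in (2, 10, 100, 1000):
    X = rng.random((500, d))                        # 500 random points in d dimensions
    dists = np.linalg.norm(X - X[0], axis=1)[1:]    # distances from the first point to all others
    print(d, dists.min() / dists.max())             # ratio creeps towards 1 as d grows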
-
Clustering:
Many use cases like:
- Classification
- Anomaly Detection
- Customer Segmentation
- Improve Supervised Learning
Dimensionality Reduction
-
One way is
Principal Component Analysis (PCA)
- Using Singular Value Decomposition (SVD)
- A feature matrix, A, of dimension m×n can be decomposed into three matrices using SVD ..
- .. which leads to A (m×n) = U (m×m) * S (m×n) * V-transpose (n×n)
- S is a diagonal matrix; U and V are square orthogonal matrices. The principal directions are the columns of V, and the principal component scores are obtained by multiplying the original matrix A with V.
- If we want to reduce the data from m×n to m×k, i.e. from n features to k features, we simply keep U (m×k), S (k×k) and V.T (k×n). In this case we multiply A (m×n) with the truncated V (n×k) holding the top k right singular vectors (see the numpy sketch after the sklearn snippet below).
- It is important to scale the features before PCA, because features on larger scales (or with outlying values) dominate the variance and skew the components.
from sklearn.decomposition import PCA

pca = PCA(n_components=3)
X_trans = pca.fit_transform(X_train)  # X_trans is our new data
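A minimal numpy sketch of the SVD route described above, on an illustrative random matrix: scale first, project onto the top-k right singular vectors, and check that the result matches sklearn's PCA up to sign.

import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
A = rng.random((100, 5))                       # illustrative m x n data (m=100, n=5)
A_scaled = StandardScaler().fit_transform(A)   # scale before PCA

k = 3
U, s, Vt = np.linalg.svd(A_scaled, full_matrices=False)
X_svd = A_scaled @ Vt[:k].T                    # A (m x n) times V_k (n x k) -> m x k scores

X_pca = PCA(n_components=k).fit_transform(A_scaled)
print(np.allclose(np.abs(X_svd), np.abs(X_pca)))   # True: identical up to column signs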
-
Kernel PCA
- For non-linear PCA.
- internally, the kernel first maps the data into a higher-dimensional space where the structure becomes linear, and then applies PCA in that space
- kernel PCA tends to preserve the geometric distances between the points (see the make_circles sketch below the snippet)
-
from sklearn.decomposition import KernelPCA

kpca = KernelPCA(n_components=3, kernel='rbf', gamma=1.0)
X_trans = kpca.fit_transform(X)
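A minimal sketch of where kernel PCA helps, using sklearn's make_circles as illustrative non-linear data; the gamma value is just an assumption that works for this toy set:

from sklearn.datasets import make_circles
from sklearn.decomposition import PCA, KernelPCA

# two concentric circles: not linearly separable
X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)

X_lin = PCA(n_components=2).fit_transform(X)    # linear PCA: the points stay in circles
X_rbf = KernelPCA(n_components=2, kernel='rbf', gamma=10.0).fit_transform(X)  # circles unfold; classes separate along the first component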
-
Multi-Dimensional Scaling (MDS)
- for non-linear transformations
- does not preserve the variance
- but maintains the geometric distances between points
-
from sklearn.manifold import MDS

mds = MDS(n_components=2)
X_trans = mds.fit_transform(X)
Other popular manifold dimensionality reduction methods are Isomap and t-SNE (a short sketch follows).
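A minimal sketch of both, on an illustrative random feature matrix; note that they live in sklearn.manifold:

import numpy as np
from sklearn.manifold import Isomap, TSNE

X = np.random.default_rng(0).random((200, 10))   # illustrative data: 200 points, 10 features

X_iso = Isomap(n_components=2, n_neighbors=10).fit_transform(X)   # preserves geodesic (manifold) distances
X_tsne = TSNE(n_components=2, perplexity=30.0).fit_transform(X)   # preserves local neighbourhoods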
Non-Negative Matrix Factorization
- Same as PCA, but all the matrices must contain only non-negative values
- for example, word counts in document analysis or pixel values in images
- powerful for many problems related to documents, texts, images and videos
- this is also powerful because it can never undo the application of a latent feature: contributions are purely additive, so there are no negative weights to cancel a feature out
- more human interpretable
- since only non-negative values are allowed, it can lose more information when truncating
- Unlike PCA, it does not give orthogonal latent vectors.
-
Example of document processing:
- Input: a document-term matrix from CountVectorizer or TF-IDF
- parameters to tune: number of topics (n_components), text preprocessing (see the topic-extraction sketch after the snippet below)
-
from sklearn.decomposition import NMF

nmf = NMF(n_components=3, init='random')
W = nmf.fit_transform(X)  # document-topic matrix; nmf.components_ holds the topic-word matrix
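A minimal end-to-end topic-extraction sketch; the corpus, the number of topics, and the preprocessing choices below are illustrative assumptions, not part of these notes:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import NMF

# hypothetical toy corpus
docs = [
    "the cat sat on the mat",
    "dogs and cats are pets",
    "stock markets fell sharply today",
    "investors worry about market volatility",
]

vec = TfidfVectorizer(stop_words='english')
X = vec.fit_transform(docs)                     # documents x terms, non-negative TF-IDF weights

nmf = NMF(n_components=2, init='nndsvd')
W = nmf.fit_transform(X)                        # documents x topics
H = nmf.components_                             # topics x terms

# top words per topic
terms = vec.get_feature_names_out()
for i, topic in enumerate(H):
    top = topic.argsort()[::-1][:3]
    print(f"topic {i}:", [terms[j] for j in top])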