Unsupervised Learning
- Data points do not have any associated outcomes; the target is unknown.
- We are interested in the structure of the data or the patterns within the data.
-
Types:
-
Clustering:
Algorithms like (a short K-Means sketch follows this list):
- K-Means
- Hierarchical Agglomerative Clustering
- DBSCAN
- Mean shift
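A minimal K-Means sketch with scikit-learn, using a small synthetic dataset from make_blobs; the data and parameter values here are illustrative, not from these notes:

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# illustrative synthetic data: 300 points around 3 centres
X, _ = make_blobs(n_samples=300, centers=3, random_state=42)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)       # cluster index assigned to each point
centers = kmeans.cluster_centers_    # coordinates of the 3 learned centroids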
-
Dimensionality Reduction:
Algorithms like:
- PCA
- Non-negative matrix factorization
- They are important because of the curse of dimensionality (a small numpy illustration follows).
- .. which means that as the number of features increases, performance gets worse and the cost, i.e. the number of training examples required, increases.
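A small numpy sketch of the curse of dimensionality, illustrative only and using random data: as the number of features grows, the nearest and the farthest point end up almost equally far away, which is one reason distance-based methods degrade.

import numpy as np

rng = np.random.default_rng(0)
for d in (2, 10, 100, 1000):
    X = rng.random((500, d))                        # 500 random points in d dimensions
    dists = np.linalg.norm(X - X[0], axis=1)[1:]    # distances from the first point to all others
    print(d, dists.min() / dists.max())             # ratio creeps towards 1 as d grows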
-
Clustering:
Many use cases like:
- Classification
- Anomaly Detection
- Customer Segmentation
- Improve Supervised Learning
Dimensionality Reduction
-
One way is
Principal Component Analysis (PCA)
- Using Singular Value Decomposition (SVD)
- A feature matrix, A, of dimension m×n can be decomposed into three matrices using SVD ..
- .. which leads to A (m×n) = U (m×m) * S (m×n) * V-transpose (n×n)
- S is a diagonal matrix; U and V are square orthogonal matrices. The principal directions are the columns of V, and the principal component scores are obtained by multiplying the original matrix A with V.
- If we want to reduce the data from m×n to m×k, i.e. from n features to k features, we simply keep U (m×k), S (k×k) and V.T (k×n). In this case we multiply A (m×n) with the truncated V (n×k) holding the top k right singular vectors (see the numpy sketch after the sklearn snippet below).
- It is important to scale the features before PCA, because features on larger scales (or with outlying values) dominate the variance and skew the components.
from sklearn.decomposition import PCA

pca = PCA(n_components=3)
X_trans = pca.fit_transform(X_train)  # X_trans is our new data
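A minimal numpy sketch of the SVD route described above, on an illustrative random matrix: scale first, project onto the top-k right singular vectors, and check that the result matches sklearn's PCA up to sign.

import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
A = rng.random((100, 5))                       # illustrative m x n data (m=100, n=5)
A_scaled = StandardScaler().fit_transform(A)   # scale before PCA

k = 3
U, s, Vt = np.linalg.svd(A_scaled, full_matrices=False)
X_svd = A_scaled @ Vt[:k].T                    # A (m x n) times V_k (n x k) -> m x k scores

X_pca = PCA(n_components=k).fit_transform(A_scaled)
print(np.allclose(np.abs(X_svd), np.abs(X_pca)))   # True: identical up to column signs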
-
Kernel PCA
- For non-linear PCA.
- internally, the kernel first maps the data into a higher-dimensional space where the structure becomes linear, and then applies PCA in that space
- kernel PCA tends to preserve the geometric distances between the points (see the make_circles sketch below the snippet)
-
from sklearn.decomposition import KernelPCA

kpca = KernelPCA(n_components=3, kernel='rbf', gamma=1.0)
X_trans = kpca.fit_transform(X)
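A minimal sketch of where kernel PCA helps, using sklearn's make_circles as illustrative non-linear data; the gamma value is just an assumption that works for this toy set:

from sklearn.datasets import make_circles
from sklearn.decomposition import PCA, KernelPCA

# two concentric circles: not linearly separable
X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)

X_lin = PCA(n_components=2).fit_transform(X)    # linear PCA: the points stay in circles
X_rbf = KernelPCA(n_components=2, kernel='rbf', gamma=10.0).fit_transform(X)  # circles unfold; classes separate along the first component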
-
Multi-Dimensional Scaling (MDS)
- for non-linear transformations
- does not preserve the variance
- but maintains the geometric distances between points
-
from sklearn.manifold import MDS

mds = MDS(n_components=2)
X_trans = mds.fit_transform(X)
Other popular manifold dimensionality reduction methods are Isomap and t-SNE (a short sketch follows).
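A minimal sketch of both, on an illustrative random feature matrix; note that they live in sklearn.manifold:

import numpy as np
from sklearn.manifold import Isomap, TSNE

X = np.random.default_rng(0).random((200, 10))   # illustrative data: 200 points, 10 features

X_iso = Isomap(n_components=2, n_neighbors=10).fit_transform(X)   # preserves geodesic (manifold) distances
X_tsne = TSNE(n_components=2, perplexity=30.0).fit_transform(X)   # preserves local neighbourhoods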
Non-Negative Matrix Factorization
- Same as PCA, but all the matrices must contain only non-negative values
- for example, word counts in document analysis or pixel values in images
- powerful for many problems related to documents, texts, images and videos
- this is also powerful because it can never undo the application of a latent feature: contributions are purely additive, so there are no negative weights to cancel a feature out
- more human interpretable
- since only non-negative values are allowed, it can lose more information when truncating
- Unlike PCA, it does not give orthogonal latent vectors.
-
Example of document processing:
- Input: a document-term matrix from CountVectorizer or TF-IDF
- parameters to tune: number of topics (n_components), text preprocessing (see the topic-extraction sketch after the snippet below)
-
from sklearn.decomposition import NMF

nmf = NMF(n_components=3, init='random')
W = nmf.fit_transform(X)  # document-topic matrix; nmf.components_ holds the topic-word matrix
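A minimal end-to-end topic-extraction sketch; the corpus, the number of topics, and the preprocessing choices below are illustrative assumptions, not part of these notes:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import NMF

# hypothetical toy corpus
docs = [
    "the cat sat on the mat",
    "dogs and cats are pets",
    "stock markets fell sharply today",
    "investors worry about market volatility",
]

vec = TfidfVectorizer(stop_words='english')
X = vec.fit_transform(docs)                     # documents x terms, non-negative TF-IDF weights

nmf = NMF(n_components=2, init='nndsvd')
W = nmf.fit_transform(X)                        # documents x topics
H = nmf.components_                             # topics x terms

# top words per topic
terms = vec.get_feature_names_out()
for i, topic in enumerate(H):
    top = topic.argsort()[::-1][:3]
    print(f"topic {i}:", [terms[j] for j in top])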