# Unsupervised Learning

## Outline

1. Discrete latent factor
• Hidden Markov models (example in genetics)
• Variational EM (example: stochastic block model)
2. Continuous latent variable
• Independent Component Analysis

## Reference document

The lecture closely follows, and largely borrows material from, “Machine Learning: A Probabilistic Perspective” (MLAPP) by Kevin P. Murphy, in particular the following chapters:

• Chapter 17: Markov and hidden Markov models
• Chapter 21: Variational inference
• Chapter 12: Latent Linear Models
• Chapter 13: Sparse Linear Models

## Exercises

### Unigrams and bigrams

Using the lyrics of Leonard Cohen's song ‘Suzanne’, compute the unigram and bigram counts, treating the letters of the alphabet (plus the space character) as the states of a Markov chain.

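library(stringr)   # str_remove_all(), str_to_lower()
library(magrittr)  # the %>% pipe
suzanne_eng <- readLines("suzanne-cohen-eng.txt")
unigrams <- suzanne_eng %>% paste(collapse=" ") %>%
# tokenize by character (strsplit returns a list, so unlist it)
strsplit(split="") %>% unlist %>%
# blank out punctuation (the emptied entries are ignored below)
str_remove_all("[,.!'\"]") %>%
str_to_lower() %>%
# make a frequency table of the characters
table
# keep only the 26 letters and the space, in a fixed order
letters_space<-c(letters," ")
x<-rep(0,27);names(x)<-letters_space
for (letter in letters_space) x[letter]<-unigrams[letter]
# characters absent from the text are indexed as NA: count them as 0
x[is.na(x)]<-0
unigrams<-x
# relative frequencies on a log(1 + p) scale
barplot(log(unigrams/sum(unigrams)+1))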

# clean the text once: one character per element, lower case
suzanne_chars<-suzanne_eng %>% paste(collapse=" ") %>%
# tokenize by character (strsplit returns a list, so unlist it)
strsplit(split="") %>% unlist %>%
# blank out punctuation, then drop the emptied entries
str_remove_all("[,.!'\"]")  %>%
str_to_lower()
suzanne_chars<-suzanne_chars[suzanne_chars!=""]

# shift by one position; the padding space keeps both vectors the same length
suzanne1<-c(" ",suzanne_chars) # first character of each pair
suzanne2<-c(suzanne_chars," ") # second character of each pair

# bigrams[i,j] counts how often character j is immediately followed by character i
bigrams<-table(suzanne2,suzanne1)

# copy the counts into a 27 x 27 matrix over the letters and the space
X<-matrix(0,27,27)
row.names(X)<-letters_space
colnames(X)<-letters_space
for (letteri in letters_space)
for (letterj in letters_space)
if ((letteri %in% row.names(bigrams))&&(letterj %in% colnames(bigrams))) X[letteri,letterj]<-bigrams[letteri,letterj]

bigrams<-X
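
Since the exercise treats characters as the states of a chain, a natural next step is to turn bigram counts into estimated transition probabilities. The sketch below is only an illustration under stated assumptions: it uses a short made-up string instead of the lyrics file, base R only, and the names `toy_text`, `counts` and `transition` are hypothetical. Here rows index the current character and columns the next one, which may differ from the orientation of the count table above, so transpose if needed.

```r
# Toy text standing in for the song lyrics (assumption: not the real file)
toy_text <- "suzanne takes you down"
states <- c(letters, " ")

# One character per element, lower case, restricted to the 27 states
chars <- strsplit(tolower(toy_text), split = "")[[1]]
chars <- chars[chars %in% states]

# Pair each character with its successor
current   <- factor(head(chars, -1), levels = states)
following <- factor(tail(chars, -1), levels = states)

# counts[i, j] = number of times state i is immediately followed by state j
counts <- table(current, following)

# Row-normalise the counts; rows never observed are left at zero
row_totals <- rowSums(counts)
transition <- counts / ifelse(row_totals > 0, row_totals, 1)
```

Each row of `transition` that corresponds to an observed character sums to one, so it can be read as an empirical estimate of the distribution of the next character.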


### Simulation of HMM

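# transition matrix of the hidden chain: A[i,j] = P(Z[t+1]=j | Z[t]=i)
A<-matrix(c(0.3,0.7,0,0,0.9,0.1,0.6,0,0.4),3,3,byrow = TRUE)
# emission matrix: B[x,k] = P(X[t]=x | Z[t]=k), one column per hidden state
B<-matrix(c(0.5,0.2,0.3,0,0,0,0,0,0,0.2,0.7,0.1,0,0,0,0,0,0.1,0,0.5,0.4),7,3)
# an example observation sequence (not used by the simulations below)
X<-c(1,3,4,6)
# stationary distribution of A: solve Pi %*% (I - A) = 0 subject to sum(Pi) = 1
# (the third column of I - A is replaced by ones to encode the constraint)
M<-diag(rep(1,3))-A
M[,3]<-rep(1,3)
Pi<-solve(t(M),b=c(0,0,1))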
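
The last three lines above appear to compute the stationary distribution of the chain with transition matrix `A`. As a hedged sketch of the reasoning (writing $\pi$ for `Pi` and $\mathbf{1}$ for a column of ones, notation introduced here for illustration):

```latex
\begin{align*}
\pi A = \pi,\quad \pi\mathbf{1} = 1
  \;&\Longleftrightarrow\; \pi (I - A) = 0,\quad \pi\mathbf{1} = 1, \\
\text{with } M = \big[\,(I-A)_{\cdot 1}\;\; (I-A)_{\cdot 2}\;\; \mathbf{1}\,\big]:\quad
\pi M = (0,\,0,\,1)
  \;&\Longleftrightarrow\; M^{\top}\pi^{\top} = (0,\,0,\,1)^{\top}, \\
\pi &= \left(\tfrac{6}{55},\ \tfrac{42}{55},\ \tfrac{7}{55}\right)
     \approx (0.109,\ 0.764,\ 0.127).
\end{align*}
```

Dropping the third column of $I - A$ loses no information: each row of $I - A$ sums to zero, so the third column is minus the sum of the first two, and the freed slot can carry the normalisation constraint.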

SimulationHMM<-function(Pi,A,B,n){
Z<-rep(0,n) # hidden states
X<-rep(0,n) # emission (obs.)
K<-length(Pi) # nb of hidden states
N<-nrow(B)    # nb of modalities
# initial state drawn from Pi, then its emission
Z[1]<-sample(1:K,prob = Pi,size = 1)
X[1]<-sample(1:N,prob=B[,Z[1]],size=1)
# seq_len(n)[-1] is empty when n = 1, unlike 2:n
for (i in seq_len(n)[-1]){
Z[i]<- sample(1:K,prob=A[Z[i-1],],size=1)
X[i]<- sample(1:N,prob=B[,Z[i]],size=1)
}
return(list(X=X,Z=Z))
}

Hmmsimu<-SimulationHMM(Pi,A,B,100)
plot(Hmmsimu$X,col=Hmmsimu$Z)

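# same hidden chain, with Gaussian emissions: X[t] ~ N(2*Z[t], 1)
SimulationHMMgauss<-function(Pi,A,n){
Z<-rep(0,n) # hidden states
X<-rep(0,n) # emission (obs.)
K<-length(Pi) # nb of hidden states
# initial state drawn from Pi, then its emission
Z[1]<-sample(1:K,prob = Pi,size = 1)
X[1]<-rnorm(1,mean=2*Z[1])
for (i in seq_len(n)[-1]){
Z[i]<- sample(1:K,prob=A[Z[i-1],],size=1)
X[i]<-rnorm(1,mean=2*Z[i])
}
return(list(X=X,Z=Z))
}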

Hmmsimu<-SimulationHMMgauss(Pi,A,100)
plot(Hmmsimu$X,col=Hmmsimu$Z)
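
One way to sanity-check a simulator like the ones above is to compare the empirical frequencies of the hidden states with the stationary distribution of `A`. The sketch below is a self-contained illustration, not part of the original exercise: `simulate_chain` is a hypothetical helper that simulates only the hidden chain, and the seed and chain length are arbitrary choices.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

# Same transition matrix as in the simulation above
A <- matrix(c(0.3, 0.7, 0,
              0,   0.9, 0.1,
              0.6, 0,   0.4), 3, 3, byrow = TRUE)

# Stationary distribution: Pi %*% (I - A) = 0 with sum(Pi) = 1
M <- diag(3) - A
M[, 3] <- 1
Pi <- solve(t(M), c(0, 0, 1))

# Hypothetical helper: simulate n steps of the hidden chain only
simulate_chain <- function(Pi, A, n) {
  K <- length(Pi)
  Z <- integer(n)
  Z[1] <- sample(1:K, size = 1, prob = Pi)
  for (i in seq_len(n)[-1]) {
    Z[i] <- sample(1:K, size = 1, prob = A[Z[i - 1], ])
  }
  Z
}

Z <- simulate_chain(Pi, A, 20000)

# Fraction of time spent in each state, next to the stationary distribution
empirical <- tabulate(Z, nbins = 3) / length(Z)
print(round(rbind(stationary = Pi, empirical = empirical), 3))
```

For a long enough chain the two rows should be close; a large discrepancy would suggest that the rows and columns of `A` have been mixed up in the sampling step.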


## Project

### R base

https://cran.r-project.org/manuals.html