Unsupervised Learning
Outline
- Discrete latent factors
- Hidden Markov models (example in genetics)
- Variational EM (example of the stochastic block model)
- Continuous latent variables
- Independent Component Analysis
Reference document
The lecture closely follows and largely borrows material from "Machine Learning: A Probabilistic Perspective" (MLAPP) by Kevin P. Murphy, in particular the following chapters:
- Chapter 17: Markov and hidden Markov models
- Chapter 21: Variational inference
- Chapter 12: Latent Linear Models
- Chapter 13: Sparse Linear Models
Lecture Notes
Exercises
Unigrams and bigrams
Using the lyrics of Leonard Cohen's song 'Suzanne', compute the unigram and bigram counts, taking the letters of the alphabet as the states of the chain.
library(stringr)   # str_remove_all, str_to_lower
library(magrittr)  # provides the %>% pipe

suzanne_eng <- readLines("suzanne-cohen-eng.txt")
unigrams <- suzanne_eng %>% paste(collapse = " ") %>%
  # tokenize by character (strsplit returns a list, so unlist it)
  strsplit(split = "") %>% unlist %>%
  # remove instances of characters we don't care about
  str_remove_all("[,.!'\"]") %>%
  str_to_lower() %>%
  # make a frequency table of the characters
  table
# keep only the 26 letters and the space, in a fixed order
letters_space <- c(letters, " ")
x <- rep(0, 27); names(x) <- letters_space
for (letter in letters_space)
  if (letter %in% names(unigrams)) x[letter] <- unigrams[letter]
unigrams <- x
barplot(log(unigrams / sum(unigrams) + 1))
# two shifted copies of the text: suzanne1[i] is the character preceding suzanne2[i]
suzanne1 <- c(" ", suzanne_eng) %>% paste(collapse = "") %>%
  # tokenize by character (strsplit returns a list, so unlist it)
  strsplit(split = "") %>% unlist %>%
  # remove instances of characters we don't care about
  str_remove_all("[,.!'\"]") %>%
  str_to_lower()
suzanne2 <- c(suzanne_eng, " ") %>% paste(collapse = "") %>%
  strsplit(split = "") %>% unlist %>%
  str_remove_all("[,.!'\"]") %>%
  str_to_lower()
bigrams <- table(suzanne2, suzanne1)
# restrict the counts to the 26 letters and the space, in a fixed order
X <- matrix(0, 27, 27)
row.names(X) <- letters_space
colnames(X) <- letters_space
for (letteri in letters_space)
  for (letterj in letters_space)
    if ((letteri %in% row.names(bigrams)) && (letterj %in% colnames(bigrams)))
      X[letteri, letterj] <- bigrams[letteri, letterj]
bigrams <- X
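The bigram counts can be turned into a maximum-likelihood estimate of the transition matrix of the letter chain by normalizing the counts so that each conditioning state sums to one. A minimal sketch on a hypothetical 3-letter count matrix standing in for the 27 x 27 `bigrams` table (here rows are taken as the previous character; note that in the table built above, rows index the current character and columns the previous one, so the normalization there would be over columns):

```r
# hypothetical count matrix with counts[i, j] = number of observed transitions i -> j
counts <- matrix(c(10, 5, 5,
                   2,  6, 2,
                   1,  1, 8), 3, 3, byrow = TRUE)
row.names(counts) <- colnames(counts) <- c("a", "b", "c")

# maximum-likelihood transition estimate: normalize each row to sum to one
Ahat <- counts / rowSums(counts)
```

Each row of `Ahat` is then an estimated conditional distribution over the next letter.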
Simulation of an HMM
# transition matrix (3 hidden states) and emission matrix (7 symbols x 3 states)
A <- matrix(c(0.3, 0.7, 0,
              0,   0.9, 0.1,
              0.6, 0,   0.4), 3, 3, byrow = TRUE)
B <- matrix(c(0.5, 0.2, 0.3, 0, 0, 0, 0,
              0, 0, 0.2, 0.7, 0.1, 0, 0,
              0, 0, 0, 0.1, 0, 0.5, 0.4), 7, 3)
X <- c(1, 3, 4, 6)
# stationary distribution: solve Pi'(I - A) = 0 together with sum(Pi) = 1
M <- diag(rep(1, 3)) - A
M[, 3] <- rep(1, 3)
Pi <- solve(t(M), b = c(0, 0, 1))
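The `solve` call above replaces one redundant balance equation by the normalization constraint, so `Pi` should be the stationary distribution of the chain. A self-contained check that the result indeed satisfies Pi A = Pi and sums to one:

```r
A <- matrix(c(0.3, 0.7, 0,
              0,   0.9, 0.1,
              0.6, 0,   0.4), 3, 3, byrow = TRUE)
M <- diag(3) - A
M[, 3] <- rep(1, 3)            # third equation becomes sum(Pi) = 1
Pi <- solve(t(M), b = c(0, 0, 1))
# Pi is stationary: Pi %*% A should reproduce Pi
```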
SimulationHMM <- function(Pi, A, B, n) {
  Z <- rep(0, n)    # hidden states
  X <- rep(0, n)    # emissions (obs.)
  K <- length(Pi)   # number of hidden states
  N <- nrow(B)      # number of emission symbols
  Z[1] <- sample(1:K, size = 1, prob = Pi)
  X[1] <- sample(1:N, size = 1, prob = B[, Z[1]])
  for (i in 2:n) {
    Z[i] <- sample(1:K, size = 1, prob = A[Z[i - 1], ])
    X[i] <- sample(1:N, size = 1, prob = B[, Z[i]])
  }
  return(list(X = X, Z = Z))
}
Hmmsimu<-SimulationHMM(Pi,A,B,100)
plot(Hmmsimu$X,col=Hmmsimu$Z)
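One way to sanity-check the simulator is to estimate the transition matrix from a long simulated hidden path and compare it to A; with enough steps the empirical transition frequencies should be close. A sketch using a fresh, seeded simulation of the hidden chain alone:

```r
set.seed(1)
A <- matrix(c(0.3, 0.7, 0,
              0,   0.9, 0.1,
              0.6, 0,   0.4), 3, 3, byrow = TRUE)
n <- 100000
Z <- numeric(n)
Z[1] <- 1
for (i in 2:n) Z[i] <- sample(1:3, size = 1, prob = A[Z[i - 1], ])
# empirical transition frequencies, normalized by row
Ahat <- prop.table(table(Z[-n], Z[-1]), margin = 1)
round(Ahat, 2)
```

The entries of `Ahat` should match A to within sampling error (roughly 0.01 at this length).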
SimulationHMMgauss <- function(Pi, A, n) {
  Z <- rep(0, n)    # hidden states
  X <- rep(0, n)    # Gaussian emissions with state-dependent mean 2*Z
  K <- length(Pi)   # number of hidden states
  Z[1] <- sample(1:K, size = 1, prob = Pi)
  X[1] <- rnorm(1, mean = 2 * Z[1])
  for (i in 2:n) {
    Z[i] <- sample(1:K, size = 1, prob = A[Z[i - 1], ])
    X[i] <- rnorm(1, mean = 2 * Z[i])
  }
  return(list(X = X, Z = Z))
}
Hmmsimu<-SimulationHMMgauss(Pi,A,100)
plot(Hmmsimu$X,col=Hmmsimu$Z)
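Since the state-dependent means 2, 4 and 6 are reasonably well separated relative to the unit emission variance, the hidden state can often be guessed by assigning each observation to the nearest emission mean, ignoring the Markov dependence that a proper decoder (e.g. Viterbi) would exploit. A toy sketch with hypothetical observations:

```r
means <- 2 * (1:3)            # state-specific emission means
x <- c(1.8, 4.3, 6.1, 3.2)    # hypothetical observations
# assign each observation to the state whose mean is closest
zhat <- apply(abs(outer(x, means, "-")), 1, which.min)
zhat
```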
Project
Document and Links
Reference books about machine learning
- Machine Learning: A Probabilistic Perspective by Kevin P. Murphy
- Pattern Recognition and Machine Learning by Christopher M. Bishop
R base
Official manuals for base R can be retrieved from
https://cran.r-project.org/manuals.html
Contributions by the community can be retrieved from
https://cran.r-project.org/other-docs.html
The short introduction by Emmanuel Paradis allows a quick start:
- "R for Beginners" by Emmanuel Paradis
Longer books allow deeper study; see for example
- "Using R for Data Analysis and Graphics - Introduction, Examples and Commentary" by John Maindonald
R from RStudio developers
- R for Data Science, the book by Hadley Wickham on more recent R developments for data science
And if you want more, see https://www.rstudio.com/resources/books/