I was able to answer 1 through 4. I just need to answer 5 through 7. Please help me; this is in RStudio.

First, we load the data and the requisite packages.

```{r, message=FALSE}
library(tidyverse)
library(randomForest)

## Loading the data
mnist_raw <- read_csv("https://github.com/cerndb/dist-keras/blob/master/examples/data/mnist.csv?raw=true",
                      col_names = TRUE)
```

# 1

We will only be interested in classifying the digits 0 and 1. **Filter the dataset down to only the labels 0 and 1 and store this dataset as `mnist_01`.**

```{r}
# Keep only the rows whose label is 0 or 1
mnist_01 <- mnist_raw %>% filter(label %in% c(0, 1))
```

# 2

The following code takes a vector of length 784 and converts it to an image using the `geom_raster` function:

```{r}
draw_image <- function(digit) {
  x <- rep(1:28, times = 28)
  y <- rep(1:28, each = 28)
  df <- data.frame(x = x, y = y, z = digit)
  ggplot(df, aes(x = x, y = y, fill = z)) +
    geom_raster() +
    scale_fill_viridis_c()
}
```

The following code will grab the pixel intensities and the labels:

```{r}
intensities <- mnist_01 %>% select(-label) %>% as.matrix()
labels <- mnist_01 %>% pull(label)
```

**Use `draw_image()` to make images for the first 5 rows of the data using the pixel intensities. Can you tell the difference between the 0s and 1s?**

```{r}
# Pass a full row (all 784 pixels) to draw_image, not a single entry
for (i in 1:5) {
  print(draw_image(intensities[i, ]))
}
```

# 3

Let's now compute the principal components of `intensities` using `prcomp`.

```{r}
pca_mnist <- prcomp(intensities)
pcs <- data.frame(pca_mnist$x) %>% mutate(label = labels)
```

Then plot the first principal component against the second principal component and color the points according to `label`.
**Do either of the two principal components do well to distinguish the 0s and 1s? Explain.**

```{r}
ggplot(data = pcs, aes(x = PC1, y = PC2)) +
  geom_point(aes(color = label))
```

# 4

**Plot the proportion of variance explained by the first `d` principal components against `d` for `d = 1, ..., 784`. How many principal components are needed to account for 90% of the variability in the data?**

```{r}
pca_mnist_std_dev <- pca_mnist$sdev
pca_var <- pca_mnist_std_dev^2
# Cumulative proportion of variance explained by the first d components
pca_var_ex <- cumsum(pca_var / sum(pca_var))
plot(pca_var_ex, xlab = "Number of Principal Components",
     ylab = "Cumulative Proportion of Variance Explained")
abline(h = 0.9, col = "blue", lty = 5)  # Mark line for 90% variance
```

# 5

One interesting application of principal component analysis is to perform _image compression_: if most of the data can be reduced to (say) 50 principal components then we no longer need to store the whole 28-by-28 image. The following code, it turns out, will do the compression for us:

```{r}
mean_intensities <- colMeans(intensities)
compress_image <- function(row, num_components) {
  G <- pca_mnist$rotation[, 1:num_components]
  compressed <- G %*% t(G) %*% (intensities[row, ] - mean_intensities) + mean_intensities
  return(compressed)
}
```

The argument `row` tells us which image we want to compress, while `num_components` tells us how many numbers we should use to describe the image (for example, if `num_components = 1` then we are reducing the data all the way down to a single number, whereas if `num_components = 28^2` then we are not reducing the data at all).

**Play with `compress_image`, feeding the results into `draw_image` to visualize; for concreteness, print 5 of the images you make. How many principal components do you need to do a good job compressing? 
No hard answer here, just give your personal opinion about how many you think you need in order to do well.**

```{r}
## Your code here
```

# 6

The neat thing about what we see in the picture in Question 3 is that the principal components were able to group the 0s and 1s together _without even knowing the labels!_ This suggests that we could use the principal components to predict whether a digit is 0 or 1.

The following code divides the data into training and testing sets for you to use:

```{r}
set.seed(1239812)
idx_train <- sample(1:nrow(mnist_01), size = floor(nrow(mnist_01) * .7))
mnist_pca_train <- pcs[idx_train, ]
mnist_pca_test  <- pcs[-idx_train, ]
```

So, for example, to evaluate the model for the first two principal components, we could do the following:

```{r}
glm_2 <- glm(label ~ ., family = binomial,
             data = mnist_pca_train %>% select(label, PC1:PC2))
predictions_2 <- ifelse(predict(glm_2, mnist_pca_test, type = 'response') > 0.5,
                        1, 0)
mean(predictions_2 == mnist_pca_test$label)
```

**Repeat this analysis, varying the number of PCs from 3 to 100, and record the test-set accuracy for each number of PCs. Which number of PCs results in the best accuracy on the test set? Plot the accuracy against the number of PCs as well.**

```{r, warning = FALSE}
## Your code here
```

# 7

**Instead of using logistic regression, use a random forest with 50 principal components. How does the result compare to logistic regression?** Use the same train/test split.

```{r}
## Your code here
```

SDS 321
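For Question 5, one possible sketch: compress a single image at several component counts and draw each result next to the original. This assumes `compress_image`, `draw_image`, `intensities`, and `pca_mnist` from the earlier chunks are in scope; the specific component counts are just illustrative choices.

```{r}
# Compress the first image with a few different numbers of components,
# then draw each result (compress_image returns a 784x1 matrix, so flatten it)
for (k in c(1, 10, 25, 50, 100)) {
  print(draw_image(as.vector(compress_image(1, k))))
}

# Compare against the original, uncompressed image
print(draw_image(intensities[1, ]))
```

Which count is "good enough" is a judgment call, as the question says: compare each compressed image against the original and pick the smallest count at which you can no longer see a meaningful difference.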
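For Question 6, a sketch of the loop: refit the logistic regression from the example on the first `d` PCs for each `d` from 3 to 100, record the test-set accuracy, then report the best `d` and plot the curve. This assumes `mnist_pca_train` and `mnist_pca_test` from the chunk above exist, and that the PC columns are named `PC1`, ..., `PC784` as `prcomp` produces.

```{r, warning = FALSE}
# Test-set accuracy of logistic regression on the first d PCs, d = 3, ..., 100
n_pcs <- 3:100
accuracy <- numeric(length(n_pcs))
for (i in seq_along(n_pcs)) {
  d <- n_pcs[i]
  cols <- c("label", paste0("PC", 1:d))
  fit <- glm(label ~ ., family = binomial,
             data = mnist_pca_train %>% select(all_of(cols)))
  preds <- ifelse(predict(fit, mnist_pca_test, type = "response") > 0.5, 1, 0)
  accuracy[i] <- mean(preds == mnist_pca_test$label)
}

# Number of PCs with the best test-set accuracy, and the accuracy curve
n_pcs[which.max(accuracy)]
plot(n_pcs, accuracy, type = "l",
     xlab = "Number of PCs", ylab = "Test-set accuracy")
```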
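For Question 7, a sketch using the `randomForest` package loaded at the top, on the same train/test split. Note that `randomForest` needs the label to be a factor to perform classification rather than regression; `ntree` is left at its default.

```{r}
# Random forest on the first 50 PCs, same train/test split as Question 6
rf_cols <- c("label", paste0("PC", 1:50))
rf_train <- mnist_pca_train %>%
  select(all_of(rf_cols)) %>%
  mutate(label = factor(label))  # factor label -> classification forest

rf_fit <- randomForest(label ~ ., data = rf_train)
rf_preds <- predict(rf_fit, mnist_pca_test)

# Convert the factor predictions back to 0/1 before comparing
mean(as.numeric(as.character(rf_preds)) == mnist_pca_test$label)
```

To answer the comparison part, put this accuracy next to the best logistic-regression accuracy from Question 6 and comment on which does better (and by how much) on the test set.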