Data Analysis & Visualization Criteria Points Part 1 – Question 1 -…
Criteria (points)

Part 1 – Question 1 (2 points)
- Check the summary statistics of the data (use the describe function)
- Observations

Part 1 – Question 2 (6 points)
- Create histograms to check the distribution of all the variables (use .hist() attribute)
- Create boxplots to visualize the outliers for all the variables (use sns.boxplot())
- Observations

Part 1 – Question 3 (4 points)
- Apply the PCA algorithm with number of components equal to the total number of columns in the data
- Observations on the variance explained by the principal components

Part 1 – Question 4 (4 points)
- Interpret the coefficients of the first three principal components from the below dataframe

Part 1 – Question 5 (4 points)
- Scatter plot for the first two principal components with hue = ‘cylinders’
- Observations on the plot

Part 1 – Question 6 (5 points)
- Apply the TSNE embedding with 2 components for the DataFrame ‘data_scaled’ (use random_state=1)
- Observations

Part 1 – Question 7 (5 points)
- Complete the following code by filling the blanks
- Observations on different groups w.r.t. different variables

Part 2 – Question 1 (2 points)
- Identify and drop the rows with duplicate customer keys

Part 2 – Question 2 (1 point)
- Observations on the summary statistics of the data

Part 2 – Question 3 (5 points)
- Check the distribution of all variables (use .hist() attribute)
- Check outliers for all variables (use sns.boxplot())
- Observations

Part 2 – Question 4 (5 points)
- Interpret the above elbow plot and state the reason for choosing k=3
- Fit the K-means algorithm on the scaled data with the number of clusters equal to 3
- Store the predictions as ‘Labels’ to the ‘data_scaled_copy’ and ‘data’ DataFrames

Part 2 – Question 5 (6 points)
- Create cluster profiles using the below summary statistics and box plots for each label

Part 2 – Question 6 (5 points)
- Apply the Gaussian Mixture Model algorithm on the scaled data with n_components=3 and random_state=1
- Create the cluster profiles using the below summary statistics and box plots for each label
- Compare the clusters from both algorithms: K-means and Gaussian Mixture Model

Part 2 – Question 7 (6 points)
- Apply the K-Medoids clustering algorithm on the scaled data with n_clusters=3 and random_state=1
- Create cluster profiles using the below summary statistics and box plots for each label
- Compare the clusters from both algorithms: K-Means and K-Medoids

It is okay, I have uploaded everything.
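For Part 1, Questions 3 and 4, the rubric asks for PCA with as many components as there are columns, plus a look at the explained variance and the component coefficients. The following is only a minimal sketch of that step, not the assignment's own code: the dataset here is synthetic, and the column names `a` to `d` are hypothetical stand-ins for the real variables. Only the name `data_scaled` comes from the rubric.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real dataset (values and column names are assumptions).
rng = np.random.default_rng(1)
base = rng.normal(size=200)
data = pd.DataFrame({
    "a": base + rng.normal(scale=0.1, size=200),      # strongly correlated with "b"
    "b": 2 * base + rng.normal(scale=0.1, size=200),
    "c": rng.normal(size=200),                        # independent noise
    "d": rng.normal(size=200),                        # independent noise
})

# Standardize so every variable contributes on the same scale.
data_scaled = pd.DataFrame(
    StandardScaler().fit_transform(data), columns=data.columns
)

# Number of components equal to the total number of columns.
n = data_scaled.shape[1]
pca = PCA(n_components=n, random_state=1)
pc_names = [f"PC{i + 1}" for i in range(n)]
data_pca = pd.DataFrame(pca.fit_transform(data_scaled), columns=pc_names)

# Share of variance explained by each component, and the running total.
explained = pca.explained_variance_ratio_
cumulative = np.cumsum(explained)
variance_table = pd.DataFrame(
    {"explained": explained, "cumulative": cumulative}, index=pc_names
)

# Coefficients (loadings): rows are original variables, columns are components.
coefficients = pd.DataFrame(
    pca.components_.T, index=data_scaled.columns, columns=pc_names
)

print(variance_table.round(3))
print(coefficients.iloc[:, :3].round(2))  # first three components
```

When reading the output, a common approach is to note how many components are needed to reach a chosen share of cumulative variance, and to interpret each component through the variables with the largest absolute coefficients.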

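For Part 2, Questions 1 and 4, the rubric asks for dropping rows with duplicate customer keys and fitting K-means with 3 clusters on the scaled data, storing the predictions as `Labels`. The sketch below illustrates one way to do that under stated assumptions: the data is synthetic, the column name `Customer Key` and the feature names are hypothetical, and `n_init=10` is a choice made here rather than something the rubric specifies. The names `data`, `data_scaled_copy`, and `Labels` follow the rubric.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: three well-separated groups of 30 customers each (assumption).
rng = np.random.default_rng(1)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [20.0, 0.0]])
points = np.vstack([c + rng.normal(scale=0.5, size=(30, 2)) for c in centers])
data = pd.DataFrame(points, columns=["feature_1", "feature_2"])
data.insert(0, "Customer Key", np.arange(1, len(data) + 1))  # hypothetical key column

# Simulate duplicate keys by appending copies of the first five rows.
data = pd.concat([data, data.iloc[:5]], ignore_index=True)

# Identify duplicate keys (every occurrence after the first), then drop them.
duplicate_mask = data.duplicated(subset="Customer Key", keep="first")
n_duplicates = int(duplicate_mask.sum())
data = data[~duplicate_mask].reset_index(drop=True)

# Scale the features only; the key is an identifier, not a variable to cluster on.
features = data.drop(columns="Customer Key")
data_scaled = pd.DataFrame(
    StandardScaler().fit_transform(features), columns=features.columns
)
data_scaled_copy = data_scaled.copy()

# Fit K-means with k=3 and store the predictions in both DataFrames.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=1)
labels = kmeans.fit_predict(data_scaled)
data_scaled_copy["Labels"] = labels
data["Labels"] = labels

# A simple cluster profile: mean of each feature per label.
profile = data.drop(columns="Customer Key").groupby("Labels").mean()
print(f"Dropped {n_duplicates} duplicate rows")
print(profile.round(2))
```

Fitting on `data_scaled` and writing labels to a separate copy keeps the scaled feature matrix free of the label column, so it can be reused unchanged for the other clustering algorithms.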

