please answer this question Question: from urllib.request import…
Question:

from urllib.request import urlopen  # importing the libraries for html and url operations
from urllib.parse import urljoin
from html.parser import HTMLParser

website_data = []
top_values_list = []


# function to collect the tags links
class Collector(HTMLParser):
    approved_tags = ["h1", "h2", "h3", "h4", "h5", "h6", "p", "li"]

    def __init__(self, url):
        HTMLParser.__init__(self)
        self.url = url
        self.links = []
        self.html_tag = None

    # function to check for the start tags and check for hyperlinks
    def handle_starttag(self, tag, attrs):
        self.html_tag = tag
        if tag == "a":
            for attr in attrs:
                if attr[0] == "href":
                    absolute = urljoin(self.url, attr[1])
                    if absolute[:4] == "http":
                        self.links.append(absolute)

    def getLinks(self):
        return self.links

    def handle_data(self, data):
        global website_data
        if self.html_tag in Collector.approved_tags:
            website_data.append(data)

    def getData(self):
        global website_data
        return website_data

    visited = set()

    # function to get the words
    def content_cleanUp(self, content):
        new_content = []
        for word in content:
            words = word.split()
            for w in words:
                new_content.append(w.lower())
        return new_content

    # function to analyze the data collected
    def analyze(self, url):
        content = urlopen(url).read().decode()
        collector = Collector(url)
        collector.feed(content)
        url = collector.getLinks()
        content = collector.getData()
        # content_cleanUp is a method, so it must be called through self
        clean_content = self.content_cleanUp(content)
        global top_values_list
        top_values_list.append(clean_content)
        return top_values_list

    # function to count the words in the list
    def frequency(self, top_values_list):
        counter = {}
        for fwords in top_values_list:
            for w in fwords:
                if w in counter:
                    counter[w] += 1
                else:
                    counter[w] = 1
        return counter

    # function to store the top 25 words
    def find_top_words(self, counterV):
        top_ones = []
        top_words = []
        top_values = []
        for x, y in counterV.items():
            top_ones.append(y)
        top_ones.sort(reverse=True)
        top25 = top_ones[0:25]
        # second pass: keep every word whose count is among the 25 highest
        # counts (words tied at the cutoff are all kept)
        for x, y in counterV.items():
            if y in top25:
                top_words.append(x)
                top_values.append(y)
        return top_words, top_values

    # function to display the words
    def display(self, top_words, top_values):
        for i, word in enumerate(top_words):
            print("\n{:50} {:5}".format(word, top_values[i]))


# main function contains the main url
def main():
    url = 'https://www.cdm.depaul.edu/Pages/default.aspx'
    collector = Collector(url)
    top_values_list = collector.analyze('https://www.cdm.depaul.edu/Pages/default.aspx')
    counterV = collector.frequency(top_values_list)
    top_words, top_values = collector.find_top_words(counterV)
    collector.display(top_words, top_values)

Python 3

In this assignment, we will build a decision tree classifier for multi-class classification with continuous feature/attribute values. For the decision tree you are to implement, please always use binary split and a threshold to split data. That is, each decision node compares one attribute with a threshold, where the best attribute and threshold are for you to determine. Please use information gain to construct the decision tree.

Model Specifications
For the decision tree, let max_depth = 2 be the only stopping criterion. In other words, you should grow the tree as long as max_depth = 2 is not violated and not all training instances in the current node have the same label.

Tie Breaking
We ensure each attribute is named by a non-negative integer, and each class label is named by a positive integer. Since we are to use HackerRank for grading, we have to eliminate additional randomness and generate deterministic results. We, therefore, enforce the following rule in this assignment: in the event of ties, always choose the attribute or label with the smallest value. For all test cases in this assignment, we guarantee results are deterministic as long as the requirements above are satisfied.

Additional Testcases for Debugging
Here are some additional testcases for you to debug your code.

Input Format
Each input dataset contains training instances followed by test instances.
Each line has the following space-separated format:

[label] [attribute 1]:[value 1] [attribute 2]:[value 2] …

The name of each attribute, e.g., [attribute 2], is a non-negative integer. The value of an attribute, e.g., [value 2], is a float number. A line stands for a test instance if [label] is -1 and a training instance otherwise. The label of a training instance can be any positive integer.

Please do not assume the attribute names to start from 0 or to be consecutive integers, and please do not assume the class labels to start from 1 or to be consecutive integers.

Constraints
Only standard libraries are allowed for this assignment. Advanced libraries (e.g., sklearn, pandas, numpy) are not allowed and would generate an error by HackerRank.

Output Format
The output is the prediction on the test instances made by your DT. In each line of the output, print the prediction for each test instance. Please follow the ordering of the test instances in the input file.

Sample Input 0
1 0:1.0 2:1.0
1 0:1.0 2:2.0
1 0:2.0 2:1.0
3 0:2.0 2:2.0
1 0:3.0 2:1.0
3 0:3.0 2:2.0
3 0:3.0 2:3.0
3 0:4.5 2:3.0
-1 0:1.0 2:2.2
-1 0:4.5 2:1.0

Sample Output 0
1
1

Please use information gain.

d show the control factors for JK flip flops. [10 marks]

CST.96.11.8

13 Designing Interactive Applications

When a new patient applies to join a doctor’s practice, personal and medical-history details must be obtained. Usually the patient (or the patient’s parent in the case of young children) must fill in a form of two pages or more for inclusion in the patient’s records. With the computerization of one particular doctor’s practice, P1, a means is needed for entering the new patient’s details. Two approaches are considered:

(A) the doctor interviews the patient at the start of the initial consultation, and enters the details as they are elicited;
(B) upon application, the patient or parent sits down at a computer and enters the details.

Write one-sentence problem statements for each design problem.
Then, drawing on your knowledge of the work of the doctor, discuss the pros and cons of the two approaches. [12 marks]

Suppose two practices, P1 and P2, adopt approaches A and B respectively. Each is dissatisfied with the results. Practice P1 therefore decides to switch to approach B, installing a computer in a booth adjoining its waiting room, running the system designed for the doctor (modified only to prevent access to existing records), so that patients and parents can enter their details. Meanwhile practice P2 decides to change to approach A, loading the patient data entry program, unchanged, onto the doctor’s PC so that he or she can enter the details during consultations. If you were asked to advise practices P1 and P2 on these moves, what outcomes would you predict? What analytical method would you use, in each case, to back up your predictions, and why?

Einstein has established that there is no universal time. For earth-based computer systems discuss how events might be assigned a time stamp which is reasonably close to conventional earth-time. Describe the constraints on system-wide event ordering and discuss alternative approaches to meeting them. [10 marks]

For a system in which data replicas are maintained:
Either Define total order and causal order ap order. [10 mas at all times. [10 marks]

CST.96.9.3

5 Business Studies

What is meant by SWOT analysis? [5 marks]

A small computer company with strong and innovative hardware expertise is considering manufacturing a network interface computer (NIC). The device, which would sell for about half the current price of a PC, is based on games console technology, with a built-in modem. It would allow a user to convert his or her television to a web-browser. Apart from a small amount of parameter storage, the proposed device contains no disc or other long-term memory. How would you determine the market for such a device? [5 marks]

Perform an analysis of this opportunity. What advice would you give the company?
[5 marks]

Comment on changes to the business model that may be expected to be caused by the rapid development of the Internet. [5 marks]

6 Advanced Algorithms

Explain the steps involved in using the Miller-Rabin test to check whether a number N is composite. This will involve computing a^(N−1) mod N for some value of a. [10 marks]

Carry out the steps for N = 65 and a = 1, 2, 8 and 12. Comment on what (if anything) each partial result tells you about N and which cases (if any) help you to decide whether N is prime or what its factors might be. Pretend throughout the calculation that you do not know that 65 = 5×13. Proceed as though 65 were a huge number, imagining that you do not know at the outset whether it is prime or composite and that you are certainly unable to spot any factors. [10 marks]

CST.96.9.4

7 Optimising Compilers

Briefly summarize the main concepts of strictness analysis including the kind of languages to which it applies, and the way in which both system-provided and user-defined functions f yield strictness properties f# (relate the types of f and f#). [6 marks]

Give the strictness functions corresponding to the following ternary functions:

(a) f1(x,y,z) = x*y + z
(b) f2(x,y,z) = if x=9 then y else z
(c) f3(x,y,z) = pif x=9 then y else z

where pif e1 then e2 else e3 is the parallel conditional: it behaves similarly to the standard conditional in that if e1 evaluates to true or false then it yields e2 or e3 as appropriate; however, evaluation of e2 and e3 occurs concurrently with e1 to allow the pif construct also to terminate with the value of e2 when e2 and e3 both terminate with equal values (even if e1 computes forever).

Comment briefly how your strictness property for f1 would change if the multiplication returned zero without evaluating the other argument in the event that one argument were zero.
[7 marks]

Let g, h1 and h2 be binary functions and recall the definition of function composition: g ∘ ⟨h1, h2⟩ = λ(x, y). g(h1(x, y), h2(x, y)). Define three such functions in an ML-like syntax (whose arguments and results are integers) and which have the property that (g ∘ ⟨h1, h2⟩)# ≠ g# ∘ ⟨h1#, h2#⟩. [Hint: you might find it helpful to think of a solution where g may ignore one of its arguments but always does when composed with ⟨h1, h2⟩.] Comment whether this inequality means that g# ∘ ⟨h1#, h2#⟩ fails to be a safe strictness property for g ∘ ⟨h1, h2⟩. [7 marks]

CST.96.9.5

8 Computational Neuroscience

It has been remarked that “neural networks are the second best way of computing just about anything.” Discuss this, touching on the following issues: expressiveness; computational efficiency; generalization; sensitivity to noise; transparency (the ability to explain why a given output value is justified); the use of prior knowledge; whether neural networks fulfill our needs for a comprehensive computational theory of learning. [20 marks]

9 Security

Shamir’s three-pass protocol enables Alice to send a message m to Bob in the following way:

A → B : m^ka (mod p)
B → A : m^(ka·kb) (mod p)
A → B : m^kb (mod p)

Explain this protocol, stating the constraint on m and the principal vulnerability. [10 marks]

It is suggested that the encryption operation m → m^kx be replaced with a provably secure encryption operation, namely a one-time pad. How would this affect the protocol’s security? [10 marks]

10 Natural Language Processing

Describe three significant differences between programming languages and natural languages. [8 marks]

What problems do these differences pose for attempts to construct programs that “understand” a natural language? [12 marks]

CST.96.9.6

11 Information Theory and Coding

Consider a noiseless analog communication channel whose bandwidth is 10,000 Hz. A signal of duration 1 second is received over such a channel.
We wish to represent this continuous signal exactly, at all points in its one-second duration, using just a finite list of real numbers obtained by sampling the values of the signal at discrete, periodic points in time. What is the length of the shortest list of such discrete samples required in order to guarantee that we capture all of the information in the signal and can recover it exactly from this list of samples? [5 marks]

Name, define algebraically, and sketch a plot of the function you would need to use in order to recover completely the continuous signal transmitted, using just such a finite list of discrete periodic samples of it. [5 marks]

Consider a noisy analog communication channel of bandwidth Ω, which is perturbed by additive white Gaussian noise whose power spectral density is N0. Continuous signals are transmitted across such a channel, with average transmitted power P (defined by their expected variance). What is the channel capacity, in bits per second, of such a channel?

What is meant by the term demand paging in a virtual memory management system, and how is it implemented? [5 marks]

Briefly describe five techniques which the operating system and/or hardware can implement to improve the efficiency of demand paging. [5 marks]

What is the working set of a program, and how can an operating system use it in the management of virtual memory? [3 marks]

Describe the clock (second chance) algorithm for selecting a VM page for replacement when a page fault occurs. How is the performance of this algorithm affected by the memory size of the computer system, and how may this be avoided? [7 marks]

CST.96.11.5

8 Mathematics for Computation Theory

Let E be an event over S that is accepted by the deterministic finite automaton M ≡ (Q, S, ι, f, A), where |Q| = N.
Suppose that z ∈ E is a word such that ℓ(z) ≥ N: show that we may write z = uvw where

(i) ℓ(uv) ≤ N
(ii) ℓ(v) ≥ 1
(iii) for all n ≥ 0, uv^n w ∈ E

[12 marks]

State whether each of the following languages over S = {a, b} is regular, giving your reasons.

(a) L1 = {ww | w ∈ S*} [6 marks]
(b) L2 = {wzw | w, z ∈ S*} [2 marks]

[Note: |Q| indicates the number of elements in set Q, and ℓ(w) the number of characters in word w.]

9 Computation Theory

A bag B of natural numbers is a total function fB : N → N giving for each natural number x the count fB(x) of occurrences of x in B. If each fB(x) = 0 or 1, then fB is the characteristic function χS of a set S: every set can thus be regarded as a bag.

(a) A bag B is recursive if the function fB is computable. Suppose that the sequence of bags {Bn | n ∈ N} is recursively enumerated by the computable function e(n, x) = fn(x), which gives the count of x in each bag Bn. Show that there is a recursive set S that is different from each bag Bn. [7 marks]

Hence prove that the set of all recursive bags cannot be recursively enumerated. [3 marks]

(b) A bag B is finite if there is X ∈ N such that fB(x) = 0 for all x > X. Show that the set of all finite bags is recursively enumerable. [10 marks]

CST.96.11.6

10 Numerical Analysis I

Let x* be the floating-point representation of a number x. Define the absolute error and relative error in representing x by x*. How are these errors related? [3 marks]

Let x1, x2 be two numbers. Find expressions for

(a) the absolute error in representing x1 + x2
(b) the relative error in representing x1.x2 (where “.” denotes multiplication)

[4 marks]

Assume that the numbers 1 and 2 are represented exactly. Find an expression for the absolute error in calculating 2x + 1.
[2 marks]

In an iterative calculation the number y is an improved value of x, derived from the assignments

p := x/2 + 1
q := x − 2
y := p + 1/q

If εx is the absolute error in representing x, find an expression for the absolute error εy in representing y. [6 marks]

What is the approximate relative error δy in representing y when x = 2.01? [5 marks]

CST.96.11.7

11 Graphics

Describe a quad-tree encoding method for greyscale images. [6 marks]

Given the following greyscale image, draw a diagram showing how it would be encoded using your method from the previous part.

33 39 43 72
34 54 64 81
42 54 71 83
60 64 77 89

[4 marks]

An image processing package allows the user to design 3 × 3 convolution filters. Design 3 × 3 filters to perform the following tasks:

(a) blurring [2 marks]
(b) edge detection of vertical edges.

What are the differences between optimistic and pessimistic mechanisms of concurrency control (with particular reference to the ACID properties)? [4 marks]

What factors determine whether an optimistic or pessimistic concurrency control policy is appropriate for a transaction processing system? [4 marks]

With reference to these factors, state whether an optimistic or pessimistic policy would lead to a more efficient system in the following cases:

(a) a flight booking system
(b) a police criminal record database
(c) a banking transaction processing system
(d) a database maintaining patient records

[4 marks]

CST.95.3.2

2 Further Modula-3

Synchronisation of threads in Modula-3 is achieved through the use of mutexes and condition variables. An alternative scheme would be to use Dijkstra semaphores. A semaphore has a hidden value (usually set to 1 initially) and two atomic operations:

wait (sometimes called P) decrements the stored value. If the result is negative, the thread is suspended; otherwise it continues.

signal (sometimes called V) increments the value.
If there are any other threads suspended while waiting for the semaphore, one of them is allowed to continue.

Write an interface Semaphore defining an opaque object type T with init, signal and wait methods. [5 marks]

Sketch an implementation of the Semaphore module giving a concrete revelation of T and implementing appropriate default methods. [10 marks]

Show how the interface and implementation could be extended to derive a sub-type of T with an extra method, try, which works like wait but returns a BOOLEAN value instead of blocking. In the normal case, try should return TRUE but when the thread would have been suspended, the value in the semaphore is left unchanged and it should return FALSE. [5 marks]

3 Regular Languages and Finite Automata

Describe how to derive from any regular expression a deterministic finite automaton describing the same language. [15 marks]

Justify the claim that the resulting automaton does describe the same language. [5 marks]

CST.95.3.3

4 Compiler Construction

Construct the characteristic finite state machine for the following grammar.

S → A B eof
A → A B | B a
B → ( A ) | b

[6 marks]

Explain what is meant by the FOLLOW set for a non-terminal symbol in a grammar, and derive the FOLLOW sets for A and B in the above grammar. [4 marks]

Construct, with explanation, the SLR(1) action and goto matrices for the above grammar. [5 marks]

Illustrate how the SLR(1) parsing algorithm works for this grammar by showing the successive states of the parser stack and input stream while parsing

b a b ( b a ) eof

[5 marks]

CST.95.3.4

5 Data Structures and Algorithms

For each of the following situations identify one data structure or algorithm that it would be sensible to use, and another that would in principle achieve the desired result but which would have significant disadvantages.
You may identify standard methods by name and need not describe in detail how they work, but should make it clear what properties the schemes that you identify have that make some of them more appropriate than others.

(a) You need to represent some (directed) graphs where when a graph has N vertices it will have around N log N edges. The number of vertices, N, may become quite large. [4 marks]

(b) In the process of rendering a graphical image you have already sorted all the objects that have to be drawn with an ordering based on their distance from the viewpoint. Now the image has been changed slightly so that you can start to display the next frame of the video sequence, so all the distances have changed, and you need to sort the objects again.

Please provide the code for each step in the Jupyter Notebook and a short description of every part. Do not make any errors.

Dataset Name: “Heart Attack Analysis & Prediction Dataset”
Dataset: https://www.kaggle.com/rashikrahmanpritom/heart-attack-analysis-prediction-dataset

The questions to solve are:

Data analysis:
- Data Selection
- Data Preprocessing (Missing Data, Invalid Values, Remove Duplicates, Formatting)
- Data Visualization: correlation matrix, heat map
- Data Transformation
- Data Modeling
- Curse of Dimensionality
- Multicollinearity

Prediction Model:
- Linear regression
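
For the word-count question at the top of this page, the counting and top-25 steps can also be written with collections.Counter from the standard library. The following is only a minimal sketch and not the asker's code: the helper name top_words and the sample strings are hypothetical, and it assumes the page text has already been collected as a list of strings (for example by handle_data).

```python
from collections import Counter


def top_words(text_chunks, n=25):
    """Return the n most frequent lowercase words as (word, count) pairs.

    text_chunks is an iterable of strings, such as the text fragments
    an HTMLParser subclass collects in handle_data.
    """
    counts = Counter()
    for chunk in text_chunks:
        # split() with no arguments splits on any run of whitespace
        counts.update(word.lower() for word in chunk.split())
    return counts.most_common(n)


if __name__ == "__main__":
    # Hypothetical sample data standing in for scraped page text.
    sample = ["DePaul CDM", "cdm news and events", "News from CDM"]
    for word, count in top_words(sample, n=3):
        print("{:50} {:5}".format(word, count))
```

Unlike find_top_words in the question, most_common(n) returns exactly n entries, so words tied at the cutoff are not all kept.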

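
For the decision tree assignment above, one possible shape of a solution is sketched below using only the standard library. Several details are assumptions, because the assignment text does not preserve them: nodes test value <= threshold, candidate thresholds are midpoints between consecutive distinct values, every instance carries every attribute, and ties go to the smallest attribute, then the smallest threshold, then the smallest label. All function names are hypothetical.

```python
import math
from collections import Counter


def parse(lines):
    """Split input lines into training (label, features) pairs and test feature dicts."""
    train, test = [], []
    for line in lines:
        parts = line.split()
        if not parts:
            continue
        label = int(parts[0])
        feats = {int(k): float(v) for k, v in (p.split(":") for p in parts[1:])}
        if label == -1:
            test.append(feats)
        else:
            train.append((label, feats))
    return train, test


def entropy(rows):
    """Shannon entropy (in bits) of the labels in rows."""
    counts = Counter(label for label, _ in rows)
    total = len(rows)
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def majority(rows):
    """Most common label; ties go to the smallest label."""
    counts = Counter(label for label, _ in rows)
    best = max(counts.values())
    return min(label for label, c in counts.items() if c == best)


def best_split(rows):
    """Return (gain, attribute, threshold) with the highest information gain."""
    base, best = entropy(rows), None
    for attr in sorted({a for _, f in rows for a in f}):
        values = sorted({f[attr] for _, f in rows})
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2  # assumed: midpoint thresholds
            left = [r for r in rows if r[1][attr] <= thr]
            right = [r for r in rows if r[1][attr] > thr]
            remainder = (len(left) * entropy(left)
                         + len(right) * entropy(right)) / len(rows)
            gain = base - remainder
            # strict improvement only, so earlier (smaller) candidates win ties
            if best is None or gain > best[0] + 1e-12:
                best = (gain, attr, thr)
    return best


def build(rows, depth=0, max_depth=2):
    """Return a leaf label or a node tuple (attribute, threshold, left, right)."""
    if depth == max_depth or len({label for label, _ in rows}) == 1:
        return majority(rows)
    split = best_split(rows)
    if split is None:  # no attribute has two distinct values
        return majority(rows)
    _, attr, thr = split
    left = [r for r in rows if r[1][attr] <= thr]
    right = [r for r in rows if r[1][attr] > thr]
    return (attr, thr,
            build(left, depth + 1, max_depth),
            build(right, depth + 1, max_depth))


def predict(tree, feats):
    """Walk from the root to a leaf and return its label."""
    while isinstance(tree, tuple):
        attr, thr, left, right = tree
        tree = left if feats[attr] <= thr else right
    return tree
```

To use it, pass the input lines to parse, call build on the training rows, and print predict(tree, feats) for each test instance in input order.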

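
For the Miller-Rabin question above (N = 65 with a = 1, 2, 8 and 12), the repeated-squaring sequence for each base can be traced with a few lines of Python. This is a sketch of one round of the test for illustration, not a model answer; the function names miller_rabin_trace and factor_hint are hypothetical.

```python
from math import gcd


def miller_rabin_trace(n, a):
    """Run one Miller-Rabin round on odd n > 2 with base a.

    Returns (seq, verdict), where seq is
    [a^d, a^(2d), ..., a^(n-1)] mod n and n - 1 = 2^s * d with d odd.
    """
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    x = pow(a, d, n)
    seq = [x]
    for _ in range(s):
        x = x * x % n
        seq.append(x)
    # n passes for this base if a^d = 1, or if some a^(2^r * d) = n - 1
    # with 0 <= r < s (that is, -1 appears before the final entry).
    if seq[0] == 1 or (n - 1) in seq[:s]:
        return seq, "probably prime"
    return seq, "composite"


def factor_hint(n, seq):
    """If seq shows a square root of 1 other than 1 or n - 1, return two factors."""
    for prev, cur in zip(seq, seq[1:]):
        if cur == 1 and prev not in (1, n - 1):
            return gcd(prev - 1, n), gcd(prev + 1, n)
    return None


if __name__ == "__main__":
    for a in (1, 2, 8, 12):
        seq, verdict = miller_rabin_trace(65, a)
        print(a, seq, verdict, factor_hint(65, seq))
```

A "probably prime" verdict for one base proves nothing on its own; only a "composite" verdict is conclusive.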
