Data Mining Homework Assignment
Answer the following questions (10 points each):
1. Consider the following definition of an anomaly: An anomaly is an object that is unusually influential in the creation of a data model.
a. Compare this definition to that of the standard model-based definition of an anomaly.
b. For what sizes of data sets (small, medium, or large) is this definition appropriate?
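One way to make "unusually influential" concrete for Question 1 is leave-one-out refitting: fit the model with and without each object and measure how far the fitted values move. This is a minimal sketch, assuming the data model is a least-squares line fitted in pure Python. The helper names `fit_line` and `influence_scores` and the toy data are hypothetical, and the score (the unscaled numerator of Cook's distance) is one possible choice, not a definition taken from the assignment.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def influence_scores(xs, ys):
    """Leave-one-out influence of each point on the fitted line (hypothetical helper)."""
    a, b = fit_line(xs, ys)
    full = [a + b * x for x in xs]  # fitted values using every point
    scores = []
    for i in range(len(xs)):
        # Refit without point i
        a_i, b_i = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        # Sum of squared shifts in all fitted values (unscaled Cook's-distance numerator)
        scores.append(sum((f - (a_i + b_i * x)) ** 2 for f, x in zip(full, xs)))
    return scores


xs = [0, 1, 2, 3, 4, 10]
ys = [0, 1, 2, 3, 4, 0]
scores = influence_scores(xs, ys)
print(scores.index(max(scores)))  # prints 5
```

Here the point at (10, 0) drags the slope of the full fit slightly negative, while the other five points lie exactly on y = x, so removing it changes the model far more than removing any other point.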
2. In one approach to anomaly detection, objects are represented as points in a multidimensional space, and the points are grouped into successive shells, where each shell represents a layer around a grouping of points, such as a convex hull. An object is an anomaly if it lies in one of the outer shells.
a. To which of the definitions of an anomaly in Section 9.2 is this definition most closely related?
b. Name two problems with this definition of an anomaly.
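The shell idea in Question 2 can be sketched in two dimensions by repeatedly computing a convex hull and removing its vertices ("peeling"). This is a minimal sketch, assuming 2-D points, Andrew's monotone-chain hull, and the convention that points on a hull edge (but not at a vertex) fall into the next shell. The function names `convex_hull` and `peel_depths` and the sample points are illustrative only.

```python
def cross(o, a, b):
    """z-component of (a - o) x (b - o); positive means o -> a -> b turns counter-clockwise."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices (collinear edge points excluded)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def peel_depths(points):
    """Map each point to the index of the shell it lies on (0 = outermost)."""
    remaining = set(points)
    depth = {}
    layer = 0
    while remaining:
        hull = convex_hull(list(remaining))
        for p in hull:
            depth[p] = layer
        remaining -= set(hull)  # peel this shell off and continue inward
        layer += 1
    return depth


points = [(0, 0), (0, 2), (2, 0), (2, 2), (1, 1), (10, 10)]
depth = peel_depths(points)
print(depth[(10, 10)], depth[(1, 1)])  # prints: 0 1
```

Under the definition in Question 2, objects with the smallest depth values (the outer shells) would be reported as anomalies.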
3. Consider the (relative distance) K-means scheme for outlier detection described in Section 9.5 and the accompanying figure, Figure 9.10.
a. The points at the bottom of the compact cluster shown in Figure 9.10 have a somewhat higher outlier score than those points at the top of the compact cluster. Why?
b. Suppose that we choose the number of clusters to be much larger, e.g., 10. Would the proposed technique still be effective in finding the most extreme outlier at the top of the figure? Why or why not?
c. The use of relative distance adjusts for differences in density. Give an example of where such an approach might lead to the wrong conclusion.
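The relative-distance scheme in Question 3 can be sketched as follows, assuming a point's outlier score is its distance to the closest centroid divided by the median distance of that cluster's points to the same centroid. The helpers `assign`, `kmeans`, and `relative_distance_scores`, the fixed starting centroids, and the toy data are illustrative assumptions, not the textbook's code.

```python
import math
from statistics import median


def assign(points, centroids):
    """Index of the closest centroid for every point."""
    return [min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points]


def kmeans(points, centroids, iters=100):
    """Plain Lloyd's k-means from the given starting centroids; returns (centroids, labels)."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(iters):
        labels = assign(points, centroids)
        new = []
        for j, c in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Keep the previous centroid if a cluster ends up empty
            new.append(tuple(sum(v) / len(members) for v in zip(*members)) if members else c)
        if new == centroids:  # converged
            break
        centroids = new
    return centroids, assign(points, centroids)


def relative_distance_scores(points, centroids, labels):
    """Distance to the closest centroid divided by the median such distance in that cluster."""
    dists = [math.dist(p, centroids[lab]) for p, lab in zip(points, labels)]
    scores = []
    for d, lab in zip(dists, labels):
        med = median(x for x, other in zip(dists, labels) if other == lab)
        scores.append(d / med if med > 0 else (0.0 if d == 0 else math.inf))
    return scores


compact = [(0, 0), (0.1, 0), (0, 0.1), (-0.1, 0), (0, -0.1)]
loose = [(10, 0), (12, 0), (10, 2), (8, 0), (10, -2)]
points = compact + loose + [(0, 2)]  # index 10 sits just outside the compact cluster
cents, labels = kmeans(points, [(0, 0), (10, 0)])
scores = relative_distance_scores(points, cents, labels)
print(scores.index(max(scores)))  # prints 10
```

In this toy data the point at (0, 2) is closer to its centroid than several members of the loose cluster are to theirs, so raw distance would not rank it first; dividing by each cluster's median distance does.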
| Criterion | Excellent (4) | Good (3) | Needs Improvement (2) | Poor (1) |
|---|---|---|---|---|
|  | Presentation has no spelling or punctuation errors and very few grammatical errors. Very well written, with excellent use of business language. | Presentation has a few punctuation or spelling errors. Few grammatical errors, none of which affect meaning. Well written, and the use of business language is acceptable. | Presentation has some punctuation or spelling errors. Some grammatical errors, but they generally don’t affect meaning. Language is satisfactory overall, but more appropriate business language is required. | Presentation has numerous punctuation or spelling errors. Numerous grammatical errors that affect meaning. Language is unprofessional and not appropriate for business use. |
| Presentation Organization and Formatting | Relevant sections are included, and each is organized effectively. Formatting is professional and consistent and includes all required elements. | Relevant sections are included but could be better organized. Formatting is acceptable, and/or one or two required elements are missing. | Relevant sections are included, but organization could be improved. Formatting is inconsistent, and/or many elements are missing or not formatted appropriately. | Sections are missing and/or poorly organized. Formatting is messy and unprofessional. |
| Presentation Content/Thinking | Presentation content is comprehensive and detailed. Thinking and ideas are presented clearly and are engaging and thorough. | Presentation content is sufficient. Thinking and ideas are clear but could be expanded to be more thorough. | Presentation content is fairly basic, and/or thinking and ideas lack some clarity and depth. | Presentation content is very basic, and thinking and ideas are superficial and lack clarity. |