QUESTION 91

Consider these itemsets:

(hat, scarf, coat)

(hat, scarf, coat, gloves)

(hat, scarf, gloves)

(hat, gloves)

(scarf, coat, gloves)

What is the confidence of the rule (hat, scarf) -> gloves?

 A. 66% B. 40% C. 50% D. 60%
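
As a worked aid for the arithmetic, here is a minimal Python sketch that computes rule confidence under the usual definition: the number of itemsets containing both the antecedent and the consequent, divided by the number containing the antecedent. The function name `confidence` is an illustrative choice and does not come from any particular library.

```python
# Itemsets copied from the question above.
transactions = [
    {"hat", "scarf", "coat"},
    {"hat", "scarf", "coat", "gloves"},
    {"hat", "scarf", "gloves"},
    {"hat", "gloves"},
    {"scarf", "coat", "gloves"},
]


def confidence(antecedent, consequent, transactions):
    """Return confidence(X -> Y) = count(X and Y) / count(X)."""
    antecedent, consequent = set(antecedent), set(consequent)
    # Itemsets that contain every item of the antecedent.
    with_x = [t for t in transactions if antecedent <= t]
    if not with_x:
        raise ValueError("the antecedent never occurs in the transactions")
    # Of those, the itemsets that also contain the consequent.
    with_xy = [t for t in with_x if consequent <= t]
    return len(with_xy) / len(with_x)


rule_confidence = confidence({"hat", "scarf"}, {"gloves"}, transactions)
print(f"confidence = {rule_confidence:.2%}")
```

The option lists in this kind of question usually round the resulting fraction to a whole percentage.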

QUESTION 92

Refer to the exhibit. You have run a linear regression model against your data, and have plotted true outcome versus predicted outcome. The R-squared of your model is 0.75. What is your assessment of the model?

 A. The R-squared may be biased upwards by the extreme-valued outcomes. Remove them and refit to get a better idea of the model’s quality over typical data. B. The R-squared is good. The model should perform well. C. The extreme-valued outliers may negatively affect the model’s performance. Remove them to see if the R-squared improves over typical data. D. The observations seem to come from two different populations, but this model fits them both equally well.
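
The exhibit is not reproduced here, so the following Python sketch uses made-up illustrative data (not the data behind the exhibit) to show how a couple of extreme-valued observations can dominate R-squared. It assumes a simple one-predictor least-squares fit, for which R-squared equals the squared Pearson correlation. The helper `r_squared` is hypothetical.

```python
def r_squared(xs, ys):
    """R-squared of a one-predictor least-squares fit (squared Pearson r)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return (sxy * sxy) / (sxx * syy)


# Illustrative "typical" observations with essentially no linear relationship.
typical_x = list(range(1, 11))
typical_y = [5, 3, 6, 2, 7, 4, 6, 3, 5, 4]

# Two illustrative extreme-valued observations far from the typical range.
extreme_x = [100, 110]
extreme_y = [100, 110]

r2_all = r_squared(typical_x + extreme_x, typical_y + extreme_y)
r2_typical = r_squared(typical_x, typical_y)
print(f"R-squared with extreme values:    {r2_all:.3f}")
print(f"R-squared on typical values only: {r2_typical:.3f}")
```

Comparing the two figures shows why a single summary statistic should be read alongside the plot.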

QUESTION 93

You are provided with four different datasets. Initial analysis of these datasets shows that they have identical mean, variance and correlation values. What should your next step in the analysis be?

 A. Visualize the data to further explore the characteristics of each dataset B. Select one of the four datasets and begin planning and building a model C. Combine the data from all four of the datasets and begin planning and building a model D. Recalculate the descriptive statistics since they are unlikely to be identical for each dataset

QUESTION 94

Refer to the exhibit, which provides the decision tree for predicting whether someone is a good or bad credit risk. What would be the assigned probability, p(good), of a single male with no known savings?

 A. 0.83 B. 0 C. 0.498 D. 0.6

QUESTION 95

You are analyzing data in order to build a classifier model. You discover non-linear data and discontinuities that will affect the model. Which analytical method would you recommend?

 A. Decision Trees B. Logistic Regression C. ARIMA D. Linear Regression

QUESTION 96

Which SQL OLAP extension provides all possible grouping combinations?

 A. CUBE B. ROLLUP C. UNION ALL D. CROSS JOIN
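
To make the difference between the grouping extensions concrete, here is a small Python sketch that enumerates the grouping sets produced by CUBE and ROLLUP in standard SQL. The function names and the column names `region` and `product` are illustrative assumptions. The sketch only lists the grouping sets and does not run any SQL.

```python
from itertools import combinations


def cube_grouping_sets(columns):
    """CUBE groups by every subset of the columns: 2**n grouping sets."""
    return [
        combo
        for size in range(len(columns), -1, -1)
        for combo in combinations(columns, size)
    ]


def rollup_grouping_sets(columns):
    """ROLLUP groups by each leading prefix of the columns: n + 1 grouping sets."""
    return [tuple(columns[:size]) for size in range(len(columns), -1, -1)]


columns = ["region", "product"]  # hypothetical column names
print(cube_grouping_sets(columns))
print(rollup_grouping_sets(columns))
```

The empty tuple in each result stands for the grand total row, where no column is grouped.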

QUESTION 97

Your colleague, who is new to Hadoop, approaches you with a question. They want to know how best to access their data. This colleague has previously worked extensively with SQL and databases. Which query interface would you recommend?

 A. Hive B. Pig C. Howl D. HBase

QUESTION 98

What is the format of the output from the Map function of MapReduce?

 A. Key-value pairs B. Binary representation of keys concatenated with structured data C. Compressed index D. Unique key record and separate records of all possible values
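
As an illustration of the data flow, the following is a minimal in-memory word count written in plain Python. It is a sketch of the MapReduce pattern and not the Hadoop API. The names `map_fn`, `reduce_fn`, and `run_job` are hypothetical.

```python
from collections import defaultdict


def map_fn(line):
    """Map step: emit one (key, value) pair per word in the input line."""
    for word in line.lower().split():
        yield (word, 1)


def reduce_fn(key, values):
    """Reduce step: combine all values that share a key."""
    return (key, sum(values))


def run_job(lines):
    """Tiny driver: map every line, group by key (shuffle), then reduce."""
    grouped = defaultdict(list)
    for line in lines:
        for key, value in map_fn(line):
            grouped[key].append(value)
    return dict(reduce_fn(key, values) for key, values in grouped.items())


print(list(map_fn("to be or not to be")))
print(run_job(["to be or not to be"]))
```

In a real cluster the grouping step is handled by the framework between the map and reduce phases. Here it is simulated with a dictionary.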

QUESTION 99

You have used k-means clustering to classify the behavior of 100,000 customers for a retail store. You decide to use household income, age, gender and yearly purchase amount as measures. You have chosen to use 8 clusters and notice that 2 clusters have only 3 customers assigned. What should you do?

 A. Decrease the number of clusters B. Increase the number of clusters C. Decrease the number of measures used D. Identify additional measures to add to the analysis

QUESTION 100

Under which circumstance do you need to implement N-fold cross-validation after creating a regression model?

 A. There is not enough data to create a test set. B. The data is unformatted. C. There are missing values in the data. D. There are categorical variables in the model.
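
For readers unfamiliar with the mechanics, here is a minimal Python sketch of how N-fold cross-validation partitions a dataset so that every observation is held out exactly once. The function name `n_fold_indices` is hypothetical, and the sketch assumes the rows have already been shuffled. It only produces index splits and does not fit a model.

```python
def n_fold_indices(n_samples, n_folds):
    """Yield (train_indices, test_indices) for each of the n_folds folds."""
    if not 2 <= n_folds <= n_samples:
        raise ValueError("n_folds must be between 2 and n_samples")
    # Spread any remainder over the first folds so sizes differ by at most 1.
    base, extra = divmod(n_samples, n_folds)
    start = 0
    for fold in range(n_folds):
        size = base + (1 if fold < extra else 0)
        test_indices = list(range(start, start + size))
        train_indices = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_indices, test_indices
        start += size


for train_indices, test_indices in n_fold_indices(10, 3):
    # Fit the model on train_indices and score it on test_indices here.
    print(len(train_indices), test_indices)
```

Averaging the per-fold scores gives an estimate of out-of-sample performance without setting aside a separate test set.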