Databricks Certified Machine Learning Associate Exam (Databricks-Machine-Learning-Associate): Free Practice Test

Page: 1 / 15
Total Questions: 74
  • Which of the following is a benefit of using vectorized pandas UDFs instead of standard PySpark UDFs?

    Answer: B
  • A data scientist has produced three new models for a single machine learning problem. In the past, the solution used just one model. All four models have nearly the same prediction latency, but a machine learning engineer suggests that the new solution will be less time efficient during inference. In which situation will the machine learning engineer be correct?

    Answer: D
  • A data scientist learned during their training to always use 5-fold cross-validation in their model development workflow. A colleague suggests that there are cases where a train-validation split could be preferred over k-fold cross-validation when k > 2. Which of the following describes a potential benefit of using a train-validation split over k-fold cross-validation in this scenario?

    Answer: E
  • Which of the following machine learning algorithms typically uses bagging?

    Answer: C
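
Bagging (bootstrap aggregating) trains each ensemble member on a bootstrap resample of the training data and then combines the members' predictions; random forests are the usual example of an algorithm built on it. A minimal pure-Python sketch of the idea follows. It assumes a deliberately trivial "model" (the mean of the resampled labels), and the names `bootstrap_sample` and `bagged_mean` are hypothetical, chosen only for illustration.

```python
import random
from statistics import mean


def bootstrap_sample(data, rng):
    """Draw len(data) items from data, sampling with replacement."""
    return [rng.choice(data) for _ in data]


def bagged_mean(labels, n_models=25, seed=0):
    """Fit one trivial 'model' (a mean) per bootstrap sample, then average them."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    predictions = [mean(bootstrap_sample(labels, rng)) for _ in range(n_models)]
    return mean(predictions)   # aggregation step: average the members' outputs


labels = [3.0, 5.0, 4.0, 6.0, 2.0]  # made-up training labels
estimate = bagged_mean(labels)
```

A real bagged ensemble would replace the mean with a learner such as a decision tree, but the resample-then-aggregate structure stays the same.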
  • A machine learning engineering team has a Job with three successive tasks. Each task runs a single notebook. The team has been alerted that the Job has failed in its latest run. Which of the following approaches can the team use to identify which task is the cause of the failure?

    Answer: B
  • Which statement describes a Spark ML transformer?

    Answer: A
  • A data scientist has created two linear regression models. The first model uses price as a label variable and the second model uses log(price) as a label variable. When evaluating the RMSE of each model by comparing the label predictions to the actual price values, the data scientist notices that the RMSE for the second model is much larger than the RMSE of the first model. Which of the following possible explanations for this difference is invalid?

    Answer: E
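
One explanation worth checking in a scenario like this is a scale mismatch: a model trained on log(price) predicts on the log scale, so its predictions need to be exponentiated before they are compared with actual prices. The sketch below uses made-up numbers and a hypothetical helper `rmse` to show how large the difference can be; it illustrates that one effect only and is not a statement about which answer option is which.

```python
import math


def rmse(predictions, actuals):
    """Root mean squared error between two equal-length sequences."""
    squared_errors = [(p - a) ** 2 for p, a in zip(predictions, actuals)]
    return math.sqrt(sum(squared_errors) / len(actuals))


actual_prices = [100.0, 200.0, 400.0]  # made-up prices
# Made-up predictions from a model trained on log(price): they live on the log scale.
log_predictions = [math.log(110.0), math.log(190.0), math.log(420.0)]

# Comparing log-scale predictions directly with prices inflates the error.
rmse_log_scale = rmse(log_predictions, actual_prices)

# Exponentiating first puts predictions and labels on the same scale.
rmse_price_scale = rmse([math.exp(p) for p in log_predictions], actual_prices)
```

Here `rmse_log_scale` is far larger than `rmse_price_scale`, even though the underlying predictions are identical.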
  • Which of the following tools can be used to distribute large-scale feature engineering without the use of a UDF or pandas Function API for machine learning pipelines?

    Answer: D
  • A data scientist uses 3-fold cross-validation and the following hyperparameter grid when optimizing model hyperparameters via grid search for a classification problem:
      Hyperparameter 1: [2, 5, 10]
      Hyperparameter 2: [50, 100]
    Which of the following represents the number of machine learning models that can be trained in parallel during this process?

    Answer: D
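
The arithmetic behind the grid above can be sketched as follows. Each combination of hyperparameter values is fitted once per fold, and those fits are independent of one another. The dictionary keys are placeholder names, and how many fits actually run at the same time depends on the parallelism the tuning tool and cluster allow, which this sketch does not model.

```python
from itertools import product

# Placeholder names for the two hyperparameters in the question.
grid = {
    "hyperparameter_1": [2, 5, 10],
    "hyperparameter_2": [50, 100],
}
n_folds = 3

# Every pairing of one value from each list is one candidate configuration.
combinations = list(product(*grid.values()))
n_combinations = len(combinations)   # 3 * 2 = 6

# Each configuration is trained once per cross-validation fold.
n_fits = n_combinations * n_folds    # 6 * 3 = 18
```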
  • A data scientist has developed a machine learning pipeline with a static input data set using Spark ML, but the pipeline is taking too long to process. They increase the number of workers in the cluster to get the pipeline to run more efficiently. They notice that the number of rows in the training set after reconfiguring the cluster is different from the number of rows in the training set prior to reconfiguring the cluster. Which of the following approaches will guarantee a reproducible training and test set for each model?

    Answer: B
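
A random split in a distributed engine can depend on how the data is partitioned, so changing the cluster can change which rows land in each set. One commonly used way to sidestep this is to split once, write both sets to persistent storage, and have every later run reload them. The sketch below illustrates that pattern in plain Python rather than Spark. A temporary directory stands in for persistent storage, and the function and file names (`split_and_persist`, `load_split`, `train.json`, `test.json`) are hypothetical.

```python
import json
import random
import tempfile
from pathlib import Path


def split_and_persist(rows, out_dir, train_fraction=0.8, seed=42):
    """Shuffle once, split, and write both parts to disk."""
    rng = random.Random(seed)
    shuffled = rows[:]          # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    out_dir = Path(out_dir)
    (out_dir / "train.json").write_text(json.dumps(shuffled[:cut]))
    (out_dir / "test.json").write_text(json.dumps(shuffled[cut:]))


def load_split(out_dir):
    """Reload the stored split; no randomness is involved at this point."""
    out_dir = Path(out_dir)
    train = json.loads((out_dir / "train.json").read_text())
    test = json.loads((out_dir / "test.json").read_text())
    return train, test


storage = tempfile.mkdtemp()    # stand-in for persistent storage
split_and_persist(list(range(100)), storage)
train_rows, test_rows = load_split(storage)
```

Because later runs only read the stored files, the training and test sets stay the same regardless of how the compute environment is configured.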