
Free Professional-Data-Engineer Demo Online For Google Certification:


Which of these statements about BigQuery caching is true?

  • A. By default, a query's results are not cached.
  • B. BigQuery caches query results for 48 hours.
  • C. Query results are cached even if you specify a destination table.
  • D. There is no charge for a query that retrieves its results from cache.

Answer: D

When query results are retrieved from a cached results table, you are not charged for the query. BigQuery caches query results for 24 hours, not 48 hours.
Query results are not cached if you specify a destination table.
A query's results are always cached except under certain conditions, such as if you specify a destination table.
Reference: https://cloud.google.com/bigquery/querying-data#query-caching


Your company receives both batch- and stream-based event data. You want to process the data using Google Cloud Dataflow over a predictable time period. However, you realize that in some instances data can arrive late or out of order. How should you design your Cloud Dataflow pipeline to handle data that is late or out of order?

  • A. Set a single global window to capture all the data.
  • B. Set sliding windows to capture all the lagged data.
  • C. Use watermarks and timestamps to capture the lagged data.
  • D. Ensure every datasource type (stream or batch) has a timestamp, and use the timestamps to define the logic for lagged data.

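Answer: C

Watermarks track how far event time has progressed, and together with per-element timestamps they let the pipeline recognize late or out-of-order data and still assign it to the correct window. Sliding windows only change how elements are grouped; they do not account for lateness.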
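How watermarks and event timestamps (option C) deal with late or out-of-order data can be sketched in plain Python. This is a toy model, not the Apache Beam or Cloud Dataflow API: the window size, the allowed lateness, the fixed watermark lag, and the name assign_windows are all assumptions made for illustration.

```python
from collections import defaultdict

WINDOW_SIZE = 60        # fixed event-time windows of 60 seconds (assumed)
ALLOWED_LATENESS = 30   # assumed grace period after the watermark passes a window


def assign_windows(events, watermark_lag=10):
    """Group (event_time, value) pairs, given in arrival order, into fixed windows.

    The watermark is estimated as the largest event time seen so far minus
    watermark_lag. An event is dropped only if the watermark has already
    passed the end of its window by more than ALLOWED_LATENESS.
    """
    windows = defaultdict(list)
    dropped = []
    max_event_time = float("-inf")
    for event_time, value in events:
        max_event_time = max(max_event_time, event_time)
        watermark = max_event_time - watermark_lag
        window_start = (event_time // WINDOW_SIZE) * WINDOW_SIZE
        window_end = window_start + WINDOW_SIZE
        if watermark > window_end + ALLOWED_LATENESS:
            dropped.append((event_time, value))   # too late, window is closed
        else:
            windows[window_start].append(value)   # on time or acceptably late
    return dict(windows), dropped


# Events in arrival order; "b" and "d" arrive out of order.
events = [(5, "a"), (70, "c"), (50, "b"), (200, "e"), (20, "d")]
windows, dropped = assign_windows(events)
```

In this sketch "b" arrives after "c" but still lands in the first window because the watermark has not yet moved past the grace period, while "d" arrives far too late and is set aside. A real pipeline would choose what to do with such dropped elements.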


Google Cloud Bigtable indexes a single value in each row. This value is called the ______.

  • A. primary key
  • B. unique key
  • C. row key
  • D. master key

Answer: C

Cloud Bigtable is a sparsely populated table that can scale to billions of rows and thousands of columns, allowing you to store terabytes or even petabytes of data. A single value in each row is indexed; this value is known as the row key.
Reference: https://cloud.google.com/bigtable/docs/overview
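Because only the row key is indexed, reads are either exact-key lookups or scans over a contiguous range of keys. The following is a minimal in-memory sketch of that idea, not the Cloud Bigtable client API; the "<device>#<date>" key layout, the sample rows, and the function names are hypothetical.

```python
import bisect

# Hypothetical rows: row key -> cell data. The "<device>#<date>" layout
# is an assumption chosen only for this illustration.
rows = {
    "device-2#20240101": {"temp": 19},
    "device-1#20240102": {"temp": 22},
    "device-1#20240101": {"temp": 21},
    "device-3#20240101": {"temp": 18},
}

# Only the row key is indexed, so the single index is a sorted list of keys.
sorted_keys = sorted(rows)


def get_row(key):
    """Point lookup by exact row key using binary search."""
    i = bisect.bisect_left(sorted_keys, key)
    if i < len(sorted_keys) and sorted_keys[i] == key:
        return rows[key]
    return None


def scan_prefix(prefix):
    """Range scan: every key that starts with prefix, in sorted order."""
    start = bisect.bisect_left(sorted_keys, prefix)
    result = []
    for key in sorted_keys[start:]:
        if not key.startswith(prefix):
            break
        result.append(key)
    return result
```

A query on any other field, such as finding all rows with a given temperature, would have to visit every row in this model, which is why row key design matters so much.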


Your financial services company is moving to cloud technology and wants to store 50 TB of financial timeseries data in the cloud. This data is updated frequently and new data will be streaming in all the time. Your company also wants to move their existing Apache Hadoop jobs to the cloud to get insights into this data.
Which product should they use to store the data?

  • A. Cloud Bigtable
  • B. Google BigQuery
  • C. Google Cloud Storage
  • D. Google Cloud Datastore

Answer: A

Reference: https://cloud.google.com/bigtable/docs/schema-design-time-series


You work for a bank. You have a labelled dataset that contains information on already-granted loan applications and whether those applications have defaulted. You have been asked to train a model to predict default rates for credit applicants.
What should you do?

  • A. Increase the size of the dataset by collecting additional data.
  • B. Train a linear regression to predict a credit default risk score.
  • C. Remove the bias from the data and collect applications that have been declined loans.
  • D. Match loan applicants with their social profiles to enable feature engineering.

Answer: B


Government regulations in your industry mandate that you have to maintain an auditable record of access to certain types of data. Assuming that all expiring logs will be archived correctly, where should you store data that is subject to that mandate?

  • A. Encrypted on Cloud Storage with user-supplied encryption keys. A separate decryption key will be given to each authorized user.
  • B. In a BigQuery dataset that is viewable only by authorized personnel, with the Data Access log used to provide the auditability.
  • C. In Cloud SQL, with separate database user names to each user. The Cloud SQL Admin activity logs will be used to provide the auditability.
  • D. In a bucket on Cloud Storage that is accessible only by an AppEngine service that collects user information and logs the access before providing a link to the bucket.

Answer: B


Which of these statements about exporting data from BigQuery is false?

  • A. To export more than 1 GB of data, you need to put a wildcard in the destination filename.
  • B. The only supported export destination is Google Cloud Storage.
  • C. Data can only be exported in JSON or Avro format.
  • D. The only compression option available is GZIP.

Answer: C

Data can be exported in CSV, JSON, or Avro format. If you are exporting nested or repeated data, then CSV format is not supported.
Reference: https://cloud.google.com/bigquery/docs/exporting-data


You architect a system to analyze seismic data. Your extract, transform, and load (ETL) process runs as a series of MapReduce jobs on an Apache Hadoop cluster. The ETL process takes days to process a data set because some steps are computationally expensive. Then you discover that a sensor calibration step has been omitted. How should you change your ETL process to carry out sensor calibration systematically in the future?

  • A. Modify the transform MapReduce jobs to apply sensor calibration before they do anything else.
  • B. Introduce a new MapReduce job to apply sensor calibration to raw data, and ensure all other MapReduce jobs are chained after this.
  • C. Add sensor calibration data to the output of the ETL process, and document that all users need to apply sensor calibration themselves.
  • D. Develop an algorithm through simulation to predict variance of data output from the last MapReduce job based on calibration factors, and apply the correction to all data.

Answer: A


You are implementing security best practices on your data pipeline. Currently, you are manually executing jobs as the Project Owner. You want to automate these jobs by taking nightly batch files containing non-public information from Google Cloud Storage, processing them with a Spark Scala job on a Google Cloud Dataproc cluster, and depositing the results into Google BigQuery.
How should you securely run this workload?

  • A. Restrict the Google Cloud Storage bucket so only you can see the files
  • B. Grant the Project Owner role to a service account, and run the job with it
  • C. Use a service account with the ability to read the batch files and to write to BigQuery
  • D. Use a user account with the Project Viewer role on the Cloud Dataproc cluster to read the batch files and write to BigQuery

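Answer: C

A dedicated service account with only the permissions the job needs (read the batch files, write to BigQuery) follows the principle of least privilege. Granting the Project Owner role to a service account gives it far more access than the workload requires.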


You need to store and analyze social media postings in Google BigQuery at a rate of 10,000 messages per minute in near real-time. Initially, design the application to use streaming inserts for individual postings. Your application also performs data aggregations right after the streaming inserts. You discover that the queries after streaming inserts do not exhibit strong consistency, and reports from the queries might miss in-flight data. How can you adjust your application design?

  • A. Re-write the application to load accumulated data every 2 minutes.
  • B. Convert the streaming insert code to batch load for individual messages.
  • C. Load the original message to Google Cloud SQL, and export the table every hour to BigQuery via streaming inserts.
  • D. Estimate the average latency for data availability after streaming inserts, and always run queries after waiting twice as long.

Answer: A
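Option A replaces per-message streaming inserts with periodic loads of accumulated data. A minimal sketch of that accumulate-then-load pattern follows. It is plain Python with no BigQuery calls: the MicroBatcher class, the load_fn callback standing in for a batch load job, and the explicit `now` timestamps are all hypothetical choices made so the example is self-contained.

```python
class MicroBatcher:
    """Accumulate messages and hand them to a loader at a fixed interval."""

    def __init__(self, load_fn, interval_seconds=120):
        self.load_fn = load_fn            # hypothetical stand-in for a batch load job
        self.interval = interval_seconds  # 120 s matches "every 2 minutes"
        self.buffer = []
        self.last_flush = None

    def add(self, message, now):
        """Buffer one message; flush once the interval has elapsed."""
        if self.last_flush is None:
            self.last_flush = now
        self.buffer.append(message)
        if now - self.last_flush >= self.interval:
            self.flush(now)

    def flush(self, now):
        """Load everything buffered so far as one batch and reset the timer."""
        if self.buffer:
            self.load_fn(list(self.buffer))
            self.buffer.clear()
        self.last_flush = now


loaded_batches = []
batcher = MicroBatcher(loaded_batches.append, interval_seconds=120)
for second, text in [(0, "post-1"), (60, "post-2"), (120, "post-3"), (130, "post-4")]:
    batcher.add(text, now=second)
batcher.flush(now=130)  # final flush for whatever is still buffered
```

Aggregations would then run only after a batch has been loaded, so every row they read is already committed.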


The ______ for Cloud Bigtable makes it possible to use Cloud Bigtable in a Cloud Dataflow pipeline.

  • A. Cloud Dataflow connector
  • B. Dataflow SDK
  • C. BigQuery API
  • D. BigQuery Data Transfer Service

Answer: A

The Cloud Dataflow connector for Cloud Bigtable makes it possible to use Cloud Bigtable in a Cloud Dataflow pipeline. You can use the connector for both batch and streaming operations.
Reference: https://cloud.google.com/bigtable/docs/dataflow-hbase


Your globally distributed auction application allows users to bid on items. Occasionally, users place identical bids at nearly identical times, and different application servers process those bids. Each bid event contains the item, amount, user, and timestamp. You want to collate those bid events into a single location in real time to determine which user bid first. What should you do?

  • A. Create a file on a shared file and have the application servers write all bid events to that file. Process the file with Apache Hadoop to identify which user bid first.
  • B. Have each application server write the bid events to Cloud Pub/Sub as they occur. Push the events from Cloud Pub/Sub to a custom endpoint that writes the bid event information into Cloud SQL.
  • C. Set up a MySQL database for each application server to write bid events into. Periodically query each of those distributed MySQL databases and update a master MySQL database with bid event information.
  • D. Have each application server write the bid events to Google Cloud Pub/Sub as they occur. Use a pull subscription to pull the bid events using Google Cloud Dataflow. Give the bid for each item to the user in the bid event that is processed first.

Answer: C


You have some data, which is shown in the graphic below. The two dimensions are X and Y, and the shade of each dot represents what class it is. You want to classify this data accurately using a linear algorithm.
[Exhibit: image not available]
To do this you need to add a synthetic feature. What should the value of that feature be?

  • A. X^2+Y^2
  • B. X^2
  • C. Y^2
  • D. cos(X)

Answer: D
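The exhibit is missing, so the options cannot be checked against the actual figure. As a sketch of how a synthetic feature can make data linearly separable, assume (hypothetically) that one class sits near the origin and the other forms a ring around it. The data, the threshold, and the function names below are invented for illustration.

```python
import math


def ring(radius, n=12):
    """Return n points evenly spaced on a circle of the given radius."""
    return [
        (radius * math.cos(2 * math.pi * k / n), radius * math.sin(2 * math.pi * k / n))
        for k in range(n)
    ]


inner = ring(1.0)   # assumed class 0: points near the origin
outer = ring(3.0)   # assumed class 1: points in a surrounding ring


def synthetic_feature(x, y):
    """Squared distance from the origin, X^2 + Y^2."""
    return x * x + y * y


def classify(x, y, threshold=5.0):
    """A linear rule in the new feature: one threshold on X^2 + Y^2."""
    return 1 if synthetic_feature(x, y) > threshold else 0


inner_labels = [classify(x, y) for x, y in inner]
outer_labels = [classify(x, y) for x, y in outer]

# On the raw X axis alone the two classes overlap, so no single cut on X works.
x_overlap = min(x for x, _ in outer) < min(x for x, _ in inner) < max(x for x, _ in outer)
```

Under this assumed layout, a single threshold on X^2 + Y^2 separates the classes, whereas X or Y alone does not. A different layout in the real exhibit would call for a different feature.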


You are working on a sensitive project involving private user data. You have set up a project on Google Cloud Platform to house your work internally. An external consultant is going to assist with coding a complex transformation in a Google Cloud Dataflow pipeline for your project. How should you maintain users’ privacy?

  • A. Grant the consultant the Viewer role on the project.
  • B. Grant the consultant the Cloud Dataflow Developer role on the project.
  • C. Create a service account and allow the consultant to log on with it.
  • D. Create an anonymized sample of the data for the consultant to work with in a different project.

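Answer: D

Working on an anonymized sample in a separate project lets the consultant develop the transformation without any access to the private user data.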


You have enabled the free integration between Firebase Analytics and Google BigQuery. Firebase now automatically creates a new table daily in BigQuery in the format app_events_YYYYMMDD. You want to query all of the tables for the past 30 days in legacy SQL. What should you do?

  • A. Use the TABLE_DATE_RANGE function
  • B. Use the WHERE_PARTITIONTIME pseudo column
  • D. Use SELECT IF.(date >= YYYY-MM-DD AND date <= YYYY-MM-DD)

Answer: A
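In legacy SQL, TABLE_DATE_RANGE takes a table prefix and two timestamps and queries every daily table whose date suffix falls in that range. The sketch below builds such a query string and, separately, lists the table names a range would cover. The dataset name my_dataset, the COUNT(*) projection, and both helper functions are assumptions for illustration; nothing here contacts BigQuery.

```python
from datetime import date, timedelta


def daily_table_names(prefix, end_date, days):
    """List the date-suffixed tables for the `days` days ending on end_date."""
    return [
        f"{prefix}{(end_date - timedelta(days=offset)):%Y%m%d}"
        for offset in range(days - 1, -1, -1)
    ]


def legacy_sql_last_n_days(dataset, prefix, days):
    """Build a legacy SQL query that lets TABLE_DATE_RANGE pick the tables."""
    return (
        "SELECT COUNT(*) AS events "
        f"FROM TABLE_DATE_RANGE([{dataset}.{prefix}], "
        f"DATE_ADD(CURRENT_TIMESTAMP(), -{days}, 'DAY'), CURRENT_TIMESTAMP())"
    )


# The tables a three-day range ending 2024-03-31 would cover.
tables = daily_table_names("app_events_", date(2024, 3, 31), 3)

# A query over roughly the past 30 days of app_events_YYYYMMDD tables.
query = legacy_sql_last_n_days("my_dataset", "app_events_", 30)
```

The query string would be submitted with legacy SQL enabled, since TABLE_DATE_RANGE is not available in standard SQL.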



You have a data pipeline that writes data to Cloud Bigtable using well-designed row keys. You want to monitor your pipeline to determine when to increase the size of your Cloud Bigtable cluster. Which two actions can you take to accomplish this? Choose 2 answers.

  • A. Review Key Visualizer metrics. Increase the size of the Cloud Bigtable cluster when the Read pressure index is above 100.
  • B. Review Key Visualizer metrics. Increase the size of the Cloud Bigtable cluster when the Write pressure index is above 100.
  • C. Monitor the latency of write operations. Increase the size of the Cloud Bigtable cluster when there is a sustained increase in write latency.
  • D. Monitor storage utilization. Increase the size of the Cloud Bigtable cluster when utilization increases above 70% of max capacity.
  • E. Monitor latency of read operations. Increase the size of the Cloud Bigtable cluster if read operations take longer than 100 ms.

Answer: AC


Which action can a Cloud Dataproc Viewer perform?

  • A. Submit a job.
  • B. Create a cluster.
  • C. Delete a cluster.
  • D. List the jobs.

Answer: D

A Cloud Dataproc Viewer is limited in its actions based on its role. A viewer can only list clusters, get cluster details, list jobs, get job details, list operations, and get operation details.
Reference: https://cloud.google.com/dataproc/docs/concepts/iam#iam_roles_and_cloud_dataproc_operations_summary


You are running a pipeline in Cloud Dataflow that receives messages from a Cloud Pub/Sub topic and writes the results to a BigQuery dataset in the EU. Currently, your pipeline is located in europe-west4 and has a maximum of 3 workers, instance type n1-standard-1. You notice that during peak periods, your pipeline is struggling to process records in a timely fashion, when all 3 workers are at maximum CPU utilization. Which two actions can you take to increase performance of your pipeline? (Choose two.)

  • A. Increase the number of max workers
  • B. Use a larger instance type for your Cloud Dataflow workers
  • C. Change the zone of your Cloud Dataflow pipeline to run in us-central1
  • D. Create a temporary table in Cloud Bigtable that will act as a buffer for new data. Create a new step in your pipeline to write to this table first, and then create a new pipeline to write from Cloud Bigtable to BigQuery
  • E. Create a temporary table in Cloud Spanner that will act as a buffer for new data. Create a new step in your pipeline to write to this table first, and then create a new pipeline to write from Cloud Spanner to BigQuery

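Answer: AB

The workers are CPU-bound, so adding workers or giving each worker more CPU addresses the bottleneck directly.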


Which of the following is NOT a valid use case to select HDD (hard disk drives) as the storage for Google Cloud Bigtable?

  • A. You expect to store at least 10 TB of data.
  • B. You will mostly run batch workloads with scans and writes, rather than frequently executing random reads of a small number of rows.
  • C. You need to integrate with Google BigQuery.
  • D. You will not use the data to back a user-facing or latency-sensitive application.

Answer: C

For example, if you plan to store extensive historical data for a large number of remote-sensing devices and then use the data to generate daily reports, the cost savings for HDD storage may justify the performance tradeoff. On the other hand, if you plan to use the data to display a real-time dashboard, it probably would not make sense to use HDD storage—reads would be much more frequent in this case, and reads are much slower with HDD storage.
Reference: https://cloud.google.com/bigtable/docs/choosing-ssd-hdd


Which of the following statements is NOT true regarding Bigtable access roles?

  • A. Using IAM roles, you cannot give a user access to only one table in a project, rather than all tables in a project.
  • B. To give a user access to only one table in a project, grant the user the Bigtable Editor role for that table.
  • C. You can configure access control only at the project level.
  • D. To give a user access to only one table in a project, you must configure access through your application.

Answer: B

For Cloud Bigtable, you can configure access control at the project level. For example, you can grant the ability to:
Read from, but not write to, any table within the project.
Read from and write to any table within the project, but not manage instances.
Read from and write to any table within the project, and manage instances.
Reference: https://cloud.google.com/bigtable/docs/access-control


You set up a streaming data insert into a Redis cluster via a Kafka cluster. Both clusters are running on Compute Engine instances. You need to encrypt data at rest with encryption keys that you can create, rotate, and destroy as needed. What should you do?

  • A. Create a dedicated service account, and use encryption at rest to reference your data stored in your Compute Engine cluster instances as part of your API service calls.
  • B. Create encryption keys in Cloud Key Management Service. Use those keys to encrypt your data in all of the Compute Engine cluster instances.
  • C. Create encryption keys locally. Upload your encryption keys to Cloud Key Management Service. Use those keys to encrypt your data in all of the Compute Engine cluster instances.
  • D. Create encryption keys in Cloud Key Management Service. Reference those keys in your API service calls when accessing the data in your Compute Engine cluster instances.

Answer: C


Your team is working on a binary classification problem. You have trained a support vector machine (SVM) classifier with default parameters, and received an area under the curve (AUC) of 0.87 on the validation set. You want to increase the AUC of the model. What should you do?

  • A. Perform hyperparameter tuning
  • B. Train a classifier with deep neural networks, because neural networks would always beat SVMs
  • C. Deploy the model and measure the real-world AUC; it’s always higher because of generalization
  • D. Scale predictions you get out of the model (tune a scaling factor as a hyperparameter) in order to get the highest AUC

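Answer: A

AUC depends only on how the model ranks examples, so rescaling its predictions by a positive factor leaves it unchanged. Tuning the SVM's hyperparameters, which were left at their defaults, is the direct way to improve it.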
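AUC measures how often a positive example is scored above a negative one, so it depends on the ordering of the scores and not on their magnitude. The sketch below computes AUC by pair counting and compares it before and after rescaling. The labels, scores, and scaling factor are made up for illustration.

```python
def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair.
    """
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


labels = [1, 0, 1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.65]   # made-up classifier scores

base_auc = auc(labels, scores)
scaled_auc = auc(labels, [4.0 * s for s in scores])   # same scores, rescaled
```

Both values come out identical, because multiplying every score by the same positive factor cannot change which example outranks which.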


MJTelco is building a custom interface to share data. They have these requirements:
  • They need to do aggregations over their petabyte-scale datasets.
  • They need to scan specific time range rows with a very fast response time (milliseconds).
Which combination of Google Cloud Platform products should you recommend?

  • A. Cloud Datastore and Cloud Bigtable
  • B. Cloud Bigtable and Cloud SQL
  • C. BigQuery and Cloud Bigtable
  • D. BigQuery and Cloud Storage

Answer: C

