Concepts

To perform batch scoring on Azure, you can use a batch endpoint in the Azure Machine Learning service. Batch scoring lets you apply a trained model to large datasets asynchronously, distributing the work across a compute cluster rather than scoring one request at a time. In this article, we will explore how to invoke the batch endpoint to start a batch scoring job as part of designing and implementing a data science solution on Azure.
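At runtime, invoking a batch endpoint amounts to sending the endpoint a request that identifies the input data; Azure Machine Learning then queues and runs the scoring job on a compute cluster. As a quick orientation, here is a minimal sketch using the v2 Python SDK (azure-ai-ml); the endpoint name and dataset reference are placeholder assumptions:

python
from azure.ai.ml import MLClient, Input
from azure.identity import DefaultAzureCredential

# Connect to the workspace (reads config.json by default)
ml_client = MLClient.from_config(credential=DefaultAzureCredential())

# Invoke the batch endpoint; the endpoint name and data path are assumptions
job = ml_client.batch_endpoints.invoke(
    endpoint_name='my-batch-endpoint',
    input=Input(type='uri_folder', path='azureml:input_dataset:1'),
)
print(job.name)  # track the scoring job by this name

The remainder of this article walks through the same workflow using the v1 SDK (azureml-core), which the snippets below use.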

Prerequisites

Before proceeding, ensure that you have completed the following prerequisites:

  • Create an Azure Machine Learning workspace
  • Create a compute target
  • Deploy a model

If you haven’t completed these prerequisites, refer to the relevant Azure documentation for detailed guidance.
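If you want to confirm these prerequisites programmatically, a minimal sketch with the v1 SDK is shown below; it assumes a config.json file for the workspace is available locally:

python
from azureml.core import Workspace

# Load the workspace from the local config.json
ws = Workspace.from_config()

# List registered compute targets and models to confirm the prerequisites
print('Compute targets:', list(ws.compute_targets.keys()))
print('Registered models:', list(ws.models.keys()))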

Step 1: Set up Authentication

To authenticate your requests, you need to obtain an access token. The Azure Machine Learning SDK provides the AzureCliAuthentication class, which authenticates against Azure Active Directory using your existing Azure CLI login and returns a bearer token. The following code snippet demonstrates how to obtain the token programmatically using Python:

python
from azureml.core.authentication import AzureCliAuthentication

# Use Azure CLI authentication (requires a prior 'az login')
auth = AzureCliAuthentication()

# Obtain an Authorization header containing the bearer token
auth_header = auth.get_authentication_header()

# Use the header for subsequent REST requests
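With the header in hand, starting a scoring job is a POST request to the endpoint's scoring URI. The sketch below is illustrative only: the URI must be copied from your endpoint's details page, and the request body schema varies by endpoint version, so treat both as assumptions:

python
import requests

# Placeholder scoring URI; copy the real one from your batch endpoint's details
scoring_uri = 'https://<endpoint-name>.<region>.inference.ml.azure.com/jobs'

# Illustrative body identifying the data to score (schema is an assumption)
body = {'properties': {'inputData': {'dataset': 'azureml:input_dataset:1'}}}

response = requests.post(scoring_uri, json=body, headers=auth_header)
print(response.status_code)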

Step 2: Prepare the Scoring Script

Create a Python script that defines the scoring logic for your batch scoring job. The script must define an init() function, which loads the model once when a worker starts, and a run() function, which is executed for each batch of input data. Below is an example scoring script that imports the necessary modules and performs scoring using a trained model:

python
import joblib
import pandas as pd

def init():
    # Load the trained model once per worker
    global model
    model_path = 'model.pkl'
    model = joblib.load(model_path)

def run(input_data):
    # Convert the input data to a pandas DataFrame
    data = pd.DataFrame(input_data)

    # Perform scoring using the trained model
    results = model.predict(data)

    # Return the scoring results
    return results.tolist()
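Because init() and run() are plain Python functions, you can smoke-test the script locally before submitting a job. A minimal sketch, assuming model.pkl sits in the working directory and the model expects two numeric features (the feature names here are hypothetical):

python
# Local smoke test for the scoring script (feature names are assumptions)
init()
sample = [{'feature_1': 0.5, 'feature_2': 1.2}]
print(run(sample))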

Step 3: Create a Scoring Environment

To execute the scoring script, you need to define a scoring environment that includes all the necessary dependencies. This can be achieved by creating a Conda environment specification file (environment.yml). The following code snippet demonstrates an example environment.yml file:

yaml
name: scoring_environment
dependencies:
  - python=3.8
  - pip:
      - azureml-core
      - azureml-defaults
      - pandas
      - scikit-learn
      - joblib
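You can optionally register this environment in the workspace so that later jobs can reuse it by name; a minimal sketch with the v1 SDK (Step 4 also builds the same environment inline, so this registration is a convenience rather than a requirement):

python
from azureml.core import Environment, Workspace

ws = Workspace.from_config()

# Build an Environment object from the Conda specification file
env = Environment.from_conda_specification('scoring_environment', 'environment.yml')

# Register it in the workspace for reuse by name
env.register(workspace=ws)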

Step 4: Create a Batch Scoring Job

To create a batch scoring job, you need to define the details such as the input and output dataset, the scoring script, and the scoring environment. The following code snippet illustrates how to create a batch scoring job using Python:

python
from azureml.core import Dataset, Environment, Experiment, ScriptRunConfig, Workspace
from azureml.data import OutputFileDatasetConfig

# Load the Azure Machine Learning workspace
workspace = Workspace.from_config()

# Get the input dataset
input_dataset = Dataset.get_by_name(workspace, name='input_dataset')

# Define where the scoring results should be written
output_data = OutputFileDatasetConfig(name='output_data')

# Define the scoring environment
environment = Environment.from_conda_specification('scoring_env', 'environment.yml')

# Define the scoring script run configuration
script_run_config = ScriptRunConfig(
    source_directory='path_to_scripts',
    script='score.py',
    arguments=['--input', input_dataset.as_named_input('input_data'),
               '--output', output_data],
    compute_target='compute_target',
    environment=environment
)

# Submit the batch scoring job under an experiment
experiment = Experiment(workspace, 'batch-scoring')
run = experiment.submit(script_run_config)

Make sure to replace the placeholders with your own values, such as the dataset name, script path, compute target name, and experiment name.

Step 5: Monitor the Scoring Job

Once the job is submitted, you can monitor its progress using the Azure Machine Learning Studio or programmatically using the Azure Machine Learning SDK. This allows you to track the status, view logs, and retrieve the output results once the job is completed.
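Programmatic monitoring with the v1 SDK can be as simple as the sketch below, which streams logs until the run finishes and then prints its final status:

python
# Stream logs and block until the job completes
run.wait_for_completion(show_output=True)

# Inspect the final status and run details
print(run.get_status())
print(run.get_details())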

Conclusion

In this article, we learned how to invoke the batch endpoint to start a batch scoring job on Azure. By leveraging batch scoring, you can efficiently apply your trained models to large datasets. Remember to follow the steps outlined in this article and refer to the Azure documentation for additional details and advanced configurations.

Answer the Questions in the Comment Section

When invoking the batch endpoint to start a batch scoring job in Azure Machine Learning, the data to be scored must already be stored in a registered dataset.

a) True

b) False

Answer: b) False

To invoke the batch endpoint for a scoring job, which HTTP method should be used?

a) GET

b) POST

c) PUT

d) DELETE

Answer: b) POST

The batch scoring job invoked using the Azure Machine Learning Python SDK can only process one file at a time.

a) True

b) False

Answer: b) False

Which parameter is used to specify the compute target for a batch scoring job when invoking the batch endpoint?

a) model

b) input_data_reference

c) experiment_name

d) compute_target

Answer: d) compute_target

When invoking the batch endpoint to start a batch scoring job, the scoring script must be specified. Which file type is supported for the scoring script?

a) .py

b) .txt

c) .csv

d) .json

Answer: a) .py

The scoring script specified when invoking the batch endpoint should contain which mandatory method?

a) preprocess

b) run

c) postprocess

d) evaluate

Answer: b) run

In the context of invoking the batch endpoint, what is the purpose of the input_data_reference parameter?

a) It specifies the output location for scoring results.

b) It provides a link to the scoring script.

c) It defines the dataset to be scored.

d) It configures the compute target for the job.

Answer: c) It defines the dataset to be scored.

Which property of the BatchEndpointConfig object is used to specify the output location for scoring results?

a) output_datastore

b) script

c) input_dataset

d) model

Answer: a) output_datastore

Can the batch scoring job invoked using the Azure Machine Learning Python SDK be run locally on the user’s local machine?

a) Yes

b) No

Answer: b) No

The scoring results of the batch scoring job invoked using the Azure Machine Learning Python SDK can be viewed in which type of Azure resource?

a) Virtual Machine

b) Azure Blob Storage

c) Azure Data Factory

d) Azure SQL Database

Answer: b) Azure Blob Storage
