Accelerating MLOps at Bayer Crop Science with Kubeflow Pipelines and Amazon SageMaker

This is a guest post by the data science team at Bayer Crop Science. 

Farmers have always collected and evaluated a large amount of data with each growing season: seeds planted, crop protection inputs applied, crops harvested, and much more. The rise of data science and digital technologies provides farmers with a wealth of new information. At Bayer Crop Science, we use AI and machine learning (ML) to help farmers achieve more bountiful and sustainable harvests. We also use data science to accelerate our research and development process; create efficiencies in production, operations, and supply chain; and improve customer experience.

To evaluate potential products, like a short-stature line of corn or an advanced herbicide, Bayer scientists often plant a small trial in a greenhouse or field. We then use advanced sensors and analytical models to evaluate the experimental results. For example, we might fly an unmanned aerial vehicle over a field and use computer vision models to count the number of plants or measure their height. In this way, we’ve collected data from millions of test plots around the world and used them to train models that can determine the size and position of every plant in our image library.

Analytical models like these are powerful but require effort and skill to design and train effectively. The ML engineering team at Bayer Crop Science has made these techniques more accessible by integrating Amazon SageMaker with open-source tools like Kubeflow Pipelines to create reproducible templates for analytical model training, hosting, and access. These resources help standardize how our data scientists interact with SageMaker services. They also make it easier to meet Bayer-specific requirements, such as using multiple AWS accounts and resource tags.

Standardizing the ML workflow for Bayer Crop Science

Data science teams at Bayer Crop Science follow a common pattern to develop and deploy ML models:

  1. A data scientist develops model and training code in a SageMaker notebook or other coding environment running in a project-specific AWS account.
  2. A data scientist trains the model on data stored in Amazon Simple Storage Service (Amazon S3).
  3. A data scientist partners with an ML engineer to deploy the trained model as an inference service.
  4. An ML engineer creates the API proxies required for applications outside of the project-specific account to call the inference service.
  5. ML and other engineers perform additional steps to meet Bayer-specific infrastructure and security requirements.
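As a rough illustration of why this pattern lends itself to automation, the steps above can be modeled as an ordered list of parameterized steps that pass outputs forward. The sketch below is plain Python, not the Kubeflow Pipelines or SageMaker API. The `Step` class, the `run_workflow` helper, the image names, the parameter names, and the S3 path are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Context = Dict[str, str]


@dataclass(frozen=True)
class Step:
    """One workflow step, paired with the container image that would run it."""
    name: str
    image: str  # hypothetical image reference, not a real registry path
    run: Callable[[Context], Context]


def run_workflow(steps: List[Step], params: Context) -> Context:
    """Run steps in order, merging each step's outputs into a shared context."""
    context = dict(params)  # copy so the caller's parameters are not mutated
    for step in steps:
        context.update(step.run(context))
    return context


# Hypothetical step bodies: each only records what a real step would produce.
def train(ctx: Context) -> Context:
    return {"model_artifact": f"{ctx['training_data']}/output/model.tar.gz"}


def deploy(ctx: Context) -> Context:
    # Depends on the output of the training step.
    if "model_artifact" not in ctx:
        raise KeyError("deploy requires a trained model artifact")
    return {"endpoint_name": f"{ctx['project']}-endpoint"}


def expose(ctx: Context) -> Context:
    # Depends on the output of the deployment step.
    if "endpoint_name" not in ctx:
        raise KeyError("expose requires a deployed endpoint")
    return {"api_path": f"/{ctx['project']}/invocations"}


WORKFLOW = [
    Step("train", "example/train:latest", train),
    Step("deploy", "example/deploy:latest", deploy),
    Step("expose", "example/expose:latest", expose),
]

if __name__ == "__main__":
    result = run_workflow(
        WORKFLOW,
        {"project": "demo", "training_data": "s3://example-bucket/demo"},
    )
    for key, value in sorted(result.items()):
        print(f"{key}: {value}")
```

Because the project name and data location are parameters rather than hard-coded values, the same step list could be reused across projects, which is the property a reusable workflow template relies on.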

To automate this process, our team transformed the steps into a reusable, parameterized workflow using Kubeflow Pipelines (KFP). Each step of a workflow (a KFP component) is associated with a Docker container and connected via the K

