Exploring the Artificial Intelligence Virtual Experience Program: A Journey of Learning and Skill Development

Vedant Dhote
10 min read · May 28, 2023


“The more we explore the vast realm of data science and machine learning, the more we realize the limitless possibilities they hold. It is through continuous learning and practical experience that we unravel the true power of AI, transforming data into actionable insights that shape our world.”

Photo by Lukas Blazek on Unsplash

As a passionate data science and machine learning enthusiast, my thirst for knowledge and continuous growth in the field knows no bounds. I am always on the lookout for resources and certifications that can enrich my understanding and expertise. Last week, I stumbled upon an exceptional opportunity — the Cognizant Artificial Intelligence Virtual Experience Program on Forage. Little did I know that this virtual experience would propel me into a world of immersive learning and hands-on practice.

Throughout the program, I delved into various aspects of data science and machine learning, acquiring valuable skills and knowledge that would shape my career. From data analysis and Python programming to data visualization, modeling, and beyond, each module offered a unique perspective and allowed me to hone my abilities. Moreover, the program also emphasized essential elements such as problem statement formulation, model interpretation, effective communication, machine learning engineering, development, quality assurance, and evaluation techniques.

In this blog series, I aim to take you on a journey through my experience in the Cognizant Artificial Intelligence Virtual Experience Program. Together, we will explore the concepts I learned, the challenges I faced, and the insights I gained throughout the program. Join me as I unravel the intricacies of data science, machine learning, and the practical applications that go hand in hand with them.

Task One: Exploratory Data Analysis

Problem Statement:

Gala Groceries approached Cognizant to help them with a supply chain issue. Groceries are highly perishable items. If you overstock, you are wasting money on excessive storage and waste, but if you understock, then you risk losing customers. They want to know how to better stock the items that they sell.

This is a high-level business problem that will require you to dive into the data in order to formulate some questions and recommendations for the client about what else we need to answer it.

Once you’re done with your analysis, we need you to summarize your findings and provide some suggestions as to what else we need in order to fulfill their business problem. Please draft an email containing this information to the Data Science team leader to review before we send it to the client.

Dataset — sample_sale_data.csv

EDA walkthrough —
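
Since the walkthrough notebook itself is not embedded here, below is a minimal sketch of the kind of exploratory analysis I ran on the sample. The column names (timestamp, category, customer_type, payment_type) are assumptions about the sample schema and may differ slightly from the actual file.

import pandas as pd

# Load the sample and take a quick look at its shape, types and missing values
df = pd.read_csv("sample_sale_data.csv")
print(df.shape)
print(df.dtypes)
print(df.isna().sum())

# Which product categories are bought most frequently?
print(df["category"].value_counts().head(10))

# Who shops more often: members or non-members?
print(df["customer_type"].value_counts())

# Which payment method is used most?
print(df["payment_type"].value_counts())

# Which hour of the day sees the most transactions?
df["timestamp"] = pd.to_datetime(df["timestamp"])
print(df["timestamp"].dt.hour.value_counts().sort_index())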

The challenging part here was communication. The client does not understand the code you write or the output it produces, because they are not a data scientist. It was my job to summarize the findings in a concise, business-friendly manner within an email to the Data Science team leader.

Here is the email draft of the task:

Dear Data Science Team Leader,

I received the sample dataset from the Data Engineering team and I’ve been analyzing the sample on behalf of the Data Science team.

I found the following insights as part of the analysis:
1. Fruit & vegetables are the 2 most frequently bought product categories
2. Non-members are the most frequent buyers within the store
3. Cash is the most frequently used payment method
4. 11am is the busiest hour with regards to number of transactions

As a reminder, the client indicated that they wanted to know the following: “How to better stock the items that they sell.”
With respect to this business question, my recommendations are as follows:
1. This is a very broad statement and in order to tackle this with better accuracy, we need to identify a specific problem statement that the business would like to solve. For example, can we predict the demand of products on an hourly basis in order to procure products more intelligently?
2. We need more data. The current sample only covers 7 days and 1 store.
3. Based on the problem statement that we move forward with, we will need more datasets to help describe the outcome that we’re trying to model. For example, if we’re modeling demand for products, we may want to include information about stock levels or weather conditions.

Best regards,
Vedant Dhote

Task Two: Data Modeling

Problem Statement:

“Can we accurately predict the stock levels of products based on sales data and sensor data on an hourly basis in order to more intelligently procure products from our suppliers?”

The client has agreed to share more data in the form of sensor data. They use sensors to measure the temperature of the storage facilities where products are stored in the warehouse, and they also track stock levels within the refrigerators and freezers in store.

It is your task to look at the data model diagram that has been provided by the Data Engineering team and to decide on what data you’re going to use from the data available. In addition, we need you to create a strategic plan as to how you’ll use this data to complete the work to answer the problem statement.

This was a simpler task compared to Task One. Here we derive the strategic plan for the project:

Strategic plan / Plan of work

Task Three: Model Building and Interpretation

Problem Statement:

The client has provided 3 datasets; it is now your job to combine, transform, and model these datasets in a suitable way to answer the problem statement that the business has posed.

Most importantly, once the modeling process is complete, we need you to communicate your work and analysis in the form of a single PowerPoint slide, so that we can present the results back to the business. The key here is to use business-friendly language and to explain your results in a way that the business will understand.

Additional Datasets: sales.csv, sensor_stock_levels.csv, sensor_storage_temperature.csv

In the model-building process, I started by gathering and preprocessing the relevant datasets, ensuring their quality and integrity. Next, I carefully selected appropriate features and engineered new ones where necessary to capture the underlying patterns and relationships. Then, I split the data into training, validation, and testing sets for evaluation purposes.
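
To make that concrete, here is a rough sketch (not a copy of my actual notebook) of how the three files could be aligned at an hourly level before feature engineering. Column names such as timestamp, product_id, quantity, estimated_stock_pct and temperature are assumptions based on the dataset descriptions.

import pandas as pd

sales = pd.read_csv("sales.csv")
stock = pd.read_csv("sensor_stock_levels.csv")
temps = pd.read_csv("sensor_storage_temperature.csv")

# Round every timestamp down to the hour so the three sources line up
for frame in (sales, stock, temps):
    frame["timestamp"] = pd.to_datetime(frame["timestamp"]).dt.floor("H")

# Aggregate to one row per product per hour
sales_agg = sales.groupby(["timestamp", "product_id"], as_index=False)["quantity"].sum()
stock_agg = stock.groupby(["timestamp", "product_id"], as_index=False)["estimated_stock_pct"].mean()
temp_agg = temps.groupby("timestamp", as_index=False)["temperature"].mean()

# Merge into a single modeling table and add hour of day as a feature
merged = stock_agg.merge(sales_agg, on=["timestamp", "product_id"], how="left")
merged = merged.merge(temp_agg, on="timestamp", how="left")
merged["quantity"] = merged["quantity"].fillna(0)
merged["hour"] = merged["timestamp"].dt.hour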

After training and tuning the algorithm, evaluating its performance, and checking its interpretability, I came up with these insights:

  • The product categories were not that important
  • The unit price and temperature were important in predicting stock
  • The hour of day was also important for predicting stock
Interpretation of Model
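
For reference, a chart like the one above can be produced from the trained model's feature importances. This is only a sketch; model and X here stand for the fitted RandomForestRegressor and the predictor DataFrame from the modeling step.

import pandas as pd
import matplotlib.pyplot as plt

# 'model' is the fitted RandomForestRegressor, 'X' the predictor DataFrame
importances = pd.Series(model.feature_importances_, index=X.columns)
importances.sort_values().plot(kind="barh", figsize=(8, 10))
plt.title("Feature importance for predicting stock levels")
plt.tight_layout()
plt.show()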

Task Four: Machine Learning Production

Problem Statement:

To build the foundation for this machine learning use case, they want to implement a first version of the algorithm in production. In its current state, as a Python notebook, the work is not suitable for productionizing a machine learning model.

Therefore, as the Data Scientist that created this algorithm, it is your job to prepare a Python module that contains code to train a model and output the performance metrics when the file is run.

Step 1: Plan

Good quality code should be planned and should follow a uniform and clear structure.

Step 2: Write

After planning the module, I created the Python file and included plenty of comments and documentation, because the ML engineering team is not the team that wrote this code.

The Python module code:

# ------- BEFORE STARTING - SOME BASIC TIPS
# You can add a comment within a Python file by using a hashtag '#'
# Anything that comes after the hashtag on the same line, will be considered
# a comment and won't be executed as code by the Python interpreter.

# --- 1) IMPORTING PACKAGES
# The first thing you should always do in a Python file is to import any
# packages that you will need within the file. This should always go at the top
# of the file
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error
from sklearn.preprocessing import StandardScaler

# --- 2) DEFINE GLOBAL CONSTANTS
# Constants are variables that should remain the same throughout the entire running
# of the module. You should define these after the imports at the top of the file.
# You should give global constants a name and ensure that they are in all upper
# case, such as: UPPER_CASE

# K is used to define the number of folds that will be used for cross-validation
K = 10

# Split defines the % of data that will be used in the training sample
# 1 - SPLIT = the % used for testing
SPLIT = 0.75

# --- 3) ALGORITHM CODE
# Next, we should write our code that will be executed when a model needs to be
# trained. There are many ways to structure this code and it is your choice
# how you wish to do this. The code in the 'module_helper.py' file will break
# the code down into independent functions, which is 1 option.
# Include your algorithm code in this section below:

# Load data
def load_data(path: str = "/path/to/csv/Forage - Cognizant AI Program/Task 4/Resources/"):
    """
    This function takes a path string to a CSV file and loads it into
    a Pandas DataFrame.

    :param path (optional): str, relative path of the CSV file

    :return df: pd.DataFrame
    """

    df = pd.read_csv(f"{path}")
    df.drop(columns=["Unnamed: 0"], inplace=True, errors='ignore')
    return df

# Create target variable and predictor variables
def create_target_and_predictors(
    data: pd.DataFrame = None,
    target: str = "estimated_stock_pct"
):
    """
    This function takes in a Pandas DataFrame and splits the columns
    into a target column and a set of predictor variables, i.e. X & y.
    These two splits of the data will be used to train a supervised
    machine learning model.

    :param data: pd.DataFrame, dataframe containing data for the
        model
    :param target: str (optional), target variable that you want to predict

    :return X: pd.DataFrame
            y: pd.Series
    """

    # Check to see if the target variable is present in the data
    if target not in data.columns:
        raise Exception(f"Target: {target} is not present in the data")

    X = data.drop(columns=[target])
    y = data[target]
    return X, y

# Train algorithm
def train_algorithm_with_cross_validation(
    X: pd.DataFrame = None,
    y: pd.Series = None
):
    """
    This function takes the predictor and target variables and
    trains a Random Forest Regressor model across K folds. Using
    cross-validation, performance metrics will be output for each
    fold during training.

    :param X: pd.DataFrame, predictor variables
    :param y: pd.Series, target variable

    :return
    """

    # Create a list that will store the accuracies of each fold
    accuracy = []

    # Enter a loop to run K folds of cross-validation
    for fold in range(0, K):

        # Instantiate algorithm and scaler
        model = RandomForestRegressor()
        scaler = StandardScaler()

        # Create training and test samples; vary the random state per fold
        # so that each fold is evaluated on a different random split
        X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=SPLIT, random_state=fold)

        # Scale X data, we scale the data because it helps the algorithm to converge
        # and helps the algorithm to not be greedy with large values
        scaler.fit(X_train)
        X_train = scaler.transform(X_train)
        X_test = scaler.transform(X_test)

        # Train model
        trained_model = model.fit(X_train, y_train)

        # Generate predictions on test sample
        y_pred = trained_model.predict(X_test)

        # Compute accuracy, using mean absolute error
        mae = mean_absolute_error(y_true=y_test, y_pred=y_pred)
        accuracy.append(mae)
        print(f"Fold {fold + 1}: MAE = {mae:.3f}")

    # Finish by computing the average MAE across all folds
    print(f"Average MAE: {(sum(accuracy) / len(accuracy)):.2f}")

# --- 4) MAIN FUNCTION
# Your algorithm code should contain modular code that can be run independently.
# You may want to include a final function that ties everything together, to allow
# the entire pipeline of loading the data and training the algorithm to be run all
# at once

# Execute training pipeline
def run():
    """
    This function executes the training pipeline of loading the prepared
    dataset from a CSV file and training the machine learning model

    :param

    :return
    """

    # Load the data first
    df = load_data()

    # Now split the data into predictors and target variables
    X, y = create_target_and_predictors(data=df)

    # Finally, train the machine learning model
    train_algorithm_with_cross_validation(X=X, y=y)
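
One small addition beyond the original template (my own suggestion, not part of the task brief) is an entry point guard, so that running the file directly executes the whole pipeline and prints the performance metrics:

# Run the full pipeline only when this file is executed directly,
# not when it is imported as a module by another script
if __name__ == "__main__":
    run()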

Task Five: Quality Assurance

Evaluating the production machine learning model to ensure quality results.

After this, the ML engineering team took my Python module and, together with DevOps, deployed the algorithm into production, which is great!
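
One improvement I would flag in review is that the module approximates cross-validation with repeated random splits rather than true K-fold splitting, so some rows may never appear in a test set. A rough sketch of an alternative using scikit-learn's KFold, assuming the same X, y and K as in the module above:

from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import KFold
from sklearn.preprocessing import StandardScaler

def train_with_kfold(X, y, k: int = 10):
    """Train and evaluate a Random Forest with genuine K-fold cross-validation."""
    maes = []
    splitter = KFold(n_splits=k, shuffle=True, random_state=42)
    for fold, (train_idx, test_idx) in enumerate(splitter.split(X)):
        X_train, X_test = X.iloc[train_idx], X.iloc[test_idx]
        y_train, y_test = y.iloc[train_idx], y.iloc[test_idx]

        # Scale features on the training fold only, then apply to the test fold
        scaler = StandardScaler()
        X_train = scaler.fit_transform(X_train)
        X_test = scaler.transform(X_test)

        model = RandomForestRegressor()
        model.fit(X_train, y_train)
        mae = mean_absolute_error(y_test, model.predict(X_test))
        maes.append(mae)
        print(f"Fold {fold + 1}: MAE = {mae:.3f}")

    print(f"Average MAE: {sum(maes) / len(maes):.2f}")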

This was my experience, and from this virtual program I learned:

  • How Python can be used to conduct exploratory data analysis
  • The importance of communication within your role to explain what you have found
  • How to plan what data is required to answer business questions using a data model
  • How to communicate your strategic plan to your Data Science team leader
  • How to apply machine learning to combine, transform and model data sets to answer the client’s question
  • How to communicate your key findings to the client
  • How Python is used in machine learning to provide greater business value to Gala Groceries
  • How to plan and write a Python module
  • How to review your Python module and identify how you can improve your algorithm

If you have an interest in data science and machine learning, try the Cognizant Artificial Intelligence Virtual Experience Program.
