Slides for the talk on MLflow for ML lifecycle management.

## About

Researcher, educator and solver of computationally intensive mathematical problems. I am currently a Senior Data Science Developer Advocate at Databricks, where I work on Data Science and Machine Learning. Prior to this, I worked as a Computational Scientist at Virginia Tech (2014 - 2020), where I had the privilege of working with some great minds and state-of-the-art science.

## Deep Learning at Databricks

### MLflow, Feature Store, Distributed Training, AutoML

Slides for the Big Data Symposium

## Machine Learning at Scale

### Talk given at the ADBIS workshop

Slides for the talk on End-to-End Enterprise Machine Learning.

## Topic Modeling with PySpark and Feature Store

A Databricks post that I authored on creating a topic modeling pipeline with PySpark and the Databricks Feature Store:

## Using Bayesian Hierarchical Models to Infer the Disease Parameters of COVID-19

### Bayesian Modeling with PyMC3

In a previous post (https://lnkd.in/dZvmsRm), I looked at the available data for the infected cases in the United States as a time-series, modeling this as a compartmental probabilistic model and inferring the disease parameters such as R0 using Bayesian estimation. However, we can use the case counts from several countries and use Bayesian hierarchical models to extend this work and better estimate R0. In this post I illustrate how we can do exactly that using PyMC3.
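
As a rough illustration of what "hierarchical" means here, one possible partial-pooling structure is sketched below. This is an assumption for illustration, not necessarily the model used in the post: the symbols $m$, $s$, $\tau$, $\sigma_y$, the Normal likelihood, and the notation $I_c(t; R_{0,c})$ for the infected curve produced by the compartmental model in country $c$ are all hypothetical choices.

```latex
\begin{aligned}
\mu_{R_0} &\sim \mathrm{Normal}(m, s) \\
\sigma_{R_0} &\sim \mathrm{HalfNormal}(\tau) \\
R_{0,c} &\sim \mathrm{Normal}(\mu_{R_0}, \sigma_{R_0}), \qquad c = 1, \dots, C \\
y_{c,t} &\sim \mathrm{Normal}\big(I_c(t; R_{0,c}), \sigma_y\big)
\end{aligned}
```

Each country gets its own $R_{0,c}$, but all of them are drawn from a shared population-level distribution, so countries with little data borrow strength from the others.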
[Read More]

## Maintainable HPC with Python and C++

### NCAR presentation

Slides from my presentation at the “Improving Scientific Software” conference at the National Center for Atmospheric Research (NCAR).

## Bayesian Modeling of the Temporal Dynamics of COVID-19 using PyMC3

### Data+AI Summit, Europe 2020

These are the slides for the talk given in the Data Science Lounge at the Data+AI Summit, 2020.

#### Introduction

This post is a demonstration of how to use PyMC3 to infer the disease parameters for COVID-19. PyMC3 is a probabilistic programming framework that is used for Bayesian modeling. It accomplishes this through both Markov Chain Monte Carlo (MCMC) and Variational Inference methods. The work here looks at using the currently available data for the infected cases in the United States as a time-series and attempts to model this using a compartmental model.
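
To make "compartmental model" concrete, here is a minimal sketch of the classic SIR model integrated with forward Euler. The excerpt does not say which compartmental model the post uses, so SIR, the function names, and the parameter values are assumptions for illustration only. In a Bayesian setting, `beta` and `gamma` would be the unknowns given priors, and the simulated infected curve would be compared against the observed case counts.

```python
def simulate_sir(beta, gamma, s0, i0, rec0, days, dt=0.1):
    """Integrate the SIR equations with forward Euler.

    s, i, r are fractions of the population (they should sum to 1).
    beta is the transmission rate, gamma the recovery rate, both per day.
    Returns one (s, i, r) tuple per day, including day 0.
    """
    s, i, r = s0, i0, rec0
    steps_per_day = int(round(1 / dt))
    history = [(s, i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_infections = beta * s * i * dt   # flow S -> I
            new_recoveries = gamma * i * dt      # flow I -> R
            s -= new_infections
            i += new_infections - new_recoveries
            r += new_recoveries
        history.append((s, i, r))
    return history


def basic_reproduction_number(beta, gamma):
    """For the SIR model, R0 is the ratio of transmission to recovery rate."""
    return beta / gamma
```

For example, `simulate_sir(0.5, 0.2, 0.99, 0.01, 0.0, days=30)` returns 31 daily states for an outbreak whose R0 is `basic_reproduction_number(0.5, 0.2)`.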
[Read More]

## Rclone for Data Transfer - Google Drive

### Data backup

Rclone is a tool for data transfer to and from a variety of sources, including your local machine. A few commands for interacting with Google Drive and transferring data to and from a local machine are shown below.

Use the following to set up your remote for Google Drive:

    rclone config

To list remotes:

    rclone listremotes
    remote_google:

To list the directories in this drive:

    rclone lsd remote_google:

To list all the files
[Read More]

## Backup of Gitlab repositories

### Reproducible Git

If you have ever had to download or back up your Gitlab repositories, you probably had to do it manually for every repository you own. As of this writing, I had 54, and that was not my idea of a lazy afternoon. So I used the Gitlab API and, with the help of a Gitlab ‘Private Token’, set up this Python script to do the job for me.

    import requests
    import json
    import os

    def get_repo(repo):
        os.

[Read More]
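
Since the script above is cut off in this excerpt, here is a separate, minimal sketch of the same idea, not the author's script. It assumes the GitLab REST API v4 `projects` endpoint with the `PRIVATE-TOKEN` header, uses the standard library's `urllib` instead of `requests` so that it has no dependencies, and mirrors over SSH. All function names and the backup directory layout are hypothetical.

```python
import json
import os
import urllib.request


def projects_url(base_url, page, per_page=100):
    """URL for one page of projects owned by the token's user (API v4)."""
    return f"{base_url}/api/v4/projects?owned=true&per_page={per_page}&page={page}"


def fetch_json(url, token):
    """GET a URL with the private token header and decode the JSON body."""
    request = urllib.request.Request(url, headers={"PRIVATE-TOKEN": token})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def list_owned_projects(base_url, token, fetch=fetch_json):
    """Walk the paginated project list until an empty page comes back."""
    projects, page = [], 1
    while True:
        batch = fetch(projects_url(base_url, page), token)
        if not batch:
            return projects
        projects.extend(batch)
        page += 1


def clone_commands(projects, dest):
    """Build one `git clone --mirror` command per project."""
    return [
        ["git", "clone", "--mirror", p["ssh_url_to_repo"],
         os.path.join(dest, p["path_with_namespace"] + ".git")]
        for p in projects
    ]
```

Each command returned by `clone_commands` can then be passed to `subprocess.run(cmd, check=True)`. The `fetch` argument exists only so that the pagination logic can be exercised without network access.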

## Multi-GPU Computing with Pytorch (Draft)

### An overview

#### 1. Introduction

Pytorch provides a few options for multi-GPU/multi-CPU computing, or in other words, distributed computing. While this is unsurprising for deep learning, what is pleasantly surprising is the support for general-purpose low-level distributed or parallel computing. Those who have used MPI will find this functionality familiar. Pytorch can be used for the following scenarios:

- Single GPU, single node (multiple CPUs on the same node)
- Single GPU, multiple nodes
- Multiple GPUs, single node
- Multiple GPUs, multiple nodes

Pytorch allows ‘Gloo’, ‘MPI’ and ‘NCCL’ as backends for parallelization.
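
As a small illustration of the MPI-like primitives mentioned above, the sketch below spawns a few processes on a single node and sums their ranks with `all_reduce`. It is a minimal example under stated assumptions, not code from the post: the helper names, the local address and port, and the rule of choosing NCCL for GPUs and Gloo for CPUs are illustrative choices.

```python
import os


def pick_backend(use_gpu):
    """Rule of thumb assumed here: NCCL for CUDA tensors, Gloo for CPU tensors."""
    return "nccl" if use_gpu else "gloo"


def worker(rank, size, backend):
    """Each process contributes its rank; all_reduce leaves the sum on every process."""
    # torch is imported lazily so the helper above works without it installed.
    import torch
    import torch.distributed as dist

    # Rendezvous settings for a single-node run (assumed values).
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group(backend, rank=rank, world_size=size)

    # NCCL only operates on CUDA tensors, so place the tensor accordingly.
    device = torch.device("cuda", rank) if backend == "nccl" else torch.device("cpu")
    value = torch.tensor([float(rank)], device=device)
    dist.all_reduce(value, op=dist.ReduceOp.SUM)
    print(f"rank {rank}: sum of ranks = {value.item()}")
    dist.destroy_process_group()


def launch(procs=2, use_gpu=False):
    """Spawn `procs` worker processes on this node."""
    import torch.multiprocessing as mp

    mp.spawn(worker, args=(procs, pick_backend(use_gpu)), nprocs=procs)
```

Calling `launch(procs=2)` from under an `if __name__ == "__main__":` guard runs the CPU case with Gloo. Multi-node runs need the same `MASTER_ADDR` and `MASTER_PORT` on every node and globally unique ranks, which this single-node sketch does not handle.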
[Read More]