Quantitative Workflows: A New Paradigm for Everyone

April 18, 2017

Original Link: https://medium.com/datmo/building-artificial-intelligence-together-65e04a45cd6d

Topics like machine learning, artificial intelligence and data science have been talked about at length over the last few years. But these topics have been around for ages — albeit with names that have changed over the years. The types of problems that developers solve in these fields all fit into what we call Quantitative Workflows — the process of starting with data and deriving insights, actions, and quantitative models.

And the developers who create them? They are the researchers, hobbyists, developers, data scientists, and analysts, among many others, who collectively make up the Quantitative oriented Developers (QoDs).

There are a number of workflows that fit into these paradigms, but the world today needs a lot more QoDs than we currently have.

Naturally, much has been written on the democratization of machine learning and artificial intelligence, and getting there requires QoDs who can implement these technologies. The reasons given are typically as follows: it’s the future, access shouldn’t be limited to a special class of people, its value will be experienced by everyone, and it should be created by anyone with the desire. Unfortunately, progress toward making quantitative workflows available to everyone still leaves much to be desired.

“AI is about to create a fantastic amount of opportunities, and these opportunities should not be reserved to Ivy League grads, or to people born in the US. They should be open to all.”

— François Chollet, Author of Keras

These wonderful ideals are far from realized. Although efforts are being made and resources are being built to make these quantitative workflows more accessible, the community lacks a cohesive, coherent, and collaborative platform for transferring knowledge between QoDs and prospective QoDs.

There are a number of disparate machine learning frameworks, all of which are growing quickly.

As QoDs of different backgrounds, with varying fluency and training in basic statistics, machine learning concepts, and software, attack the myriad problems in research and industry, they are searching for common and accessible methods of collaboration.

Today, the biggest pain point QoDs face is a disconnect when they work through their quantitative workflows. Below are three key parts of the process and the biggest issue in each one.

Tracking & Reproducibility: Finding previous configurations, dependencies, and environments to reproduce or build on top of your own or another person’s work

Collaboration: Sharing with or understanding a model from another person, and tweaking that model to fit custom data and configurations

Deployment: Converting a model to work in production once it’s ready and keeping that deployed model up to date with new observations

Let’s go through a quick example using a computer vision algorithm for face recognition. There are several computer vision approaches to identifying someone’s face in an image. To leverage one of these approaches a QoD must:

  1. reproduce the same environment that was used in the approach (usually all they have is a code repository, not the full environment). They might find some direction on this through a readme, description or blog written by the author of the approach, but the reproducibility of an approach is unclear.
  2. see how the approach works on the type of data they intend to use the model for. (For example, if I’m working with images of people in a crowd, that would be very different from images of people in a news interview.) The developer then has to tweak a variety of parameters from the original method to work on the nuances of their data until they get good accuracy.
  3. release the model into production (into the wild), and track performance so they can incorporate any learned feedback to improve the model further.

To solve these problems, today’s workflows are missing three pieces:

Tracking & Reproducibility: Easy replication of configurations and environments to reproduce or build on previous work

Collaboration: Automatic tracking of configurations, environments, and methodology, with easy access to this information for others

Deployment: Simple deployment to handle production data and real-time feedback to monitor performance and incorporate new observations
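To make the tracking idea concrete, here is a minimal sketch in Python of what recording a run's configuration and environment could look like. It is an illustration only, not how Datmo or any particular tool works: the function name `snapshot_run`, the snapshot fields, and the example hyperparameters are all assumptions made for this example, and it uses only the Python standard library.

```python
import hashlib
import json
import platform
from importlib import metadata


def snapshot_run(config):
    """Record the configuration and environment of one experiment run.

    `config` is a hypothetical dict of JSON-serializable settings.
    """
    # Collect installed packages and their versions.
    packages = []
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip entries with missing metadata
            packages.append(f"{name}=={dist.version}")
    packages.sort()  # sorted so the snapshot is stable between calls

    snapshot = {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "packages": packages,
        "config": config,
    }
    # Hash the canonical JSON form so identical setups share the same id.
    canonical = json.dumps(snapshot, sort_keys=True).encode("utf-8")
    snapshot["id"] = hashlib.sha256(canonical).hexdigest()[:12]
    return snapshot


if __name__ == "__main__":
    run = snapshot_run({"learning_rate": 0.01, "epochs": 5})
    print(run["id"], len(run["packages"]), "packages recorded")
```

Saving a snapshot like this next to each result gives a collaborator something to compare against: if their id differs from yours, the environment or the configuration has changed somewhere.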

The not-so-distant future holds an open network where collaborators use the best practices above to aggregate the best models, approaches, and methodologies, allowing QoDs to learn from and build on top of one another’s work. In today’s networked paradigm it is still the case that we “see further” by standing on the shoulders of giants. However, an observer might add that our weight is also borne by millions of normal people.

If you’re interested in being a part of this network early on, let us know. We’re building Datmo.com, and would love to have you join.

Happy Building 👍

— Anand