Overview: Collaboration in Quantitative Workflows
May 17, 2017
Original Link: https://medium.com/datmo/the-future-of-collaboration-in-data-projects-24d439f197a9
The other day two of our engineers, we’ll call them Alex and Ignacio, were trying to improve the accuracy of a facial recognition model. The main goal: get the model to recognize Donald Trump in images, videos, and GIFs with the highest possible confidence. The sub-goal: Alex wanted Ignacio’s help in improving the model for practical use cases, specifically on a set of 10,000 new labelled images they had received from a customer. Because the new images come from “the wild,” many of them include multiple people and a variety of new elements in the frame. Alex wanted Ignacio’s help in making sure the model still works on these images. The first step: Alex needed to share the most recent iteration of the model with Ignacio, and this is where the collaboration process started to break down.
What does sharing a model really mean?
Several components are needed to make the model Alex created available to his colleagues, to people around the world, or to himself (on a different server or computer, for example).
- Code — How can Alex share the raw code he has created with Ignacio?
- Data — Where is the data Alex originally used, and how can Ignacio add the new dataset to retrain the model? Can they access the data from different locations, or do they have to be on the same server at the same time? Can they access the data concurrently?
- Model Snapshots — How can Alex share the snapshots of different parts of his experiments with Ignacio?
- Environmental Dependencies — How can Alex ensure Ignacio can run these experiments in the same way he did?
- Methodology — How does Ignacio reproduce the steps Alex took in his work? Has Alex kept track of the experiments he has run and the snapshots of the models he has created? Has he kept a log of why he ran those experiments?
- Compute Resources — What computational resources will Ignacio need to run Alex’s latest model? Will Alex need to share the resources with him?
Figure: Details on each of the components and what they look like for Alex.
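To make the list above more concrete, here is a minimal Python sketch of a "snapshot manifest" that records fingerprints for several of these components in a single shareable file: a hash of the code, a hash of the data, the training configuration, the Python environment, and a free-text note on why the run was made. The names (`hash_tree`, `environment_snapshot`, `SnapshotManifest`, `build_manifest`) and the JSON layout are hypothetical choices for illustration, not Datmo's format or any standard. The manifest only records fingerprints; it does not move the actual files or provide compute.

```python
import hashlib
import json
import platform
import tempfile
from dataclasses import asdict, dataclass, field
from importlib import metadata
from pathlib import Path


def hash_tree(root, pattern="*"):
    """SHA-256 over every file under `root` matching `pattern`, in a stable order.

    Relative paths are hashed along with contents, so renaming or adding a file
    changes the digest, while the absolute location of `root` does not.
    """
    root = Path(root)
    digest = hashlib.sha256()
    files = sorted((p for p in root.rglob(pattern) if p.is_file()), key=lambda p: p.as_posix())
    for path in files:
        # Path, NUL terminator, then a fixed-length content digest: unambiguous framing.
        digest.update(path.relative_to(root).as_posix().encode("utf-8") + b"\0")
        digest.update(hashlib.sha256(path.read_bytes()).digest())
    return digest.hexdigest()


def environment_snapshot():
    """Python version, OS description, and installed distributions with versions."""
    packages = sorted(
        f"{dist.metadata.get('Name', 'unknown')}=={dist.version}"
        for dist in metadata.distributions()
    )
    return {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "packages": packages,
    }


@dataclass
class SnapshotManifest:
    """Hypothetical record of what a collaborator needs to reproduce one run."""

    code_hash: str
    data_hash: str
    config: dict
    environment: dict
    notes: str = ""  # methodology: why this run was made
    metrics: dict = field(default_factory=dict)

    def to_json(self):
        return json.dumps(asdict(self), indent=2, sort_keys=True)

    @classmethod
    def from_json(cls, text):
        return cls(**json.loads(text))


def build_manifest(code_dir, data_dir, config, notes="", metrics=None):
    """Fingerprint the code (*.py files), the data, and the environment for one run."""
    return SnapshotManifest(
        code_hash=hash_tree(code_dir, "*.py"),
        data_hash=hash_tree(data_dir),
        config=dict(config),
        environment=environment_snapshot(),
        notes=notes,
        metrics=dict(metrics or {}),
    )


if __name__ == "__main__":
    # Toy project in a temporary directory; the file contents are placeholders.
    with tempfile.TemporaryDirectory() as tmp:
        code_dir = Path(tmp, "code")
        data_dir = Path(tmp, "data")
        code_dir.mkdir()
        data_dir.mkdir()
        (code_dir / "train.py").write_text("print('training')\n")
        (data_dir / "labels.csv").write_text("image,label\nimg_0001.jpg,target\n")
        manifest = build_manifest(
            code_dir, data_dir, {"learning_rate": 1e-3, "epochs": 10},
            notes="baseline before adding the customer images",
        )
        Path(tmp, "manifest.json").write_text(manifest.to_json())
        print(manifest.code_hash[:12], manifest.data_hash[:12])
```

If Ignacio rebuilds the manifest on his machine, comparing it field by field with Alex's copy shows which component (code, data, configuration, or environment) differs before he spends any time retraining.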
When Ignacio finally sets up the model and completes his work on it, how does he share it back? How do they test and deploy it, and how do they incorporate feedback to improve the model once it is in production? Closing these loops is essential to collaborating successfully on a machine learning project. Unfortunately, the process today is badly broken.
Here are Alex’s options today
Figure: Common ways to collaborate on and share each of these components today.
Thankfully, there is a set of best practices that, when implemented, produce the best results possible with current tools.
Best Practices for Collaboration in AI
We collated these from our own experience and from a series of interviews we conducted with people and teams working on machine learning at a number of organizations, including Google, Facebook, Uber, Intuit, and the Stanford AI Lab.
Figure: Best practices for collaborating on each of these components.
The story of Alex and Ignacio is really the story of why we started Datmo. Starting with our Command Line Interface (CLI) and our GUI, we are on a mission to become the collaborative platform that helps democratize artificial intelligence and drive it forward into real-world applications. For Alex and Ignacio, this is what the process looks like in Datmo.
Figure: The Datmo way to collaborate. Keep it simple!
With our tools we are solving the problem of tracking and sharing your work (you can check out our blog post on the topic here) as well as the problem of collaboration between team members. We do this with a Command Line Interface that tracks model training sessions and keeps snapshots of the models produced by each set of parameters and configurations you run. We also greatly simplify environment setup, compute handling, and deployment by leveraging Docker. Finally, we make it super easy to plug new data and new models together through the concept of data contracts, which define how models and data interact (we’ll be posting a blog about that in the near future). Our easy-to-use GUI lets developers building these algorithms instantly share their models with collaborators and colleagues and visualize their work directly in the web browser. No more fiddling around with shared servers, cloud storage, or manual bookkeeping.
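The tracking idea described above can be illustrated without any special tooling. Below is a minimal Python sketch of an append-only log of training sessions, where each entry ties a set of parameters to the metrics it produced, to the path of the saved model snapshot, and to a note on why the run was made. `SessionLog`, its methods, and the JSON Lines layout are hypothetical names chosen for illustration; this is not Datmo's CLI or its file format.

```python
import json
import time
import uuid
from pathlib import Path


class SessionLog:
    """Hypothetical append-only log: one JSON object per training session."""

    def __init__(self, path):
        self.path = Path(path)

    def record(self, params, metrics, snapshot_path=None, note=""):
        """Append one session and return its generated id."""
        entry = {
            "id": uuid.uuid4().hex,
            "timestamp": time.time(),
            "params": dict(params),
            "metrics": dict(metrics),
            # Where the saved model weights live; None if no snapshot was kept.
            "snapshot": str(snapshot_path) if snapshot_path is not None else None,
            "note": note,  # why the run was made
        }
        with self.path.open("a", encoding="utf-8") as f:
            f.write(json.dumps(entry, sort_keys=True) + "\n")
        return entry["id"]

    def sessions(self):
        """All recorded sessions, in the order they were appended."""
        if not self.path.exists():
            return []
        with self.path.open(encoding="utf-8") as f:
            return [json.loads(line) for line in f if line.strip()]

    def best(self, metric, higher_is_better=True):
        """Session with the best value of `metric`, or None if no session reports it."""
        scored = [s for s in self.sessions() if metric in s["metrics"]]
        if not scored:
            return None
        pick = max if higher_is_better else min
        return pick(scored, key=lambda s: s["metrics"][metric])
```

Writing one JSON object per line keeps the log easy to diff and merge in version control, and a call such as `best("accuracy")` answers the first question a collaborator like Ignacio would ask: which parameters produced the best run so far, and where is its snapshot.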
To take advantage of our free tools, sign up for our beta at https://datmo.io. We would also love to hear from you if you have any questions or thoughts about what we’ve described above: email us at anand@datmo.io or reach us on social media.
Happy Collaborating 🙂
Sign up for our newsletter at https://datmo.io/ to get updates on our machine learning suite.
P.S. Thanks for reading this far! If you found value in this, we’d really appreciate it if you recommend this post (by clicking the ❤ button) so other people can see it!