David Yakobovitch & Gideon Mendels, HumAIn Podcast

Why Machine Learning is Now Part of the Software Engineer's Toolkit with Gideon Mendels

29 Apr 2020 • 46 min • EN

Gideon Mendels is the co-founder and CEO of CometML. Gideon is an experienced data scientist and entrepreneur. He worked on deep learning research at Google and Columbia University and previously co-founded Groupwize.

Episode Links:
Gideon Mendels’ LinkedIn: https://www.linkedin.com/in/gideon-mendels/
Gideon Mendels’ Twitter: @comet_ai
Gideon Mendels’ Website: https://www.comet.ml/site/

Podcast Details:
Podcast website: https://www.humainpodcast.com/
Apple Podcasts: https://podcasts.apple.com/us/podcast/humain-podcast-artificial-intelligence-data-science/id1452117009
Spotify: https://open.spotify.com/show/6tXysq5TzHXvttWtJhmRpS
RSS: https://feeds.redcircle.com/99113f24-2bd1-4332-8cd0-32e0556c8bc9
YouTube Full Episodes: https://www.youtube.com/channel/UCxvclFvpPvFM9_RxcNg1rag
YouTube Clips: https://www.youtube.com/channel/UCxvclFvpPvFM9_RxcNg1rag/videos

Support and Social Media:
– Support on Patreon: https://www.patreon.com/humain/creators
– Twitter: https://twitter.com/dyakobovitch
– Instagram: https://www.instagram.com/humainpodcast/
– LinkedIn: https://www.linkedin.com/in/davidyakobovitch/
– Facebook: https://www.facebook.com/HumainPodcast/
– HumAIn Website Articles: https://www.humainpodcast.com/blog/

Outline:
Here are the timestamps for the episode:
(00:00) – Introduction
(02:06) – Some people call Israel the start-up nation. New York's the new Mecca, and it's the Mecca of technology.
(02:46) – To build a better model, especially if you're inheriting an existing one, you try to figure out what people did already. You don't want to reinvent the wheel. You want to see what works and what doesn't. Where's the exact data set?
(04:27) – We eventually collected most of the information, but we started from scratch because we wanted to make sure we weren't basing our assumptions on something that might have been inaccurate.
(04:54) – We found another approach that was much simpler than what we had in production. When you don't have the right processes and tools, it's really hard to bring ROI on these efforts.
(05:29) – On the software side, we have this amazing stack of tools: anything from testing, monitoring, orchestration, CI/CD, versioning, you name it. There are a lot of them, sometimes maybe too many. But then you go to machine learning teams, and most of them are still using a combination of scripts, notebooks, and emails. And that's a fallback. There's definitely a better way to do this. It is exciting that developer tools for machine learning are helping these bigger companies build reliable machine learning models.
(06:16) – Comet is a meta machine learning platform designed to help machine learning and AI practitioners and their teams build machine learning models for real-world applications. The platform allows these teams to automatically track and manage their datasets, code, experiments, and models, solving problems around reproducibility, visibility, efficiency, and loss of institutional knowledge.
(07:31) – Some engineers think machine learning is basically software engineering. But in machine learning, code is just one small piece of the puzzle. You have data, you have experimentation, you have results, you have models, and you have models in production. At the end of the day, these are different processes, and for that we need different tools and a different methodology.
(08:57) – Our approach has always been to be agnostic to which tools you use. We work with any type of machine learning library, whether it's one of the common ones like PyTorch, TensorFlow, or scikit-learn, or something completely custom that you built in your garage or in your organization; you can still use Comet.
(09:32) – Pick the best tool for the job, but still have one platform where you can see everything, compare your results, share them, and collaborate. From that perspective, what GitHub did for code, we're doing for machine learning.
(11:36) – Python has definitely been the most dominant language on the machine learning side of things. We still see quite a lot of R users, mostly those with a more traditional statistics background, but we also see people training models in languages like Java.
(12:31) – You can see the emergence of low-code and no-code solutions. Those will become more and more popular as well.
(13:46) – Deepfakes, like every new technology, can be used for really great things, but there's no question that people can abuse them. There are some similarities to hate speech, in the sense that we will need to use machine learning to detect them. But we also need to make sure we set some kind of policy.
(16:16) – We have major enterprise customers, including multiple Fortune 100 companies across industries. We have some big tech companies, finance, automotive, media companies, biotech, retail, even manufacturing. We do have dedicated modules in the platform for computer vision problems, for looking at your model predictions and debugging them, and the same for natural language processing, tabular data, and audio. But we're not limited to a certain use case.
(18:37) – We recently announced a partnership with Uber AI Labs, which developed a really unique library called Ludwig. Ludwig is a no-code machine learning library: you define the specification of the model without coding anything, and then you can train your model based on that. Comet is the built-in experiment management tool for Ludwig.
(21:31) – For Ancestry, one of the key things is that they have Comet as the central place for their team to track their machine learning experiments and debug them. One of the biggest challenges in machine learning is debugging these models: figuring out where your model predicts the wrong results.
(22:39) – One of the biggest value propositions of Comet is that teams can look into the results of the model, track predictions over time, better understand what's going on, and decide how to drive the research process forward. You look at the results and decide that this model is not doing any good. You just click a small stop button. It's very simple, but it's very valuable if you're trying to move quickly.
(24:14) – Transfer learning falls under the subfield of meta machine learning: using machine learning to improve machine learning. The idea with transfer learning is that by using a model pre-trained on a much bigger data set, you can get much better results than with your smaller one alone. This has two advantages: getting a better result on your data set, and saving a lot of cost.
(29:11) – The predictor is an early stopping mechanism. We try to predict where your model is going, and once things look like they're not going anywhere, or the model has converged and the curve essentially flattens, you stop the model. You iterate and try to figure out the next step, and that's essentially the research process. We can automate this process, and you get to move 30% faster.
(32:48) – On the product side, the approach is not to try to solve all the problems in this space by building one end-to-end solution that does everything. A single platform that replaces AWS, New Relic, GitHub, Jenkins, and all the other tools in the world, one product with one login, is very hard to build.
(35:07) – Machine learning is essentially becoming another tool for engineering, and things will definitely converge. If you look at undergraduate programs, for example, machine learning and AI have become part of the core curriculum.
(38:09) – If your model struggles to classify some examples, you can go back to the data labeling process, get more data for that class, and drive the research process from there. In production, you won't be surprised anymore, because you already looked into all these edge cases and solved them in training. These are the two main approaches people in the industry are taking.
(43:16) – It's very important not to get married to a single library; use the best research that's out there.
(44:11) – There's overlap and collaboration between academia and industry, and that convergence of being able to support both ways is very exciting. More companies are able to get real business value from machine learning and AI.
