Simon Crosby on distributed actors and data infrastructure that enable companies to analyze, learn and predict on the fly.
Simon Crosby is CTO of Swim.ai, a startup building tools (based on the Swim open source project) for next-generation data and AI applications. Swim is one of several projects (along with Ray and Akka) contributing to interest in the Actor Model for building large-scale machine learning and data applications and infrastructure. Simon describes the applications made possible by technologies like Swim, and notes that these technologies require considerably less hardware and DevOps resources. If you’re interested in building AI applications you need to listen to this episode.
There are three kinds of data: first is data in a SQL database, then there is big data, and finally we have the case where every single process, every single product, every bit of infrastructure is instrumented. … The next generation of IT systems have to stay in sync with the real world. And when I say in sync, I mean, you need to be able to drive a robot, or when you look at your Uber, you really want to know the car that you’re tracking is in front of you, and not minutes away.
… For applications that need to be in sync with the world, you have to analyze, learn and predict on the fly. You can’t afford to store data first, and then analyze.
… Let me give an example from a large telecommunications company. I mentioned before that they accrue around four petabytes of data a day. … They were running 400 servers' worth of big data infrastructure per day, and with Swim and actors they're down to 40! Not only that, it used to take them 10 hours to produce insights; that's down to 10 milliseconds.
… The other cool thing here is that we can automatically build digital twins, which are needed in many AI applications. Every additional twin is just an actor somewhere in memory. And in Swim every actor is also on the web: each actor exposes an API, so it's vertically integrated.
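To make the idea concrete, here is a minimal sketch of the actor model described above: each digital twin is an in-memory actor with private state and a mailbox, updating itself as events arrive rather than storing data first and analyzing later. This is an illustration only, not Swim's actual API; the `Actor` and `VehicleTwin` classes and the `car-42` identifier are hypothetical.

```python
import queue
import threading

class Actor:
    """A minimal actor: private state plus a mailbox drained by one thread."""
    def __init__(self):
        self._mailbox = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def send(self, message):
        """Asynchronously deliver a message to this actor's mailbox."""
        self._mailbox.put(message)

    def stop(self):
        """Signal the actor to stop and wait for its thread to finish."""
        self._mailbox.put(None)  # sentinel value
        self._thread.join()

    def _run(self):
        while True:
            message = self._mailbox.get()
            if message is None:
                break
            self.receive(message)

    def receive(self, message):
        raise NotImplementedError

class VehicleTwin(Actor):
    """A hypothetical digital twin mirroring one vehicle's latest position."""
    def __init__(self, vehicle_id):
        self.vehicle_id = vehicle_id
        self.position = None
        super().__init__()

    def receive(self, message):
        # Each event updates the twin's in-memory state on the fly.
        self.position = message

twin = VehicleTwin("car-42")
twin.send((37.44, -122.16))  # a GPS fix for this vehicle
twin.stop()
print(twin.position)
```

In a Swim-style system there would be one such stateful actor per tracked entity, kept in sync with the real world and addressable over the web, rather than a shared database queried after the fact.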
Highlights in the video version:
- Introduction to Simon Crosby
- What Swim’s technology makes possible, and the case for continuous intelligence
- How tools for streaming data differ in 2017 versus today
- The high volume of events happening right now or in the recent past
- Why normal scale-out approaches don’t work for continuous intelligence apps
- Scale out, combine RAM in a cluster, and data-driven compute
- Messaging systems and SQL interfaces
- Predictive models and machine learning on real-time events
- The actor model and developer tools
- Examples of actor models and what they make possible
- Digital twins and the ideal scenario for this technology
- Industrial AI, IoT, and APM
- Would events encompass more complex data like computer vision?
- Recommendations for those interested in this next-generation data infrastructure
- For data engineers, how does this fit into the rest of their infrastructure?
- Programming languages supported by Swim
- Machine learning models for real-time predictions
- Go to traffic.swim.ai to see a real-time demo of traffic in Palo Alto, California
- A video version of this conversation is available on our YouTube channel.
- “Top Places to Work for Data Engineers”
- Michel Tricot: “Modernizing Data Integration”
- Chris White in conversation with Jenn Webb and Ben Lorica: “Towards a next-generation dataflow orchestration and automation system”
- Zhe Zhang: “How Technology Companies Are Using Ray”
[Image courtesy of Simon Crosby.]