The Data Exchange Podcast: Brad King on trends and innovations in hardware, storage, and data infrastructure technologies.
This week’s guest is Brad King, CTO of Scality, a company that builds software-defined file and object storage systems for hybrid and multi-cloud settings. Storage and compute are the basic building blocks of (cloud) computing platforms, and this episode highlights the key considerations and recent innovations in storage technologies that data engineers, architects, and machine learning professionals need to know. Data is being created at unprecedented rates. The advent of devices and sensors has led to an explosion in machine-generated data. Alongside this growth in data assets within companies is the growing availability of tools for big data analytics and machine learning. Companies that want to stay ahead of their competitors are racing to unlock, analyze, and use their data assets as efficiently and as quickly as possible. Companies like Scality are working to open up data assets to the expanding pool of data consumers across organizations.
We discussed key aspects of storage systems in the context of recent trends including:
- The shift towards multi-cloud.
- The growing importance of machine learning and AI.
- The emergence of multi-modal AI models, and the need for storage systems that can accommodate many different types of data (structured data, alongside text, video, images, audio, and other unstructured data types).
- The excitement around all-flash object storage systems.
- The importance of metadata and metadata management systems.
- Dashboards and automation tools for IT and DataOps.
We created this dashboard that we really, really like to talk about with people where you can have multiple places where your data is, and they can be replicating between them, but you have a single view. For some companies that could be literally tens of petabytes of storage, some of it is on premises, some of it is in one public cloud. This dashboard lets them verify that data is being replicated successfully, that data is being protected well, and we can also open up access to it or do searches across it and treat multiple destinations for my data all in a single space. Additionally, we’re moving more and more in the direction of providing classes of storage so that you have an object store that’s fast, and even maybe on flash, which is a trend in the industry, because flash is becoming more and more economical. Maybe you don’t want all your data on flash, but you can have it on different tiers and stores and lifecycle it and all those kinds of things. So I think that’s what we’re trying to provide: interfaces for managing your data.
- A video version of this conversation is available on our YouTube channel.
- “One Simple Chart: most companies use multiple cloud providers”
- “The Emergence of Multi-cloud Native Applications and Platforms”
- “What is DataOps?”
- Davit Buniatyan: “Building a data store for unstructured data and deep learning applications”
- Travis Addair: “The Future of Machine Learning Lies in Better Abstractions”
- Ryan Wisnesky: “The Mathematics of Data Integration and Data Quality”
- Steve Touw: “Injecting Software Engineering Practices and Rigor into Data Governance”
2021 Data Engineering Survey
The 2021 Data Engineering Survey is now open and we need your help. The survey takes about 5 minutes to fill out and we’ll share the report of the survey findings with you. You’ll also be entered in a drawing for free copies of the Data Teams book and other prizes.
 This post and episode are part of a collaboration between Gradient Flow and Superb AI. See our statement of editorial independence.
[Image: bamboo trees from pxfuel]