Taking messaging and data ingestion systems to the next level

The Data Exchange Podcast: Sijie Guo on how Apache Pulsar is able to handle both queuing and streaming, and both online and offline applications.

Subscribe: iTunes, Android, Spotify, Stitcher, Google, and RSS.

In this episode of the Data Exchange I speak with Sijie Guo, founder of StreamNative, a new startup focused on making enterprise messaging technologies – specifically Apache Pulsar – easy to use on the cloud. Sijie was previously a cofounder of Streamlio (acquired by Splunk) and prior to that he led the messaging team at Twitter. He is also the main organizer behind the Pulsar Summit (April in San Francisco), a new conference whose Call for Speakers closes on January 31st.

Join Sijie Guo, Matteo Merli, Karthik Ramasamy, and many other speakers at the Pulsar Summit, a FREE virtual conference (June 17-18).

I’ve written about the importance of foundational data technologies, and data ingestion and messaging are the starting point for modern data applications. As data and machine learning continue to grow in importance, it’s critical for companies to make sure they have the right messaging systems in place.

Our conversation spanned many topics, including:

  • The role of messaging in modern data applications and platforms.
  • The two main types of messaging applications: queuing and streaming.
  • Apache Pulsar as a unified messaging platform, able to handle both queuing and streaming, and both online and offline applications.
  • A status update on Apache Pulsar.
  • (Full transcript of our conversation is below.)

Short excerpt:
Ben: For someone who's not technical and who just wants an idea of what these systems can do, at a high level, what are the two types of messaging: what is queuing and what is streaming? And what are some good examples of each?

Sijie: In terms of communication, messaging typically divides into two patterns. One is queuing. A simple way to think about this is a queue at a bank: you wait in line for a banker to help you. That is like a worker queue, with task-oriented workloads that are processed on a per-event basis. Since they're processing all events, they don't really care about ordering. Message queuing is common in industries like e-commerce, where retailers use it to process payments, transactions, and billing statements. It is one of the most common communication channels, carrying the messages, all the events, from the end user to your system.

Ben: So, what about streaming?

Sijie: In terms of streaming, you get a sequence of events that you want to collect on a per-entity, or per-stream, basis. For example, for an IoT device that you want to use to collect one data point, like temperature, you'd have a sensor collecting all the temperature changes. You'd want to collect those data points, or events, in a particular order so you can visualize or analyze the changes in sequence.

Streaming systems are focused on things like user behavior, fraud detection, and maybe event processing and log collection. They fit situations where you want to collect the events from a certain device or entity in a particular order and analyze behavior from that sequence of events.
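
For readers who prefer code, the two patterns Sijie describes can be sketched in a few lines of plain Python. This is only an illustration and not Pulsar code: the function names `dispatch_queue` and `read_streams`, the round-robin assignment, and the sample data are all assumptions made up for this example.

```python
from collections import defaultdict, deque
from itertools import cycle


def dispatch_queue(tasks, workers):
    """Queuing: each task is handed to exactly one worker.

    Workers take tasks in turn (round-robin here, as an assumption),
    so no single worker sees the tasks in their overall order.
    """
    assignments = defaultdict(list)
    pending = deque(tasks)
    for worker in cycle(workers):
        if not pending:
            break
        assignments[worker].append(pending.popleft())
    return dict(assignments)


def read_streams(events):
    """Streaming: events are grouped per entity, preserving arrival order."""
    streams = defaultdict(list)
    for entity, value in events:
        streams[entity].append(value)
    return dict(streams)


# Hypothetical payment tasks spread across two workers (queuing).
queued = dispatch_queue(["pay-1", "pay-2", "pay-3"], ["worker-a", "worker-b"])

# Hypothetical temperature readings kept in order per sensor (streaming).
streamed = read_streams(
    [("sensor-1", 20.5), ("sensor-2", 18.0), ("sensor-1", 21.0)]
)
```

In the queuing half, what matters is that every task is handled once by some worker. In the streaming half, what matters is that each sensor's readings stay in sequence so the changes can be analyzed over time.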

[Image from pxfuel.]