Kafka for small message volumes is one of those distinct resume-padding architectural vibes.
Apt time to mention the classic "Command-line Tools can be 235x Faster than your Hadoop Cluster", for those who may have not yet read it.
https://adamdrake.com/command-line-tools-can-be-235x-faster-...
You haven't seen the worst of it. We had to implement a whole Kafka module for a SCADA system because Target already had unrelated Kafka infrastructure. Instead of a REST API or anything else sane (which was available), ultra-low-volume messaging is now done by JSON objects wrapped in Kafka. Peak incompetence.
> for a SCADA system
for Ignition?
Probably Wonderware
We did something similar using RabbitMQ with bson over AMQP, and static message routing. Anecdotally, the design has been very reliable for over 6 years with very little maintenance on that part of the system, handles high-latency connection outage reconciliation, and new instances are cycled into service all the time.
Mostly, people who ruminate on naive choices like REST/HTTP2/MQTT have zero clue how the problems of multiple distributed telemetry sources scale. These kids are generally at another firm by the time their designs hit the service capacity of a few hundred concurrent streams per node, and their fragile reverse-proxy load-balancer Cisco rhetoric starts to catch fire.
Note, I've seen AMQP nodes hit well over 14000 concurrent users per IP without issue, as RabbitMQ/OTP acts like a traffic shock-absorber at the cost of latency. Some engineers get pissy when they can't hammer these systems back into the monad laden state-machines they were trained on, but those people tend to get fired eventually.
Note SCADA systems were mostly designed by engineers, and are about as robust as a vehicular bridge built by a JavaScript programmer.
Anecdotally, I think of Java as being a deprecated student language (one reason to avoid Kafka in new stacks), but it is still a solid choice in many use-cases. Sounds like you might be too smart to work with any team. =3
> Anecdotally, I think of Java as being a deprecated student language (one reason to avoid Kafka in new stacks), but it is still a solid choice in many use-cases. Sounds like you might be too smart to work with any team. =3
Honestly from reading this it seems like you’re the one who is too smart to work with any team.
I don’t know why, but I could swear you are German (and old)
I like working with folks that know a good pint, and value workmanship.
If you are inferring that someone writing software for several decades might share, then one might want to at least reconsider civility over one's ego. Best of luck =3
Neither being German nor being old is a bad thing from my point of view. But you tried a bit hard to flex with your past experiences, tbh...
Many NDAs never really expire on some projects, most work is super boring, and recovering dysfunctional architectures with a well-known piece of free community software is hardly grandstanding.
"It works! so don't worry about spending a day or two exploring..." should be the takeaway insight about Erlang/RabbitMQ. Have a wonderful day. =3
Oh no!
Let’s be real: teams come to the infra team asking for a queue system. They give their requirements, and you—like a responsible engineer—suggest a more capable queue to handle their needs more efficiently. But no, they want Kafka. Kafka, Kafka, Kafka. Fine. You (meaning an entire team) set up Kafka clusters across three environments, define SLIs, enforce SLOs, make sure everything is production-grade.
Then you look at the actual traffic: 300kb/s in production. And right next to it? A RabbitMQ instance happily chugging along at 200kb/s.
You sit there, questioning every decision that led you to this moment. But infra isn’t the decision-maker. Sometimes, adding unnecessary complexity just makes everyone happier. And no, it’s not just resume-padding… probably.
We have way way way less than that in my team. But they don't support anything else.
Then all the guys who requested that stuff quit
Well duh! They got a kafkaesque promotion using their upgraded resume!
That’s almost certainly true, but at least part of the problem (not just Kafka but RDD tech in general) is that project home pages, comments like this and “Learn X in 24 hours” books/courses rarely spell out how to clearly determine if you have an appropriate use case at an appropriate scale. “Use this because all the cool kids are using it” affects non-tech managers and investors just as much as developers with no architectural nous, and everyone with a SQL connection and an API can believe they have “big data” if they don’t have a clear definition of what big data actually is.
It really is a red flag dependency. Some orgs need it... Everyone else is just blowing out their development and infrastructure budgets.
I use Kafka for a low-message-volume use case because it lets my downstream consumers replay messages… but yeah, in most cases it's overkill.
That was also a use case for me. However at some point I replaced Kafka with Redpanda.
Isn't redpanda built for the same scale requirements as Kafka?
Redpanda is much leaner and scales much better for low-latency use cases. It does a bunch of kernel-bypass and zero-copy mechanisms to deliver low latency. Being in C++ means it can fit into a much smaller footprint than Apache Kafka for a similar workload.
Those are all good points and pros for Redpanda vs Kafka, but my question still stands. Isn't Redpanda designed for high-volume scale similar to the use cases for Kafka, rather than the low-volume workloads talked about in the article?
When the founder started it was designed to be two things:
* easy to use
* more efficient and lower latency than the big resources needed for Kafka
The efficiency really matters at scale, and the low latency too, but the simplicity of deployment and use is also a huge win.
In Kafka, if you require the highest durability for messages, you configure multiple nodes on different hosts, and probably data centres, and you require acks=all. I'd say this is the thing that pushes latency up, rather than the code execution of Kafka itself.
How does redpanda compare under those constraints?
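For concreteness, the producer side of that highest-durability setup looks roughly like this; a minimal sketch using the confluent-kafka Python client, where the broker addresses and topic name are placeholders, and broker-side settings (replication factor, min.insync.replicas) matter just as much as acks=all:

    # Rough sketch only: durability-oriented producer settings.
    from confluent_kafka import Producer

    producer = Producer({
        "bootstrap.servers": "broker1:9092,broker2:9092,broker3:9092",
        "acks": "all",               # wait for all in-sync replicas to acknowledge
        "enable.idempotence": True,  # avoid duplicates when retries happen
    })

    def on_delivery(err, msg):
        if err is not None:
            print(f"delivery failed: {err}")

    producer.produce("example-topic", value=b"payload", on_delivery=on_delivery)
    producer.flush()  # block until the broker acknowledgements come back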
I needed to synchronize some tables between MS SQL Server and PostgreSQL. In the future we will need to add ClickHouse database to the mix. When I last looked, the recommended way to do this was to use Debezium w/Kafka. So that is why we use it. Data volume is low.
If anybody knows of a simpler way to accomplish this, please do let me know.
We used a binlog reader library for Python, wrapped it in some 50 loc of rudimentary integration code and hosted it on some container somewhere.
Data volume was low though.
Or, as mentioned in the article, you've already got Kafka in place handling a lot of other things but need a small queue as well and were hoping to avoid adding a new technology stack into the mix.
Don't disagree on the resume-padding but only taking into account message volume and not the other features is also not the best way to look at it.
Have I used (not necessarily decided on) Kafka in every single company/project for the last 8-9 years? Yes.
Was it the optimal choice for all of those? No.
Was it downright wrong or just added for weird reasons? Also no, not even a single time - it's just kinda ubiquitous.
How are we defining small message volumes?
Resume-driven development. Common antipattern.
The Kafka protocol is a distributed write-ahead log. If you want a job queue you need to build something on top of that; it's a pretty low-level primitive.
Not for long. An early access version of KIP-932 Queues for Kafka will be released in 4.0 in a few weeks.
https://cwiki.apache.org/confluence/display/KAFKA/KIP-932%3A...
Why does everybody keep missing this point? I don’t know.
There's a wonderful Kafka Children's book that I always suggest every team I work with read: https://www.gentlydownthe.stream/
The way I describe Kafka is, "an event has transpired... sometimes you care, and choose to take an action based on that event"
The way I describe RabbitMQ is, "there's a new ticket in the lineup... it needs to be grabbed for action or left in the lineup... or discarded"
Definitely not perfect analogies. But they get the point across that Kafka is designed to be reactive and message queues/job queues are meant to be more imperative.
Your two-sentence description is excellent. That book, not so much.
I suppose that's fair.
What do people recommend?
Especially for low levels of load, that doesn't require that the dispatcher and consumer are written in the same language.
Until you hit scale, the database you're already using is fine. If that's Postgres, look up SELECT FOR UPDATE SKIP LOCKED. The major convenience here - aside from operational simplicity - is transactional task enqueueing.
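A minimal sketch of that pattern in Python with psycopg2; the jobs and orders tables and their columns are invented here for illustration:

    import psycopg2

    conn = psycopg2.connect("dbname=app")

    # Transactional enqueue: the job becomes visible only if the business change commits.
    with conn, conn.cursor() as cur:
        cur.execute("UPDATE orders SET status = 'paid' WHERE id = %s", (42,))
        cur.execute("INSERT INTO jobs (kind, payload) VALUES (%s, %s)",
                    ("send_receipt", '{"order_id": 42}'))

    # Worker: claim one pending job; rows locked by other workers are skipped.
    with conn, conn.cursor() as cur:
        cur.execute("""
            SELECT id, kind, payload
            FROM jobs
            WHERE done = false
            ORDER BY id
            LIMIT 1
            FOR UPDATE SKIP LOCKED
        """)
        job = cur.fetchone()
        if job is not None:
            # ... do the work, then mark it done in the same transaction ...
            cur.execute("UPDATE jobs SET done = true WHERE id = %s", (job[0],))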
For hosted, SQS or Google Cloud Tasks. Google's approach is push-based (as opposed to pull-based) and is far and away easier to use than any other queueing system.
Cloud Tasks is one of the most undervalued tools in the GCP ecosystem, but mostly because Pub/Sub gets all the attention. I've been using it since it was baked into App Engine, and love it for 1-to-1 queues or delayed job handling.
how do you recommend working with Cloud Tasks?
raw dogging gcloud? Terraform? or something more manageable?
I've been curious for one of my smaller projects, but I am worried about adopting more GCPisms.
Back when I was all-in on GCP, I had a queue.yaml file which the App Engine deployer syncs to Cloud Tasks (creates/disables queues, changes the rate limits, concurrency, etc).
Now that I'm mostly on AWS... I still use the same system. I have a thin little project that deploys to GAE and has a queue.yaml file. It sets up the cloud tasks queues. They hit my EB endpoints just like they used to hit my GAE endpoints.
As a bonus, my thin little GAE app also has a cron.yaml that it proxies to my AWS app. Appengine's cron is also better than Amazon's overcomplicated eventbridge system.
It's great.
Terraform is definitely for the best. Any AI tool should be able to spit it out well enough, but if you do rawdog it in the console or gcloud you might be able to export the terraform with:
How could I solve the problem of in-order processing based on a key using SKIP LOCKED? Basically, all records having the same key should be processed one after the other.
Work jobs in the order they were submitted within a partition key. This selects the next partition key that isn't locked. You could make it smarter to select a subset of the jobs, checking for partition keys where all of the rows are still unlocked.

    SELECT *
    FROM jobs
    WHERE partition_key = (
        SELECT partition_key
        FROM jobs
        ORDER BY partition_key
        LIMIT 1
        FOR UPDATE SKIP LOCKED
    )
    ORDER BY submitted_at
    FOR UPDATE SKIP LOCKED;
Yes, something along those lines could work. But I am not sure if the above query itself would work if rows are appended to the table in parallel.
Also, if events for a partition get processed quickly, would the last partition get an equal chance?
I'm probably biased, but in the number of cases where I had to work with Kafka, I'd really prefer to simply have an SQL database. In all of those cases I struggled to understand why developers wanted Kafka, what problem was it solving better than the database they already had, and for the life of me, there just wasn't one.
I'm not saying that configuring and deploying databases is easy, but it's probably going to happen anyway. Deploying and configuring Kafka is a huge headache: bad documentation, no testing tools, no way to really understand performance in the light of durability guarantees (which are also obscured by the poor quality documentation). It's just an honestly bad product (from the infra perspective): poor UX, poor design... and worst of all, it's kind of useless from the developer standpoint. Not 100% useless, but whatever it offers can be replaced by other existing tools with a tiny bit of work.
Famous last words. There are database-as-a-queue antipattern warnings about this.
> Famous last words.
These weren't his last words, but Jim Gray had this to say about this so-called "antipattern".
Queues Are Databases (1995)
Message-oriented-middleware (MOM) has become a small industry. MOM offers queued transaction processing as an advance over pure client-server transaction processing. This note makes four points:
1. Queued transaction processing is less general than direct transaction processing. Queued systems are built on top of direct systems. You cannot build a direct system atop a queued system. It is difficult to build direct, conversational, or distributed transactions atop a queued system.
2. Queues are interesting databases with interesting concurrency control. It is best to build these mechanisms into a standard database system so other applications can use these interesting features.
3. Queue systems need DBMS functionality. Queues need security, configuration, performance monitoring, recovery, and reorganization utilities. Database systems already have these features. A full-function MOM system duplicates these database features.
4. Queue managers are simple TP-monitors managing server pools driven by queues. Database systems are encompassing many server pool features as they evolve to TP-lite systems.
https://arxiv.org/abs/cs/0701158
Why is that an anti-pattern? Databases have added `SKIP LOCKED` and `SELECT FOR UPDATE` to handle these use cases. What are the downsides?
as with everything, it depends on how you're processing the queue.
eg we built a system at my last company to process 150 million objects / hour, and we modeled this using a postgres-backed queue with multiple processes pulling from the queue.
we observed that, whenever there were a lot of locked rows (ie lots of work being done), Postgres would correctly SKIP these rows, but having to iterate over and skip that many locked rows did have a noticeable impact on CPU utilization.
we worked around this by partitioning the queue, indexing on partition, and assigning each worker process a partition to pull from upon startup. this reduced the # of locked rows that postgres would have to skip over because our queries would contain a `WHERE partition=X` clause.
i had some great graphs on how long `SELECT FOR UPDATE ... SKIP LOCKED` takes as the number of locked rows in the queue increases, and how this partition workaround reduced the time to execute the SKIP LOCKED query, but unfortunately they are in the hands of my previous employer :(
How did you get from original post of "low level of load" to overengineering for "150 million objects/hr".
Is the concept of having different solutions for different scales not familiar to you?
I did something similar. Designed and built for 10 million objects / hour. Picked up by workers in batches of 1k. Benchmark peaked above 200 million objects / hour with PG in a small VM. Fast forward two years, the curse of success strikes, and we have a much higher load than designed for.
Redesigned to create batches on the fly and then `SELECT FOR UPDATE batch SKIP LOCKED LIMIT 1` instead of `SELECT FOR UPDATE object SKIP LOCKED LIMIT 1000`. And just like that, 1000x reduction in load. Postgres is awesome.
----
The application is for processing updates to objects. Using a dedicated task queue for this is guaranteed to be worse. The objects are picked straight from their tables, based on the values of a few columns. Using a task queue would require reading these tables anyway, but then writing them out to the queue, and then invalidating / dropping the queue should any of the objects' properties update. FOR UPDATE SKIP LOCKED allows simply reading from the table ... and that's it.
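A rough sketch of that batch-claiming redesign; the batches and objects tables and their columns are invented here, the real schema was presumably different:

    import psycopg2

    conn = psycopg2.connect("dbname=app")

    with conn, conn.cursor() as cur:
        # Claim one whole batch row; other workers skip it because it is locked.
        cur.execute("""
            SELECT id
            FROM batches
            WHERE processed = false
            ORDER BY id
            LIMIT 1
            FOR UPDATE SKIP LOCKED
        """)
        row = cur.fetchone()
        if row is not None:
            batch_id = row[0]
            cur.execute("SELECT id, payload FROM objects WHERE batch_id = %s",
                        (batch_id,))
            for obj_id, payload in cur.fetchall():
                pass  # ... process each object in the claimed batch ...
            cur.execute("UPDATE batches SET processed = true WHERE id = %s",
                        (batch_id,))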
smart. although, i guess that pushes the locking from selecting queue entries to making sure that objects are placed into exactly 1 batch. curious if you ran into any bottlenecks there?
I believe the article and parent comment were discussing queue solutions for low-volume situations.
completely missed this. apologies.
40,000 per second is waaaaay beyond where you should use a dedicated queuing solution. Even dedicated queues require tuning to handle that kind of throughput.
(or you can just use SQS or google cloud tasks, which work out of the box)
I hit 60k per second in 2020 on a 2-core, 100GB SSD installation of PG on GCP. And "tuning" PG is way easier than any dedicated queueing system I've seen. Does there exist a dedicated queueing system with an equivalent to EXPLAIN (ANALYZE)?
If that's true, you managed to do much better than these folks:
https://softwaremill.com/mqperf/
Maybe you should write a letter?
It's possible the person you're replying to wasn't using replication, so it's entirely different. Those folks also used "synchronous_commit is set to remote_write" which will have a performance impact
I worked at a shop that had to process about 6M RPS for 5 seconds at a time, once a minute or so. That looked a lot like a boatload of Python background threads queueing work in memory then flushing them out into Cassandra. That was a fun little project.
> 150 million objects / hour
Is not a low volume unless this could be done in batches of hundreds.
completely missed this. apologies.
I suppose you are referring to this:
https://mikehadlow.blogspot.com/2012/04/database-as-queue-an...
The main complaint seems to be that it's not optimal...but then, the frame of the discussion was "Until you hit scale", so IMHO convenience and simpler infra trumps having the absolute most efficient tool at that stage.
Can you elaborate? I guess it has to do with connection pooling?
SQS, Azure Service Bus, RabbitMQ, ActiveMQ, QPID, etc… any message broker that provides the competing consumer pattern. though I’ll say having managed many of these message brokers myself, it’s definitely better paying for a managed service. They’re a nightmare when you start running into problems.
If you're using .NET I have to plug NServiceBus from particular.net (https://particular.net/). It's great at abstracting away the underlying message broker and provides an opinionated way to build a distributed system.
.Net SRE here, please no. Take 5 minutes to learn your messaging bus SDK and messaging system instead of yoloing some library that you don't understand. It's really not that hard.
Also, ServiceControl, ServiceInsight and ServicePulse are inventions of developers who are clearly WinAdmins who don't know what modern DevOps is. If you want to use that, you are bad and should feel bad.
(Sorry, I have absolute rage around this topic)
EDIT: If you insist, use MassTransit (https://masstransit.io/)
As a linux fanboy recently trapped in a windows world, I actually find Particular stuff not so bad to work with.
It's on the friendlier end of the spectrum among the tooling I help manage, at least compared to Microsoft crap.
Either way I'm feeling quite validated by your rage, so thanks for sharing. I feel like we could be good friends.
NATS
https://docs.nats.io/nats-concepts/overview/compare-nats
I wish NATS were more popular. It feels like it lacks some real big sponsors $$$.
Scaling with NATS seems weird. I like what i’ve seen with others using it though
NATS/WebSockets are good for 1 publisher -> many consumer (pubsub)
RabbitMQ is good for 1 producer -> 1 consumer with ack/nack
Right?
NATS does many-to-many.
Actually, I used RabbitMQ static routes to feed per-cpu-core single thread bound consumers that restart their process every k transactions, or watchdog process timeout after w seconds. This prevents cross contamination of memory spaces, and slow fragmentation when the parsers get hammered hard.
RabbitMQ/Erlang on OTP is probably one of the most solid solutions I've deployed over the years (low service-cycle demands.) Highly recommended with the AMQP SSL credential certs and the GUID approach to application-layer load balancing. Cut our operational costs to around 37 times lower than traditional load-balancer approaches. =3
Agree. RabbitMQ is a Swiss Army knife that has a lot of available patterns, scales fairly well, and is very robust. If you don’t know what to choose, start with Rabbit. It will let you figure out which patterns you need and you probably won’t scale out of it. Pay someone to host it.
On the other hand, if you know what you need to do and it’s supported by it, NATS is IME the way to go (particularly JetStream).
Pulsar. Works extremely well as both a job queue and a data bus.
We have been using it in this application for half a decade now with no serious issues. I don't understand why it doesn't get more popular attention.
Pulsar vs Kafka was a significant lesson to me: The "best" technology isn't always the winner.
I put it in quotes because I'm a massive fan of Pulsar and how it addresses the shortcomings of Kafka. However, at a former workplace, the broader existing support/integration ecosystem, along with Confluent's commercial capabilities, won out when it came to technology choices, and I was forced to acquiesce.
A bit like Betamax vs VHS, albeit that one pre-dates me significantly.
Even StreamNative is effectively abandoning Pulsar and going all-in on the Kafka protocol. I can see the theoretical benefits of Pulsar, but it just doesn’t seem to have the ecosystem momentum to compete with the Kafka juggernaut.
The advantages of Pulsar are very much practical, at least for us. Without it we would have to manage two separate messaging systems.
I don't see any evidence of StreamNative abandoning Pulsar at this point. I do see a compatibility layer for the Kafka protocol. That's fine.
It sure looks like they’re going quite a ways beyond Kafka-on-Pulsar - the Ursa/Oxia work they’re focused on right now replaces BookKeeper and seems very firmly Kafka-oriented. Or does Ursa also work with the Pulsar protocol?
We use RabbitMQ, and workers simply pull whatever is next in the queue after they finish processing their previous jobs. I’ve never witnessed jobs piling up for a single consumer.
* Redis pub/sub
* Redis streams
* Redis lists (this is what Celery uses when Redis backend is configured)
* RabbitMQ
* ZeroMQ
This. If you have really small volume like this article describes, just use Redis.
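The Redis-lists option above is just LPUSH/BRPOP under the hood; a minimal sketch with redis-py, where the queue name and job payload are invented:

    import json
    import redis

    r = redis.Redis(host="localhost", port=6379)

    # Producer: push a job onto the list.
    r.lpush("jobs", json.dumps({"task": "send_email", "to": "user@example.com"}))

    # Worker loop: block until a job arrives.
    while True:
        item = r.brpop("jobs", timeout=5)  # returns (key, value) or None on timeout
        if item is None:
            continue
        job = json.loads(item[1])
        print("processing", job)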
Kafka with a different partitioner would have worked fine. The problem was that the web workers loaded up the same partition. Randomising the chosen partition would have removed, or at least alleviated, the stated problem.
Random and round robin partitioning are the configurations being discussed.
The main point of the article is that low message volumes mean you can get unlucky and end up with idle workers when there is still work to be done
(Author here)
RabbitMQ or AWS SQS are probably good choices.
Redis, SQLite or even a traditional DB like Postgres or MySQL can all do a better job than that.
Has anyone used Redpanda? I stumbled upon it when researching streaming, it claims to be Kafka compatible but higher performance and easier to manage. Haven't tried it myself but interested if anyone else has experience.
Plenty of people choose Redpanda because it's the easiest getting-started experience for Kafka. There is a single binary for the full broker; I have never seen Apache Kafka as easy to set up. It's got a great UI as well.
The quickstart for getting everything running locally is all documented here: https://docs.redpanda.com/current/get-started/quick-start/#d...
Disclaimer: I work at Redpanda
I use Temporal (https://temporal.io/)
We also use temporal, it’s pretty great. Their ui makes it easy to debug workflows too.
temporal has been pretty nice and feature rich compared to rabbitmq, DB, kafka.
Database is a good recommendation.
Also give a shoutout to Beanstalkd (https://beanstalkd.github.io/)
I love beanstalkd, used it for plenty over the years and it just flies through with no fuss. Plus, it's fully supported by Rails' ActiveJob.
SQS - easy and cheap.
Kafka. If your load is low enough for the problem described in the article to happen, your load is low enough that it's not an issue.
I'm not sure you understood the article. You can have a very low load but each task on your queue takes a while to process, in which case you want fair distribution of work.
The distribution is fair - everything is round-robin, so in the long run each worker receives the same rate of tasks. It's just "lumpy" - sometimes you can get a big batch sent to one worker, then a big batch sent to another worker - but it will all average out.
We built an infrastructure with about 6 microservices and Kafka as the main message queue (job queue).
The problem the author describes is 100% true, and if you are scaled out with enough workers this can turn out really bad.
While not being the only issue we faced (others are more environment/project-language specific), we got to a point where we decided to switch from Kafka to RabbitMQ.
thankfully early access for KIP-932 is coming in 1-3 weeks as the 4.0.0 release gets published
First time I've heard of KIP-932 and it looks very good. The two biggest issues IMO are finding a good Kafka client in the language you need (even for ruby this is a challenge) and easy at-least-once workers.
You can over partition and make at-least-once workers happen (if you have a good Kafka client), or you use an http gateway and give up safe at-least-once. Hopefully this will make it easier to build an at-least-once style gateway that's easier to work with across a variety of languages. I know many have tried in the past but not dropping messages is hard to do right.
Couldn’t agree more - the most exciting thing about KIP-932 is how much easier it’ll become to build a good HTTP push gateway.
Uber wrote a Kafka push gateway years ago, when it was considerably harder to do well: https://www.uber.com/blog/kafka-async-queuing-with-consumer-...
TFA mentions it in the third paragraph:
> Note: when Queues for Kafka (KIP-932) becomes a thing, a lot of these concerns go away. I look forward to it!
For a small load queueing system, I had great success with Apache ActiveMQ back in the days. I designed and implemented a system with the goal of triggering SMS for paid content. This was in 2012.
Ultimately, the system was fast enough that the telco company emailed us and asked to slow down our requests because their API was not keeping up.
In short: we had two Apache Camel based apps: one to look at the database for paid content schedule, and queue up the messages (phone number and content). Then, another for triggering the telco company API.
> Each of these Web workers puts those 4 records onto 4 of the topic’s partitions in a round-robin fashion. And, because they do not coordinate this, they might choose the same 4 partitions, which happen to all land on a single consumer
Then choose a different partitioning strategy. Often key-based partitioning can solve this issue. Worst case, you use a custom partitioning strategy.
Additionally, why can't you match the number of consumers in the consumer group to the number of partitions?
The KIP mentioned seems interesting though. Kafka folks trying to make a play towards replacing all of the distributed messaging systems out there. But does seem a bit complex on the consumer side, and probably a few foot guns here for newbies to Kafka. [1]
[1] https://cwiki.apache.org/confluence/plugins/servlet/mobile?c...
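Key-based partitioning, as suggested above, looks roughly like this with the confluent-kafka Python client; the topic name and keys here are invented:

    from confluent_kafka import Producer

    p = Producer({"bootstrap.servers": "localhost:9092"})

    # Messages with the same key hash to the same partition, so everything for
    # user 42 lands on one partition (and thus one consumer), in order.
    p.produce("events", key=b"user-42", value=b'{"action": "signup"}')

    # No key: the default partitioner spreads these across partitions.
    p.produce("events", value=b'{"action": "heartbeat"}')

    p.flush()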
What that post describes (all work going to one/few workers) in practice doesn't really happen if you properly randomize (e.g. just use random UUID) ID of the item/task when inserting it into Kafka.
With that (and sharding based on that ID/value), all your consumers/workers will get an equal amount of messages/tasks.
Both the post and, seemingly, the general theme of comments here are trashing the choice of Kafka for low volume.
Interestingly, both are ignoring other valid reasons/requirements that make Kafka a perfectly good choice despite low volume - e.g.:
- multiple different consumers/workers consuming same messages at their own pace
- needing to rewind/replay messages
- guarantee that all messages related to specific user (think bank transactions in book example of CQRS) will be handled by one pod/consumer, and in consistent order
- needing to chain async processing
And I'm probably forgetting a bunch of other use cases.
And yes, even with good sharding, having some tasks/work be small/quick while others are big/long can still lead to non-optimal situations where the small/quick work waits for a bigger one to be done.
However - if you have other valid reasons to use Kafka, and it's just this mix of small and big tasks that's making you hesitant... IMHO it's still worth trying Kafka.
Between using bigger buckets (so instead of fetching 1, fetch more items/messages and handle the work async/threads/etc), and Kafka automatically redistributing shards/partitions if some workers are slow... you might be surprised that it just works.
And sure - you might need to create more than one topic (e.g. light, medium, heavy) so your light work doesn't need to wait for heavier one.
Finally - I still didn't see anyone mention actual real deal breakers for Kafka.
Off the top of my head, I recall a big one is no guarantee of an item/message being processed only once - even without you manually rewinding/reprocessing it.
It's possible/common to have situations where a worker picks up a message from Kafka, processes (writes/materializes/updates) it, and when it's about to commit the Kafka offset (effectively marking it as really done) it realizes Kafka has already re-partitioned the shards and another pod now owns that particular partition.
So if you can't model items/messages or the rest of the system in a way that can handle such things (say, with versioning you might be able to just ignore/skip work if you know the underlying materialized data/storage already incorporates it, or maybe the whole thing is fine with INSERT ON DUPLICATE KEY UPDATE) - then Kafka is probably not the right solution.
(Author here)
You say:
> What that post describes (all work going to one/few workers) in practice doesn't really happen if you properly randomize (e.g. just use random UUID) ID of the item/task when inserting it into Kafka.
I would love to be wrong about this, but I don't _think_ this changes things. When you have few enough messages, you can still get unlucky and randomly choose the "wrong" partitions. To me, it's a fundamental probability thing - if you roll the dice enough times, it all evens out (high enough message volume), but this article is about what happens when you _don't_ roll the dice enough times.
If it's a fundamental probability thing with randomized partition selection, put the actual probability of what you're describing in the article.
.25^20 is not a "somewhat unlucky sequence of events"
(Author here)
Fair enough. I agree .25^20 is basically infinitesimal, and even with a smaller exponent (like .25^3) the odds are not great, so I appreciate you calling this out.
Flipping this around, though, if you have 4 workers total and 3 are busy with jobs (1 idle), your next job has only a 25% chance of hitting the idle worker. This is what I see the most in practice; there is a backlog, and not all workers are busy even though there is a backlog.
With Kafka you normally don't pick a worker - Kafka does that. IIRC with some sort of consistent hashing - but for simplicity's sake let's say it's just modulo 'messageID % numberOfShards'.
You control/configure numberOfShards - and it's usually set to something an order of magnitude bigger than your expected number of workers (to be precise - that's the number of Docker pods or hardware boxes/servers) - e.g. 32, 64 or 128.
So in practice - Kafka assigns multiple shards to each of your "workers" (if you have more workers than shards then some workers don't do any work).
And while each of your workers is limited to one thread for consuming Kafka messages, each worker can still process multiple messages at the same time - in different async tasks/threads.
The other thing that's a PITA with Kafka is fail/retry.
If you want to continue processing other/newer items/messages (and usually you do), you need to commit the Kafka topic offset - leaving you to figure out what to do with the failed item/message.
One simple thing is just re-inserting it into the same topic (at the end). If it was a transient error, that could be enough.
Instead of the same topic, you can also insert it into a separate failedX Kafka topic (and have that topic processed by a cron-like scheduled task).
And if you need things like progressively backing off before attempting reprocessing - you likely want to push failed items into something else.
While that could be another task system/setup where you can specify how many reprocessing attempts to make, how much time to wait before the next attempt, etc., often it's enough to have a simple DB/table.
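One minimal shape of that commit-and-republish pattern, sketched with the confluent-kafka Python client; the topic and group names are invented and error handling is reduced to the bare minimum:

    from confluent_kafka import Consumer, Producer

    consumer = Consumer({
        "bootstrap.servers": "localhost:9092",
        "group.id": "workers",
        "enable.auto.commit": False,
    })
    producer = Producer({"bootstrap.servers": "localhost:9092"})
    consumer.subscribe(["tasks"])

    def process(payload: bytes) -> None:
        ...  # real work goes here

    while True:
        msg = consumer.poll(1.0)
        if msg is None or msg.error():
            continue
        try:
            process(msg.value())
        except Exception:
            # Don't block the partition: park the failed task on a retry topic
            # (or re-publish it to "tasks") and keep moving.
            producer.produce("tasks.failed", value=msg.value())
            producer.flush()
        consumer.commit(message=msg)  # advance the offset either way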
Having never actually used this platform before, does anybody know why they named it Kafka, with all the horrible meanings?
Per Wiktionary, Kafkaesque: [1]
1. "Marked by a senseless, disorienting, often menacing complexity."
2. "Marked by surreal distortion and often a sense of looming danger."
3. "In the manner of something written by Franz Kafka." (like the software language was written by Franz Kafka)
Example: Metamorphosis Intro: "One morning, when Gregor Samsa woke from troubled dreams, he found himself transformed in his bed into a horrible vermin. He lay on his armour-like back, and if he lifted his head a little he could see his brown belly, slightly domed and divided by arches into stiff sections. The bedding was hardly able to cover it and seemed ready to slide off any moment. His many legs, pitifully thin compared with the size of the rest of him, waved about helplessly as he looked." [2]
[1] Wiktionary, Kafkaesque: https://en.wiktionary.org/wiki/Kafkaesque
[2] Gutenberg, Metamorphosis: https://www.gutenberg.org/cache/epub/5200/pg5200.txt
It was named that way based on the idea that, like the author (after whom the term "Kafkaesque" is coined), Apache Kafka is a prolific writer.
Kafka wrote a lot, and destroyed most of what he wrote.
Seems like a good name for a high-volume distributed log that deletes based on retention, not after consumption.
Jay Kreps liked Kafka’s writing.
Nominative determinism.
Because its a process: https://en.wikipedia.org/wiki/The_Trial