Status update July 4th

Just wanted to let you know where we are with Lemmy.world.

Issues

As you might have noticed, things still won’t work as desired… we see several issues:

Performance

  • Loading is mostly OK, but sometimes things take forever
  • We (and you) see many 502 errors, resulting in empty pages etc.
  • System load: the server sits at roughly 60% CPU usage and around 25 GB of RAM usage. (That is, if we restart Lemmy every 30 minutes; otherwise memory usage climbs to 100%.)

Bugs

  • Replying to a DM doesn’t seem to work. When you hit reply, you get a box containing the original message, which you can edit and save (but saving does nothing).
  • 2FA seems to be a problem for many people. It doesn’t always work as expected.

Troubleshooting

We have many people helping us with (site) moderation, sysadmin work, troubleshooting, advice, etc. There are currently 25 people in our Discord, including admins of other servers. The sysadmin channel has 8 people in it. We run troubleshooting sessions with them, and sometimes with others. One of the Lemmy devs, @[email protected], is also helping with the current issues.

So, not everything is running as smoothly as we hoped, but with all this help we’ll surely get there! Also, thank you all for the donations; they make it possible to get the hardware and tools needed to keep Lemmy.world running!

  • cristalcommons@lemmy.world · 1 year ago

    i just wanted to thank you for doing your best to fix lemmy.world as soon as possible.

    but please, don’t feel forced to overwork yourselves. i understand you want to do it soon so more people can move from Reddit, but i wouldn’t like Lemmy software and community developers to overwork themselves and feel miserable, as those things are some of the very reasons you escaped from Reddit in the first place.

    in my opinion, it would be nice if we users understood this situation and, if we want lemmy so bad, actively helped with it.

    this applies to all lemmy instances and communities, ofc. have a nice day you all! ^^

    • Cinner@lemmy.world · 1 year ago

      Plus, slow steady growth means eventual success. Burnout is very real if you never take a break.

      • cristalcommons@lemmy.world · 1 year ago

        so true, pal! slowly, with patience, no rushing, putting love into it, organizing ourselves, working smart is better than working hard and fast.

        because of the federated nature of fediverses like Lemmy, it is very possible that many people are doing the very same task without even knowing they are duping each other’s efforts.

        and that’s sad because if they knew, they could be teaming up, or splitting the task in two, to avoid wasting separate efforts on duplicate results.

        i have learnt a thing or two about burnout; for me it’s better to spend 40% on planning and 40% on self-care, so the remaining 20% of execution becomes a piece of cake.

        but this is just my opinion. anyway, please take care, pals <3

  • Shartacus@lemmy.world · 1 year ago

    I want this to succeed so badly. I truly feel like it’s going to be sink or swim, and it will reflect how all enshittification efforts play out.

    Band together now and people see there’s a chance. Fail and we are doomed to corporate greed in every facet of our lives.

  • Frost Wolf@lemmy.world · 1 year ago

    This is the level of transparency that most companies should strive for. Ironic that in terms of fixing things, volunteer and passion projects seem to be more on top of issues compared to big companies with hundreds of employees.

    • fuck reddit@lemmy.ml · 1 year ago

      You said it: passion projects. While being paid is surely a motivator, seeing your pet project take off the way Lemmy is can be so intoxicating and rewarding! I plan to donate as soon as I get paid on Friday! I want to see this succeed, even if it is just to spite Reddit, and I am willing to pay for the pleasure.

  • czarrie@lemmy.world · 1 year ago

    I’m just excited to be back in the Wild West again – all of the big players had bumps; at least this one is working to fix them.

  • Deez@lemm.ee · 1 year ago

    Thanks for all of your effort. Even though we are on different instances, it’s important for the Fediverse community that you succeed. You are doing valuable work, and I appreciate it.

    • TheSaneWriter@vlemmy.net · 1 year ago

      Not just that, but the code contributed to Lemmy through this debugging will make Lemmy run faster for everyone on every instance, which makes the ecosystem that much better.

  • LeHappStick@lemmy.world · 1 year ago

    .world is definitely running smoother than when I joined 3 days ago. Back then it was impossible to comment and the lag was immense; right now I just have to occasionally reload the page, but that’s nothing in comparison.

    You guys are doing amazing work! I’m broke, so here are some coins 🪙🪙🪙🪙 beans 🫘🫘🫘🫘

  • TomFrost@lemmy.world · 1 year ago

    Cloud architect here. I’m sure someone’s probably already brought it up, but I’m curious whether any cloud-native services have been considered to take the place of what I’m sure are wildly expensive server machines. E.g. serve frontends from CloudFront, host the read-side API on Lambda@Edge so you can aggressively and regionally cache API responses, and use anything other than SQL for the database: model it in DynamoDB for dirt-cheap, wicked speed, or Neptune for a graph database that’s more expensive but more featureful. Drop sync jobs for federated connections into SQS (rough sketch below), have a Lambda process those too, and it will scale as horizontally as you need to clear the queue in reasonable time.

    It’s not quite as simple to develop and deploy as Docker containers you can throw anywhere, but the massive scale you can achieve with it, for a fraction of the cost of servers or Fargate with that much RAM, is pretty great.

    Or maybe you already tried/modeled this and discovered it’s terrible for your use case, in which case ignore me ;-)
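
    Purely as an illustration of the SQS idea (not Lemmy’s actual code; the queue URL and payload shape are made up), pushing a federation sync job onto a queue with the Rust AWS SDK could look roughly like this:

```rust
// Hypothetical sketch: hand a serialized ActivityPub activity to SQS so a
// separate worker (e.g. a Lambda) can deliver it to remote instances later.
// The queue URL below is a placeholder, not a real one.
use aws_sdk_sqs::Client;

async fn enqueue_activity(client: &Client, activity_json: String) -> Result<(), aws_sdk_sqs::Error> {
    client
        .send_message()
        .queue_url("https://sqs.eu-west-1.amazonaws.com/000000000000/federation-outbox")
        .message_body(activity_json)
        .send()
        .await?;
    Ok(())
}
```

    A Lambda consumer would then pull messages off the queue and do the actual delivery, retrying independently of the web servers.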

    • Olap@lemmy.world · 1 year ago

      You were so close until you mentioned trying to ditch SQL. Lemmy is 100% tied to it, and trying to replicate what it does without ACID and joins would require a massive rewrite. More importantly, Lemmy’s docs suggest a docker-compose stack, not even k8s for now; it’s trying really hard not to tie itself to a single cloud provider and to avoid maintaining three sets of cloud deployment scripts. That rules out SQS, Lambda and CloudFront in the short term. Quick question: are there any STOMP-compliant SQS and Lambda equivalents yet?

      Also, the growth lemmy.world has seen has been far beyond what any team could handle, ime. Most products would have closed signups to cope with the current load and scale; well done to all involved!

      • jamesorlakin@lemmy.world · 1 year ago

        If Postgres becomes the bottleneck I wonder whether something like Citus could work to shard the data (relatively) transparently?

        • irdc@derp.foo · 1 year ago

          One could also move to having multiple read-only PostgreSQL replica instances used when generating the site and a single read-write instance that you’d use whenever anything changes (which is comparatively rare).

          • jamesorlakin@lemmy.world · 1 year ago

            True, but that would likely require some code changes in Lemmy to segregate read queries and avoid using the replica if it’s a transaction that might read and write.
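
            A minimal sketch of what that segregation might look like on the application side (hypothetical names, not Lemmy’s actual code), with writes pinned to the primary and plain reads spread round-robin over the replicas:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical read/write split; `P` stands in for a real connection
/// pool type (diesel, sqlx, etc.) in practice.
struct SplitPool<P> {
    primary: P,        // the single read-write instance
    replicas: Vec<P>,  // read-only replicas
    next: AtomicUsize, // round-robin cursor
}

impl<P> SplitPool<P> {
    /// Writes, and any transaction that might read its own writes,
    /// must go to the primary.
    fn writer(&self) -> &P {
        &self.primary
    }

    /// Plain read-only queries can be spread across the replicas.
    fn reader(&self) -> &P {
        let i = self.next.fetch_add(1, Ordering::Relaxed);
        &self.replicas[i % self.replicas.len()]
    }
}
```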

    • b3nsn0w@pricefield.org · 1 year ago

      cloudfront helps a lot with the client and is absolutely compatible with lemmy if you set it up correctly. possibly it could also help cache api responses, i haven’t looked into that part yet.

      the database, on the other hand, would need a nearly full rewrite. lemmy uses postgres and dumping it for something else would be a huge pain for the entire federated community. it could probably tear the community in half.

      there’s also the issue of pictrs, which uses a stateful container and isn’t yet able to use an external database which would allow you to scale it horizontally. resolving that one is on the roadmap though, and for the most part you can aggressively cache the pictrs get requests to alleviate the read-side load.

      but whatever the solution is, it kinda needs to be as simple as developing and deploying docker containers you can throw anywhere. the vendor-agnostic setup is a very important part of the open-source setup of lemmy. it’s fine to build on top of that, but currently anyone with docker-compose installed can run the service and that really should be retained.

    • MrPoopyButthole@lemmy.world · 1 year ago

      Staying cloud-agnostic is very important, and CDN services like Cloudflare/CloudFront have inherent privacy issues. IMO the stack should remain hostable on anyone’s home server environment.

  • Kalcifer@lemmy.world · 1 year ago

    That is, if we restart Lemmy every 30 minutes. Else memory will go to 100%

    Lemmy has a memory leak? Or, should I say, a “lemmory leak”?

        • donalonzo@lemmy.world · 1 year ago

          Rust protects you from segfaulting and trying to access deallocated memory, but doesn’t protect you from just deciding to keep everything in memory. That’s a design choice. The original developers probably didn’t expect such a deluge of users.

        • Bad3r@lemmy.one · 1 year ago

          Leaking memory is safe

          Rust’s memory safety guarantees make it difficult, but not impossible, to accidentally create memory that is never cleaned up (known as a memory leak). Preventing memory leaks entirely is not one of Rust’s guarantees in the same way that disallowing data races at compile time is, meaning memory leaks are memory safe in Rust. We can see that Rust allows memory leaks by using Rc<T> and RefCell<T>: it’s possible to create references where items refer to each other in a cycle. This creates memory leaks because the reference count of each item in the cycle will never reach 0, and the values will never be dropped.
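
          The classic illustration of such a cycle is a pair of `Rc<RefCell<...>>` nodes that point at each other; a minimal self-contained example along those lines:

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Two nodes referencing each other through Rc<RefCell<...>>. Their strong
// counts never drop to zero, so neither is ever freed: perfectly safe Rust,
// yet the memory is leaked.
struct Node {
    next: RefCell<Option<Rc<Node>>>,
}

fn main() {
    let a = Rc::new(Node { next: RefCell::new(None) });
    let b = Rc::new(Node { next: RefCell::new(Some(Rc::clone(&a))) });
    *a.next.borrow_mut() = Some(Rc::clone(&b)); // completes the cycle

    println!("a strong count = {}", Rc::strong_count(&a)); // 2
    println!("b strong count = {}", Rc::strong_count(&b)); // 2
    // When `a` and `b` go out of scope the counts only fall to 1,
    // so the two nodes are never deallocated.
}
```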

        • SomeOtherUsername@lemmynsfw.com · 1 year ago

          I’m calling it - if there’s actually a memory leak in the Rust code, it’s gonna be the in-memory queues, because the DB’s IOPS can’t cope with the number of users.

          • SomeOtherUsername@lemmynsfw.com · 1 year ago

            I think I found what eats the memory. DB iops isn’t the cause - looks like the server doesn’t reply before all the database operations are done. The problem is the unbounded queue in the activitypub_federation crate, spawned when creating the ActivityQueue struct. The point is, this queue holds all the “activities” - events to be sent to federated instances. If, for whatever reason, the events aren’t delivered to all the federated servers, they are retried with an exponential backoff for up to 2.5 days. If even a single federated instance is unreachable, all events remain in memory. For a large instance, this will eat up the memory for every upvote/downvote, post or comment.

            Lemmy needs to figure out a scalable eventual-consistency algorithm, and most importantly it should store these messages in the DB, not in memory.
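
            To illustrate the failure mode with plain std channels (just a sketch, not the actual activitypub_federation code): an unbounded queue accepts sends forever and simply grows, while a bounded one pushes back on the producer.

```rust
use std::sync::mpsc::{channel, sync_channel};
use std::thread;

fn main() {
    // Unbounded: every send succeeds immediately. If the consumer stalls
    // (say, an unreachable federated instance being retried for days),
    // the backlog just keeps growing in memory.
    let (tx, _rx) = channel::<String>();
    for i in 0..1_000_000 {
        tx.send(format!("activity {i}")).unwrap(); // never blocks
    }

    // Bounded: the producer blocks once the buffer is full, giving the
    // queue backpressure. Better still, write the activity to the database
    // first and only queue an id, so the backlog lives on disk, not in RAM.
    let (btx, brx) = sync_channel::<String>(100);
    let consumer = thread::spawn(move || {
        while let Ok(activity) = brx.recv() {
            // deliver `activity` to remote instances here
            drop(activity);
        }
    });
    for i in 0..1_000 {
        btx.send(format!("activity {i}")).unwrap(); // blocks when 100 are pending
    }
    drop(btx);
    consumer.join().unwrap();
}
```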

    • grue@lemmy.world · 1 year ago

      Lemmy has a memory leak? Or, should I say, a “lemmory leak”?

      A lemmory meek, obviously!

    • 257m@lemmy.ml · 1 year ago

      Wait, isn’t Lemmy written in Rust? How do you create a memory leak in Rust? Unsafe mode?

  • Snow-Foxx@lemmy.world · 1 year ago

    Thank you so much for your hard work and for fixing everything tirelessly, so that we can waste some time with posting beans and stuff lol.

    Seriously, you’re doing a great job <3

  • Flying Squid@lemmy.world · 1 year ago

    I am very forgiving of the bugs I encounter on Lemmy instances because Lemmy is still growing and it’s essentially still in beta. I am totally unforgiving of Reddit crashing virtually every day after almost two decades.

  • repungnant_canary@vlemmy.net · 1 year ago

    The need to restart the server every so often to avoid excessive RAM usage is very interesting to me. This sounds like an issue with memory management. Not necessarily a leak, but maybe something like the server keeping unnecessary references so objects can never be dropped.

    Anyway, from my experience, Rust developers love debugging this kind of problem. Are the Lemmy devs aware of this issue? And do you publish server usage logs somewhere, so people can look deeper into it?

  • Eczpurt@lemmy.world · 1 year ago

    Really appreciate all the time and effort you all put in especially while Lemmy is growing so fast. Couldn’t happen without you!