https://github.com/KerfuffleV2 — various random open source projects.

  • 5 Posts
  • 263 Comments
Joined 1 year ago
Cake day: June 11th, 2023





  • Smaller models (7B down to 350M) can handle long conversations better

    What are you basing that on? It’s true that more small models support very long context lengths than big ones, but that’s not because smaller models inherently handle them better; it’s because training big models takes far more resources. People usually do that kind of fine-tuning on small models, since extending a 70B to a 32K context would take a crazy amount of compute and hardware.

    If you could afford to fine-tune it, though, I’m pretty sure the big model has at least the same inherent capabilities. Larger models usually deal with ambiguity and the like better, so there’s a pretty good chance it would actually do better than the small model, assuming everything else was equal.
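    To put rough numbers on the hardware side, here’s a hedged back-of-the-envelope sketch. The ~12 bytes/parameter figure assumes full fine-tuning with Adam in mixed precision and ignores activation memory (which also grows with context length), so treat it as an order-of-magnitude illustration only:

    ```python
    # Back-of-the-envelope estimate of the training state for full fine-tuning.
    # Assumes fp16 weights + fp16 gradients + two fp32 Adam moments per parameter
    # (~12 bytes/param); activations, which grow with context length, are ignored.

    def finetune_state_gb(n_params: float) -> float:
        bytes_per_param = 2 + 2 + 4 + 4  # weights + grads + Adam m and v
        return n_params * bytes_per_param / 1e9

    for name, params in [("7B", 7e9), ("70B", 70e9)]:
        print(f"{name}: ~{finetune_state_gb(params):.0f} GB of training state")
    # 7B: ~84 GB, 70B: ~840 GB -- before any 32K-context activations or compute.
    ```

    Even spread across many GPUs, that roughly 10x gap is a big part of why long-context fine-tunes show up on small models first.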



  • Can you provide an example where science cannot explain a situation, because I can’t honestly think of any.

    Not OP, but there is some stuff. One big example is qualia: how does matter give rise to actual feelings, to experiences of things? That isn’t something we can measure directly, and it actually seems like it may never be. You could probably also count questions like “what was there before the Big Bang?” and that kind of thing.

    Of course, the fact that science can’t explain something doesn’t justify falling back on magic as an explanation. Some things just may not have an answer.



  • From dealing with their support in the past and the stuff they’ve accommodated, I wouldn’t be surprised if you could just ask them to do it for a small amount like that. If you do a web search, you can also find a lot of information and people claiming it’s possible to do things like transfer the balance to a PayPal account, etc.

    I haven’t tried to do that personally, so maybe it really just isn’t possible. It’s still only something that will affect someone who’s never going to spend money at Amazon again, right? If I’m going to spend $5.99 there at some point, it’s effectively the same as a cash refund for me. If I’m going to spend $10.99 at some point, it’s almost the same as getting double the refund, since I would have spent cash instead in those cases.


  • Do we need to be more efficient?

    I mean, it’s usually a beneficial thing. Using fewer resources (including land) to produce the same amount of food generally means less environmental damage. In the case of switching to vat-grown meat, it also means not torturing billions of animals every year.

    We have the resources to feed everyone on Earth and have leftovers

    Sure. No one starves because the food just isn’t on this planet; they starve because the people who have it won’t give it to them. That said, we’re also not using resources very sustainably, so saying we produce enough food currently isn’t the same as saying we can continue this way.

    We could also increase efficiency even further by reducing meat/dairy consumption.

    I don’t eat any animal products so you can probably guess this is something I’m strongly in favor of as well!

    Anyway, I was just responding to what I quoted, not specifically arguing for 3D-printed foods. Depending on how it’s implemented, it may or may not be better environmentally than the status quo.




  • Easily hour+ long headache on your first time.

    Whenever I read this kind of thing (and people say it pretty often), it seems really weird to me. The same goes for complaining about distro installers. An hour of possible headache/irritation, and then you use the machine for years. Obviously it would be better if everything were easy, but an hour just seems insignificant in the scheme of things. I really don’t understand seeing it as an actual roadblock.

    (Of course, there are other situations where it could matter like if you had to install/maintain 20 machines, but that’s not what we’re talking about here.)






  • Definitely very interesting, but figuring out what layers to skip is a relatively difficult problem.

    I really wish they’d shown an example of the optimal layers to skip for the 13B model. As the paper notes, the wrong combination of skipped layers can be worse overall, so it’s not just about how many layers you skip, but which ones as well.

    It would also be interesting to see whether there are any common patterns in which layers are most skippable. It would probably be specific to the model architecture, but it would be pretty useful if you could calculate the optimal skip pattern for, say, a 3B model and then translate it to a 30B with good/reasonable results.
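    Just to illustrate the kind of search I mean (and why it’s expensive), here’s a minimal greedy sketch. `evaluate_loss(model, skip_layers)` is a hypothetical helper that would run the model on a held-out set with the listed layers bypassed and return the loss; even this naive version costs on the order of max_skips × n_layers full evaluations.

    ```python
    # Naive greedy search for which layers to skip, scored by held-out loss.
    # evaluate_loss(model, skip_layers) is a hypothetical helper: it runs
    # inference with the given layers bypassed and returns validation loss.

    def greedy_skip_search(model, n_layers, max_skips, evaluate_loss):
        skipped = set()
        best_loss = evaluate_loss(model, skipped)
        for _ in range(max_skips):
            candidates = [l for l in range(n_layers) if l not in skipped]
            # Score each remaining layer on top of the current skip set.
            loss, layer = min((evaluate_loss(model, skipped | {l}), l)
                              for l in candidates)
            if loss > best_loss:
                break  # any further skip makes things worse, so stop here
            skipped.add(layer)
            best_loss = loss
        return skipped, best_loss
    ```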


  • The timing and similarity highly suggests this is a problem with how almost all software has implemented the webp standard in its image processing software.

    Did you read the article or the post? The point was that both places where the vulnerability was found probably used libwebp. So it’s not that there’s something inherently vulnerable in handling webp, just that they both used the same library, which had a vulnerability. (Presumably the article was a little vague about the Apple side because the source wasn’t open/available.)

    given that the programs processing images often have escalated privileges.

    What? That sounds like a really strange thing to say. I guess one could argue it’s technically true, because browsers can be considered “a program that processes images” and browser components can end up in things with escalated privileges. But that’s a special case; in general there’s no reason for the vast majority of programs that process images to have special privileges.


  • So I have never once ever considered anything produced by a LLM as true or false, because it cannot possibly do that.

    You’re looking at this in an overly literal way. It’s kind of like if you said:

    Actually, your program cannot possibly have a “bug”. Programs are digital information, so it’s ridiculous to suggest that an insect could be inside! That’s clearly impossible.

    “Bug”, “hallucination”, “lying”, etc. are just convenient ways to refer to things; you don’t have to interpret them by the literal meaning of the word. It also doesn’t require anything as sophisticated as an LLM for a program to “lie”. For example, I could write a program that logs some status information. It could log that everything is fine and then immediately crash: clearly everything wasn’t actually fine. I might describe that as the program “lying”, but it’s just a way to say that what it reports doesn’t correspond with what’s actually true.
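    (A toy example of that, nothing more; the message and the crash are both invented.)

    ```python
    # A program that "lies": the status it logs has no necessary connection to
    # what's actually true, as the crash immediately afterwards demonstrates.
    import logging

    logging.basicConfig(level=logging.INFO)
    logging.info("Everything is fine, all systems healthy.")
    raise RuntimeError("everything was not, in fact, fine")
    ```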

    People talk so often about how they “hallucinate”, or that they are “inaccurate”, but I think those discussions are totally irrelevant in the long term.

    It’s actually extremely relevant in terms of putting LLMs to practical use, which people are already doing. Even for plain old text completion, like on a phone keyboard, it obviously matters whether the completions it suggests are accurate.

    So text prediction is saying when A, high probability that then B.

    This is effectively the same as “knowing” that A implies B. If you get down to it, human brains don’t really “know” anything either: it’s just a bunch of neurons connected up, maybe reaching a potential and firing, maybe not.
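    As a toy illustration (with completely made-up numbers): a next-token predictor is nothing but a table of conditional probabilities, yet for practical purposes it “knows” whatever it assigns high probability to.

    ```python
    # A toy "text predictor": just conditional probabilities over the next token,
    # but it behaves as if it "knows" facts. The probabilities are invented.
    probs = {
        "Paris is the capital of": {"France": 0.97, "Texas": 0.02, "Narnia": 0.01},
        "2 + 2 =": {"4": 0.99, "5": 0.01},
    }

    def predict(context: str) -> str:
        dist = probs[context]
        return max(dist, key=dist.get)  # highest-probability continuation

    print(predict("Paris is the capital of"))  # -> France
    ```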

    (I wouldn’t claim to be an expert on this subject, but I am reasonably well informed. I’ve written my own implementation of LLM inference and contributed to other AI-related projects as well; you can verify that via the GitHub link in my profile.)