This article is riddled with errors and shows a fundamental lack of understanding of how any of this actually works.
Just an example:
But the Transformer has a big flaw. It uses something called "attention," where the computer program takes the information in one group of symbols, such as words, and moves that information to a new group of symbols, such as the answer you see from ChatGPT, which is the output.
It doesn't "move that information" anywhere. Attention computes weights over the tokens (not "symbols") already in the context, so the model can decide which of them matter most when predicting the next token. FFS.
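For anyone who wants to see what attention actually does, here's a minimal numpy sketch of scaled dot-product attention (single head, no masking, toy dimensions I made up): it computes a weight for every pair of positions and mixes the value vectors accordingly.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention: weights over context tokens, not 'moving information'."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)        # (n, n): similarity of every token with every other token
    weights = softmax(scores, axis=-1)   # each row: how much this position attends to each context token
    return weights @ V                   # weighted mixture of the value vectors

# toy example: 5 tokens with 8-dimensional embeddings
rng = np.random.default_rng(0)
n, d = 5, 8
Q, K, V = (rng.normal(size=(n, d)) for _ in range(3))
out = attention(Q, K, V)
print(out.shape)  # (5, 8): one mixed representation per token
```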
That attention operation -- the essential tool of all large language programs, including ChatGPT and GPT-4 -- has "quadratic" computational complexity (Wiki "time complexity" of computing). That complexity means the amount of time it takes for ChatGPT to produce an answer increases as the square of the amount of data it is fed as input.
No, that's not quite what quadratic complexity means here. It's about scaling with sequence length: standard attention builds an n x n matrix of scores, so doubling the number of tokens quadruples the work, and in naive implementations the MEMORY needed to hold that matrix blows up quadratically too. That's the actual bottleneck, not some vague "time it takes to produce an answer".
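To make the "quadratic" part concrete, here's a back-of-the-envelope sketch. The sizes and the fp16 assumption are mine, and real kernels (FlashAttention, for example) avoid materialising the full matrix, so treat this as an illustration of the scaling, not of any specific implementation.

```python
# The n x n attention score matrix is where the quadratic cost lives:
# double the number of tokens and you quadruple the entries to compute
# and (naively) hold in memory, per head, per layer.
for n in (1_024, 2_048, 4_096, 8_192):
    entries = n * n
    # assuming fp16 scores (2 bytes each), purely for illustration
    print(f"n={n:>5}  score entries={entries:>12,}  ~{entries * 2 / 2**20:8.1f} MiB per head")
```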
At some point, if there is too much data -- too many words in the prompt, or too many strings of conversations over hours and hours of chatting with the program -- then either the program gets bogged down providing an answer, or it must be given more and more GPU chips to run faster and faster, leading to a surge in computing requirements.
No, the model has a fixed context length. Adding more GPUs speeds up inference throughput; it does nothing for how much context the model can handle.
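A trivial sketch of what actually happens when a prompt overruns the window. The 4k figure is just an example I picked, not any specific model's limit.

```python
# The context window is a model hyperparameter, so extra GPUs buy you
# throughput, not a longer window. Anything past it gets truncated or rejected.
MAX_CTX = 4_096  # e.g. a model trained with a 4k-token window (hypothetical)

def fit_to_window(token_ids):
    if len(token_ids) <= MAX_CTX:
        return token_ids
    return token_ids[-MAX_CTX:]  # keep only the most recent tokens

print(len(fit_to_window(list(range(10_000)))))  # 4096, no matter how many GPUs you throw at it
```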
Who even wrote this garbage?
Anyway, here's a link to the actual paper they're talking about (Hyena), since the article is worse than useless: https://arxiv.org/abs/2302.10866
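Since the article never explains what the paper actually proposes, the core trick is to replace attention with very long convolutions that can be evaluated with FFTs in O(n log n) instead of O(n^2). The snippet below is my own simplified sketch of that convolution primitive (1-D, single channel, toy filter), not the real Hyena operator, which also adds data-controlled gating and implicitly parameterised filters.

```python
import numpy as np

def long_conv_fft(x, k):
    """Causal long convolution via FFT: O(n log n) instead of attention's O(n^2)."""
    n = x.shape[0]
    X = np.fft.rfft(x, n=2 * n)   # zero-pad to 2n to get linear (not circular) convolution
    K = np.fft.rfft(k, n=2 * n)
    return np.fft.irfft(X * K, n=2 * n)[:n]  # first n outputs depend only on past/current inputs

rng = np.random.default_rng(0)
x = rng.normal(size=4096)                 # a 1-D signal standing in for one channel
k = np.exp(-np.arange(4096) / 512.0)      # toy decaying filter as long as the sequence
y = long_conv_fft(x, k)
print(y.shape)  # (4096,)
```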