This is what they admit to. What do you think DARPA has?
They mention GPT-3; what they fail to mention is that it can't hold the thread of a conversation.
For natural language I would start by building a phoneme-based equivalent to GPT, or carefully tuning the latency between translation and response. This is because GPT, last I checked, is an algorithm that makes its predictions one token at a time.
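A minimal sketch of that idea, not from the video: an autoregressive loop over a toy phoneme inventory, with a placeholder standing in for a trained network. The phoneme list and `next_phoneme_distribution` are assumptions for illustration only.

```python
import random

# Assumed toy ARPAbet-style phoneme inventory; a real system would use a full
# inventory plus stress/duration markers.
PHONEMES = ["HH", "EH", "L", "OW", "W", "ER", "D", "<eos>"]

def next_phoneme_distribution(context):
    """Stand-in for a trained model: probability of each next phoneme given
    the phonemes generated so far. Here it is just uniform."""
    p = 1.0 / len(PHONEMES)
    return {ph: p for ph in PHONEMES}

def generate(max_len=20):
    """Generate phonemes one at a time, GPT-style, until end-of-sequence."""
    context = []
    for _ in range(max_len):
        dist = next_phoneme_distribution(context)
        phoneme = random.choices(list(dist), weights=list(dist.values()))[0]
        if phoneme == "<eos>":
            break
        context.append(phoneme)
    return context

print(generate())
```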
The second problem is verisimilitude: what's believable versus the uncanny valley. I'm not an expert in soft robotics, but much of the current valley seems to consist of failure to convey proper microexpressions, and the combinations thereof. For an example of the uncanny valley popping up in microexpressions, look at this very video, where the muscles around the eyes and eyelids don't naturally follow the movement of the eye.
The third problem is actually inherent to both the general approach and the specific technologies used. GPT is a black box. A believable robot is as much about the role it fits as about the accuracy and naturalness of its speech. For an example of this, go talk to any old ALICE instance versus watching Westworld. Notice, in that fictional setting, how the machines have a certain consistency of character both between dialogues and within dialogues. This is less of a problem if you can instrument your algorithms, and there's been some research toward doing away with black-box implementations. For details, see the visual generation of faces using GANs, and related papers on how researchers developed techniques for seeing precisely which layers and nodes do what, and for parameterizing the various features output by the face generators in question (age, hair color, maleness vs. femaleness, eye color, facial expression, etc.). It's really quite impressive.
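To make that "parameterized features" idea concrete, here is a hypothetical sketch of editing an attribute by moving a latent code along a known direction. The `generator` call and the direction vector are assumptions: in that line of work the directions are learned from labeled examples, while here a random vector stands in.

```python
import numpy as np

latent_dim = 512
rng = np.random.default_rng(0)

z = rng.standard_normal(latent_dim)               # latent code for one face
age_direction = rng.standard_normal(latent_dim)   # placeholder; normally learned from labeled latents
age_direction /= np.linalg.norm(age_direction)

def edit(latent, direction, strength):
    """Shift a latent code along an attribute direction (age, hair color, expression, ...)."""
    return latent + strength * direction

older_face_code = edit(z, age_direction, strength=3.0)
# image = generator(older_face_code)  # hypothetical trained GAN generator
```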
This "unwrapping the blackbox" is actually seriously important for more than just debugging or analysis. Its the key to allowing systems to "self debug" at human, or super-human standards.
Much as an adversarial system might decide what's "realistic" or not by competing with a generator, the same paradigm can be applied to any network output.
Instrumenting the various aspects of an algorithm allows you to, for example, generate many potential outputs from something like GPT, and then select among them on a variety of factors (see the sketch after this list):
Does it fit the information that needs to be conveyed to the person currently talking to the machine?
does it fit the "persona" of the machine? Or even just boil it down to speech style. Is this machine supposed to be a bank teller? Then you don't want it dropping down into "dude bro" casual speech, or giving long-winded stories to customers who just want to open an account.
is it "context" appropriate? Cracking a joke while giving someone a cancer diagnosis, or going off on a dream-like tangent as GPT often does, is a no go. There is to say a certain level of "appropriateness" to any scenario, dependent on both the broad context, and those involved
While larger and more sophisticated models built around black boxes like GPT may eliminate much of the awkwardness and many of the edge cases of current approaches, I don't see them fully fixing that problem from within any one model. Nor is a strictly "random forest" type approach going to work fully either. Rather, we have to replace the human in the loop, whose job it typically is to say: "OK, this is our company, we make the xyz robot/algorithm/AI. It's pretty good, but here's the weekly output with customer complaints about edge cases and uncanny valley incidents. Let's do some research and improve the algorithms, find a new algorithm, or develop exceptions for these problems. Then next week, we do the same thing."
That's never going to work; it will always have people as the limiting factor for what is believable, a continuous horizon of diminishing returns. Instead, machines have to be given a certain standard for what is "wrong" or "out of place" and then applied to existing "sufficient" solutions to properly filter out edge cases.
This is even how it's done in the brain: the PMC (premotor cortex) generates potential movements, most of which are nonsense, then averages them and, critically, filters out the ones that fall below a certain threshold of acceptability, whatever the given definition of "acceptable" is.
So we see that even one of nature's most complex devices, the human brain, mimics this structure of algorithms judging algorithms: taking the almost-passable and filtering out everything that doesn't fit strict criteria, whether the criterion is "would this sentence fit the given context, dialogue, or persona?" or "would this action fit the general role this machine plays?", be it a bank teller or a cunning whore running a saloon.
tl;dr: 1. Set a standard, or constraints, for speech and behavior. 2. Generate a lot of potential output. 3. Use ML to grade and filter the outputs of other ML before deciding what actually makes it to the execution stage.
And the standards are where you write your "scripts", "personalities", dialogues, and high-level general and contextual behaviors.
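Those standards could live as plain data that the filter stage consumes. A hypothetical example, with every field name invented for illustration:

```python
# Hypothetical persona/standards spec for a bank-teller bot; the scorers in the
# filter stage sketched above would read constraints like these.
BANK_TELLER_STANDARDS = {
    "persona": "bank_teller",
    "register": "formal",
    "forbidden_styles": ["casual_slang", "long_winded_storytelling"],
    "forbidden_in_context": {
        "cancer_diagnosis": ["jokes", "dream_like_tangents"],
        "account_opening": ["unrelated_anecdotes"],
    },
    "required_behaviors": ["answer_the_question", "stay_in_character"],
}
```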
Looks fragile and easy to kill. Quit being a pussy.
How long does the battery last?
What battery? They'll run on human souls.
There is the key. All of this bullshit stops when the power is disrupted.
It's awesome. I want one or two to help around the house, hang out with, and be my slaves.
Sure, go ahead and do that.
Ok, thanks for the motivation