I can see why you are confused.
I made a tutorial about it over here.
But otherwise it just looks like this:
Ah, I got it. The name part really is in the text box. The only difference is that you use a background instead of a window skin, and call the pictures through common events.
But there's something strange. I took a look at the demo. Does that mean I have to call the common events "Rearrange Line Space" and "Who is Speaking?" every time I want to change the face picture? Also, in the "Who is Speaking?" common event, I can see the conditional branches that call the show picture common events, but I can't see variable 2, which those conditions check, being changed anywhere in the event. The same goes for the show picture events: I can't see any change to variable 3 in a normal event.