Here I'm struggling. Who's the observer here? The guy waving his hand? (t=2d/c) A dude sitting on the screen? (t=d/c) Am I taking some data from the guy waving his hand and some data from the guy on the screen to calculate this superluminal propagation?
Oh okay, yes, I see what's confusing you.
The "observer" can be anyone at rest w.r.t. reference frame S. We have to go back to the definition of a reference frame S, which involves an array of observers and clocks at rest w.r.t. each other, at every location in S.
It may help you to imagine that you, personally, are located at the hand's initial position. Your colleagues are located at the other 3 positions (the hand's final position; the shadow's initial position; the shadow's final position). You and your colleagues have all paced out the distances and synchronized your watches beforehand. If a hand (or shadow) arrives at someone's location, they record the time and immediately send a light signal to the other 3 observers. Therefore, you personally will observe the hand at its initial position, and sometime later, you will receive light signals from your colleagues reporting their observations. From those signals, you personally can reconstruct all the events.
So for example, at time 2d/c you receive a light signal from your colleague reporting the shadow at its initial position. But you know (because you are a competent observer) that this means the shadow actually arrived at the screen at time d/c. In relativity, we aren't talking about what people in frame S "see", but rather, what actually happened by the agreement of all competent observers in frame S. So we don't worry about what you vs. your colleagues personally witnessed. Rather, we consider what everyone in frame S agrees actually happened: the shadow actually arrived at the screen at time d/c, even though you personally witnessed only the light signal from your colleague, at time 2d/c, reporting this event.
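If it helps, here is a rough sketch of that bookkeeping in Python. The function name, the variable names, and the choice of units where c = 1 are my own assumptions for illustration, not anything standard. All it does is subtract the light-travel time from the time you received the signal.

```python
# Hypothetical helper for illustration only; names and units are assumptions.
C = 1.0  # speed of light, in units where c = 1


def event_time(t_received, distance, c=C):
    """Return when an event happened in frame S.

    t_received: time on your watch when the colleague's light signal reached you
    distance:   paced-out distance between you and that colleague
    """
    # The signal needed distance / c to reach you, so the event is that much earlier.
    return t_received - distance / c


d = 1.0                  # distance from you (at the hand) to the colleague at the screen
t_received = 2 * d / C   # you get the colleague's report at 2d/c
print(event_time(t_received, d))  # the shadow's arrival time, d/c
```

Since the watches were synchronized beforehand, your colleague would also just tell you the time they recorded, and it would match what you work out this way.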
Does that make sense?