I haven't studied the federation code, but my understanding is that delivery works on a retry basis: a message that fails delivery gets retried for "a while", long enough to ride out an instance getting rebooted or going down for maintenance for an hour.
But if deliveries to an instance keep failing for days, the sending server will probably stop trying. That's kind of a desirable property: it keeps a server that comes back online from getting inundated with a backlog of failed deliveries it can never catch up on.
A backlog reconciliation feature would be cool, but I don't think it works that way today.
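
To make the retry-then-give-up idea concrete, here's a rough Python sketch. To be clear, this is not the actual federation code (which I haven't read). The names `RetryPolicy`, `next_retry_delay` and `simulate_outage` are made up, and the numbers (first retry after a minute, delay capped at an hour, give up after three days) are assumptions picked only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RetryPolicy:
    # All values are assumptions for illustration, not real defaults.
    base_delay_s: float = 60.0                 # wait before the first retry
    max_delay_s: float = 3600.0                # cap on the wait between retries
    give_up_after_s: float = 3 * 24 * 3600.0   # stop retrying after ~3 days


def next_retry_delay(policy: RetryPolicy, attempt: int, elapsed_s: float) -> Optional[float]:
    """Seconds to wait before the next attempt, or None to drop the message."""
    if elapsed_s >= policy.give_up_after_s:
        return None
    # Exponential backoff, capped so retries never spread out past max_delay_s.
    return min(policy.base_delay_s * 2 ** attempt, policy.max_delay_s)


def simulate_outage(policy: RetryPolicy, outage_s: float) -> bool:
    """True if a message first sent at t=0 gets delivered, given that the
    receiving instance is down until t=outage_s."""
    elapsed = 0.0
    attempt = 0
    while True:
        if elapsed >= outage_s:
            return True   # receiver is back up, so this attempt succeeds
        delay = next_retry_delay(policy, attempt, elapsed)
        if delay is None:
            return False  # sender gave up; the message is never delivered
        elapsed += delay
        attempt += 1
```

Under these made-up numbers, `simulate_outage(RetryPolicy(), 3600)` comes back true (an hour of maintenance is survivable), while `simulate_outage(RetryPolicy(), 7 * 24 * 3600)` comes back false (a week-long outage loses the message). Dropping the message at the cutoff is exactly what keeps the backlog bounded for a server that's been gone a long time.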