I am trying to understand the Keras layers better. I am working on a sequence to sequence model where I embed a sentence and pass it to a LSTM that returns sequences. Hereafter, I want to apply a Dense layer to each timestep (word) in the sentence and it seems like TimeDistributed does the job for three-dimensional tensors like this case.
In my understanding, Dense layers only work for two-dimensional tensors and TimeDistributed just applies the same dense on every timestep in three dimensions. Could one then not simply flatten the timesteps, apply a dense layer and perform a reshape to obtain the same result or are these not equivalent in some way that I am missing?