By Kadam Parikh


2018-11-01 18:20:13

I am trying to understand the use of the TimeDistributed layer in Keras/TensorFlow. I have read some threads and articles, but I still don't get it properly.

The threads that gave me some understanding of what the TimeDistributed layer does are:

What is the role of TimeDistributed layer in Keras?

TimeDistributed(Dense) vs Dense in Keras - Same number of parameters

But I still don't know why the layer is actually used!

For example, both of the code snippets below will produce the same output (and output_shape):

from keras.models import Sequential
from keras.layers import LSTM, TimeDistributed

model = Sequential()
model.add(TimeDistributed(LSTM(5, input_shape = (10, 20), return_sequences = True)))
print(model.output_shape)

model = Sequential()
model.add(LSTM(5, input_shape = (10, 20), return_sequences = True))
print(model.output_shape)

And the output shape of both will be (as far as I know):

(None, 10, 5)

So, if both models produce the same output, what is the actual use of the TimeDistributed layer?

I also have one other question. The TimeDistributed layer applies the same layer (with shared weights) to each temporal slice of the data. So how is it different from unrolling the LSTM layer, which the Keras API describes as follows?

unroll: Boolean (default False). If True, the network will be unrolled, else a symbolic loop will be used. Unrolling can speed-up a RNN, although it tends to be more memory-intensive. Unrolling is only suitable for short sequences.

What is the difference between these two?
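To make the comparison concrete, here is a small sketch (my own assumption of how to probe this, not taken from the threads above). It suggests that unroll only changes how the recurrence is compiled, not the weights or the outputs:

import numpy as np
from keras.models import Sequential
from keras.layers import LSTM

x = np.random.rand(2, 10, 20).astype('float32')

# Same layer twice: one with the symbolic loop, one unrolled
looped = Sequential([LSTM(5, input_shape=(10, 20), return_sequences=True, unroll=False)])
unrolled = Sequential([LSTM(5, input_shape=(10, 20), return_sequences=True, unroll=True)])
unrolled.set_weights(looped.get_weights())  # copy weights so both start identical

print(np.allclose(looped.predict(x), unrolled.predict(x)))  # expected: True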

Thank you. I am still a newbie, so I have many questions.


@SaTa 2018-11-07 05:08:50

As the Keras documentation says, TimeDistributed is a wrapper that applies a layer to every temporal slice of an input.
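For instance, a minimal sketch (mine, just to illustrate the definition): wrapping a Dense layer makes Keras apply the same Dense weights independently to each of the 10 temporal slices:

from keras.models import Sequential
from keras.layers import Dense, TimeDistributed

model = Sequential()
# Input: 10 time steps, each a 16-dim vector; one shared Dense(8)
# is applied to every time step separately.
model.add(TimeDistributed(Dense(8), input_shape=(10, 16)))
print(model.output_shape)    # (None, 10, 8)
print(model.count_params())  # 16*8 + 8 = 136, same as a plain Dense(8)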

Here is a fuller example which might help:

Let's say that you have video samples of cats, and your task is a simple video classification problem: return 0 if the cat is not moving, or 1 if it is. Let's assume your input dim is (None, 50, 25, 25, 3), which means you have 50 time steps (frames) per sample, and each frame is 25 by 25 with 3 channels (RGB).

Well, one approach would be to extract some "features" from each frame using a CNN layer such as Conv2D, and then pass them to an LSTM layer. But the feature extraction should be the same for every frame. This is where TimeDistributed comes to the rescue: you can wrap your Conv2D with it, then pass the output to a Flatten layer that is also wrapped in TimeDistributed. So after applying TimeDistributed(Conv2D(...)), the output would be something like (None, 50, 5, 5, 16), and after TimeDistributed(Flatten()), the output would be (None, 50, 400). (The actual dims depend on the Conv2D parameters.)

The output of this layer can now be passed to an LSTM.

So obviously, LSTM itself does not need a TimeDistributed wrapper.
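Putting it all together, here is a sketch of that architecture (the Conv2D, pooling, and LSTM sizes are illustrative assumptions, not fixed by the example above):

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, LSTM, Dense, TimeDistributed

model = Sequential()
# Apply the same Conv2D (shared weights) to each of the 50 frames
model.add(TimeDistributed(Conv2D(16, (3, 3), activation='relu'),
                          input_shape=(50, 25, 25, 3)))
model.add(TimeDistributed(MaxPooling2D((2, 2))))
# Flatten each frame's feature map; the time axis is preserved
model.add(TimeDistributed(Flatten()))  # -> (None, 50, 1936)
# The LSTM consumes the sequence of per-frame feature vectors directly,
# so it is not wrapped in TimeDistributed
model.add(LSTM(32))
# Binary output: moving (1) vs. not moving (0)
model.add(Dense(1, activation='sigmoid'))
model.summary()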

@Asynchronousx 2020-01-30 10:38:26

AWESOME explanation, you just made me understand what TimeDistributed is useful for. Kudos!
