I trained a many-to-many sequence model in Keras with `return_sequences=True` and a `TimeDistributed` wrapper on the last Dense layer:

```
from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense, TimeDistributed

model = Sequential()
model.add(Embedding(input_dim=vocab_size, output_dim=50))
model.add(LSTM(100, return_sequences=True))
model.add(TimeDistributed(Dense(vocab_size, activation='softmax')))
# train...
model.save_weights("weights.h5")
```

So during training the loss is calculated over all hidden states (at every timestep). For inference, however, I only need the output at the last timestep. So I load the weights into a many-to-one sequence model without the `TimeDistributed` wrapper, and I set `return_sequences=False` to get only the last output of the LSTM layer:

```
inference_model = Sequential()
inference_model.add(Embedding(input_dim=vocab_size, output_dim=50))
inference_model.add(LSTM(100, return_sequences=False))
inference_model.add(Dense(vocab_size, activation='softmax'))
inference_model.load_weights("weights.h5")
```

When I test my inference model on a sequence of length 20, I expect a prediction of shape (vocab_size), but `inference_model.predict(...)` still returns predictions for every timestep: a tensor of shape (20, vocab_size).

## 1 comments

## @today 2019-03-07 14:31:38

If, for whatever reason, you need only the last timestep during inference, you can build a new model that applies the trained model to the input and returns the last timestep as its output using the `Lambda` layer.

Side note: As already stated in this answer, `TimeDistributed(Dense(...))` and `Dense(...)` are equivalent, since the `Dense` layer is applied on the last dimension of its input tensor. That is why you get the same output shape.

## @nidomo 2019-03-07 15:22:47

Oh. Is there a way to apply `TimeDistributed(Dense(...))` to every timestep of the LSTM output?

## @today 2019-03-07 17:12:16

@nidomo Well, I am not sure what you mean exactly as it is already applied on all the timesteps.
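For readers who want to try the `Lambda` approach from the answer above, here is a minimal sketch. It assumes the `tensorflow.keras` API rather than the standalone `keras` package used in the question, a hypothetical `vocab_size` of 30, and untrained weights (in practice you would train the model or call `load_weights` first). The names `inp`, `seq_out`, `last_out`, and `plain_dense` are illustrative only.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

vocab_size = 30  # hypothetical small vocabulary, for the demo only

# Same architecture as the many-to-many model in the question.
model = keras.Sequential([
    layers.Embedding(input_dim=vocab_size, output_dim=50),
    layers.LSTM(100, return_sequences=True),
    layers.TimeDistributed(layers.Dense(vocab_size, activation="softmax")),
])
# In practice: train here, or call model.load_weights("weights.h5").

# Wrap the trained model and keep only the last timestep of its output.
inp = keras.Input(shape=(None,), dtype="int32")         # variable-length token ids
seq_out = model(inp)                                    # (batch, timesteps, vocab_size)
last_out = layers.Lambda(lambda t: t[:, -1])(seq_out)   # (batch, vocab_size)
inference_model = keras.Model(inp, last_out)

# One sequence of length 20; note the leading batch dimension.
x = np.random.randint(0, vocab_size, size=(1, 20))
full = model.predict(x, verbose=0)
last = inference_model.predict(x, verbose=0)
print(full.shape)  # (1, 20, 30)
print(last.shape)  # (1, 30)

# A plain Dense layer acts on the last axis, so it is already applied
# to every timestep of a 3D input and keeps the timestep axis.
plain_dense = layers.Dense(vocab_size)
dense_out = plain_dense(np.zeros((1, 20, 100), dtype="float32"))
print(dense_out.shape)  # (1, 20, 30)
```

The wrapper shares its layers with `model`, so no weights are copied: `last` is simply `full[:, -1]`. Because `predict` expects a batch dimension, a single sequence of length 20 is passed as an array of shape `(1, 20)`.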