By jjmcc


2019-01-07 17:50:49

I have a UWP DX11 application. I recently disabled vsync and noticed that my FPS drops drastically when I'm drawing only very basic sprites on the screen. I'm trying to learn why, so I can improve it; I would not expect it to drop this low when rendering only sprites.

With only a single sprite being drawn on the screen, and everything else still running in the update loop, I get about 3000 FPS, which is what I would expect for my graphics card.

However, there is a huge drop when I render only the projectiles on the screen.

Rendering just these projectiles drops me from about 3000 FPS to about 500, which is an absurd amount given how small they are. When I render the other entities on the screen without any projectiles, it drops to about 200.

The fact that I can run modern 3D games at a higher frame rate than I get when rendering about 20 sprites proves I'm doing something wrong, but I'm not sure what. Can anyone point out why this might be?

If any additional information is needed, I'm glad to provide it; I just don't know which code is relevant for this type of issue.


void Update()
{
    m_timer.Tick([&]()
    {
        //update keypresses
        m_keybindManager->Update(m_timer);

        //update player
        m_player->Update(m_timer);

        //update projectiles
        m_projectileManager->Update(m_timer);

        uint32 fps = m_timer.GetFramesPerSecond();
        std::stringstream ss;
        ss << "FPS: " << fps << "\n" << std::endl;
        OutputDebugStringA(ss.str().c_str());

    });
}

bool Render()
{
    if (m_timer.GetFrameCount() == 0)
    {
        return false;
    }

    SpriteRenderer* renderer = m_spriteRenderer.get();

    //set up the renderer before drawing
    renderer->InitializeRenderer();

    //draw player
    Sprite& s = m_player->GetSprite();
    s.Render(*renderer);

    //draw projectiles
    m_projectileManager->Render(*renderer);

    return true;
}

ProjectileManager Render

void ProjectileManager::Render(IRenderer& renderer) {

    for (std::shared_ptr<Projectile>& projectile : m_projectiles) {
        Sprite& sprite = projectile->GetSprite();

        sprite.Render(renderer);
    }

}

Sprite Update/Render

void Sprite::Update(DX::StepTimer const & timer)
{
    if (m_cycleTextures) {

        uint64_t currentTime = Time::CurrentTimeMilliseconds();

        if (currentTime > (m_lastCycle + (uint64_t)m_cycleDelay)) {
            m_cycleId = (m_cycleId + 1) % m_textures.size();

            m_texture = m_textures[m_cycleId];

            m_lastCycle = currentTime;
        }
    }

}

void Sprite::Render(IRenderer& renderer) {
    Rendering::SpriteRenderDescription renderDescription =
        Rendering::SpriteRenderDescription(m_texture.Get(), m_vertexBuffer.GetAddressOf(), m_indexBuffer.GetAddressOf(), m_worldMatrix);

    renderer.Render(renderDescription);
}

SpriteRenderer Render

void SpriteRenderer::Render(Rendering::RenderDescription const& renderDescription)
{
    Rendering::SpriteRenderDescription const& spriteRenderDescription = dynamic_cast<Rendering::SpriteRenderDescription const&>(renderDescription);

    if (!m_loadingComplete)
    {
        return;
    }

    auto context = m_deviceResources->GetD3DDeviceContext();

    //stores the world/model for the object being rendered
    XMStoreFloat4x4(&m_constantBufferData.model, XMLoadFloat4x4(&spriteRenderDescription.m_world));


    // Prepare the constant buffer to send it to the graphics device.
    context->UpdateSubresource1(
        m_constantBuffer.Get(),
        0,
        NULL,
        &m_constantBufferData,
        0,
        0,
        0
    );

    // Each vertex is one instance of the VertexTextureCoordinates struct.
    UINT stride = sizeof(ShaderStructures::VertexTextureCoordinates);
    UINT offset = 0;
    context->IASetVertexBuffers(
        0,
        1,
        spriteRenderDescription.m_vertexBuffer,
        &stride,
        &offset
    );

    context->IASetIndexBuffer(
        *spriteRenderDescription.m_indexBuffer,
        DXGI_FORMAT_R16_UINT, // Each index is one 16-bit unsigned integer (short).
        0
    );

    //update texture
    context->PSSetSamplers(0, 1, m_sampleState.GetAddressOf());
    context->PSSetShaderResources(0, 1, &spriteRenderDescription.m_texture);

    ID3D11RasterizerState* rasterState;
    D3D11_RASTERIZER_DESC wfdesc;
    ZeroMemory(&wfdesc, sizeof(D3D11_RASTERIZER_DESC));
    wfdesc.FillMode = D3D11_FILL_SOLID;
    wfdesc.CullMode = D3D11_CULL_NONE;
    m_deviceResources->GetD3DDevice()->CreateRasterizerState(&wfdesc, &rasterState);
    context->RSSetState(rasterState);


    context->IASetPrimitiveTopology(D3D11_PRIMITIVE_TOPOLOGY_TRIANGLELIST);

    context->IASetInputLayout(m_inputLayout.Get());

    // Attach our vertex shader.
    context->VSSetShader(
        m_vertexShader.Get(),
        nullptr,
        0
    );

    // Send the constant buffer to the graphics device.
    context->VSSetConstantBuffers1(
        0,
        1,
        m_constantBuffer.GetAddressOf(),
        nullptr,
        nullptr
    );

    // Attach our pixel shader.
    context->PSSetShader(
        m_pixelShader.Get(),
        nullptr,
        0
    );

    // Draw the objects.
    context->DrawIndexed(
        6,
        0,
        0
    );

}


@ThisIsTheDave 2019-01-09 18:42:27

I wonder if your bottleneck is the number of draw calls, and not the number and size of sprites being drawn.

At some point I learned a heuristic that says you can execute about 3000 draw calls per frame while maintaining a frame rate of 30 FPS. This limitation exists because of the CPU overhead imposed by DirectX and the GPU driver. In other words, even though you're only drawing a single tiny sprite per draw call, your frame rate will decline rapidly because of the overhead of setting up individual vertex buffers, index buffers, constant buffers, rasterizer states, etc. Note that because this limitation lives in the GPU driver on the CPU, you will see declining FPS despite having a very powerful GPU. On the other hand, it looks like you're only calling DrawIndexed about 20 times per frame, which leaves plenty of room for more. But I'm accustomed to being overjoyed at anything above 60 FPS, so I don't pay much attention to bottlenecks in the 500-3000 FPS range.
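Because FPS is the reciprocal of frame time, drops at very high frame rates look more dramatic than they are in milliseconds. A rough conversion of the question's figures, assuming (as the question estimates) that about 20 sprites are being drawn in the projectile case:

```latex
\begin{aligned}
t_{\text{frame}} &= \frac{1000\ \text{ms}}{\text{FPS}} \\
t(3000) &\approx 0.33\ \text{ms}, \quad t(500) = 2.00\ \text{ms}, \quad t(200) = 5.00\ \text{ms} \\
\Delta t_{\text{projectiles}} &\approx 2.00 - 0.33 = 1.67\ \text{ms} \\
\frac{1.67\ \text{ms}}{20\ \text{sprites}} &\approx 0.083\ \text{ms} \approx 83\ \mu\text{s per sprite}
\end{aligned}
```

Under that assumption, roughly 80 µs per six-index draw would be a large fixed cost per sprite, which would be more consistent with per-draw CPU overhead than with GPU fill rate or texture bandwidth.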

If you are in fact limited by draw calls, you can fix it by rendering multiple sprites in a single draw call instead of making many calls to DrawIndexed. (Perhaps SpriteBatch works this way, as @ChuckWalbourn suggested, or try using instancing.) Another, more invasive way to avoid draw-call overhead is to use a low-overhead graphics API such as Vulkan, Mantle, or Direct3D 12.
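A minimal CPU-side sketch of "many sprites, one draw call": every sprite is expanded into four vertices and six 16-bit indices in one shared pair of arrays, which would then be uploaded to a single dynamic vertex/index buffer and drawn with one DrawIndexed. The types BatchVertex, SpriteQuad, SpriteBatchData and the function BuildBatch are hypothetical, positions are assumed to be transformed on the CPU (so no per-sprite world matrix in a constant buffer), and the D3D upload itself is omitted. Instancing is the alternative that keeps per-sprite transforms on the GPU.

```cpp
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical vertex layout: 2D position plus texture coordinates.
struct BatchVertex {
    float x, y;  // position, already in the space the vertex shader expects
    float u, v;  // texture coordinates
};

// Hypothetical per-sprite data: an axis-aligned rectangle.
struct SpriteQuad {
    float x, y;  // top-left corner
    float width, height;
};

struct SpriteBatchData {
    std::vector<BatchVertex> vertices;
    std::vector<std::uint16_t> indices;  // 16-bit, like DXGI_FORMAT_R16_UINT
};

// Two triangles per quad: (0,1,2) and (0,2,3), relative to the quad's first vertex.
constexpr std::uint16_t kQuadIndices[6] = {0, 1, 2, 0, 2, 3};

// Expands every sprite into 4 vertices and 6 indices so all of them can live
// in one vertex/index buffer pair and be drawn with a single
// DrawIndexed(indices.size(), 0, 0) call.
SpriteBatchData BuildBatch(const std::vector<SpriteQuad>& sprites) {
    // 16-bit indices can address at most 65536 vertices, i.e. 16384 quads.
    if (sprites.size() * 4 > 65536) {
        throw std::length_error("too many sprites for 16-bit indices");
    }
    SpriteBatchData batch;
    batch.vertices.reserve(sprites.size() * 4);
    batch.indices.reserve(sprites.size() * 6);
    for (const SpriteQuad& s : sprites) {
        const auto base = static_cast<std::uint16_t>(batch.vertices.size());
        // Corners: top-left, top-right, bottom-right, bottom-left.
        batch.vertices.push_back({s.x, s.y, 0.0f, 0.0f});
        batch.vertices.push_back({s.x + s.width, s.y, 1.0f, 0.0f});
        batch.vertices.push_back({s.x + s.width, s.y + s.height, 1.0f, 1.0f});
        batch.vertices.push_back({s.x, s.y + s.height, 0.0f, 1.0f});
        for (std::uint16_t i : kQuadIndices) {
            batch.indices.push_back(static_cast<std::uint16_t>(base + i));
        }
    }
    return batch;
}
```

The corner order is an assumption; with the question's D3D11_CULL_NONE rasterizer state, triangle winding does not matter.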

The most obvious limitation when drawing a bunch of sprites is texture bandwidth on the GPU, but that seems implausible in your case given that you're drawing a small number of copies of a single small sprite. Nevertheless, you can easily test for a texture-bandwidth bottleneck by cranking up the MIP bias on your GPU, which forces the GPU to use the highest-numbered (blurriest) MIP level for all your textures, meaning it only reads a single texel for each texture.

@Josh 2019-01-07 18:04:37

The big red flags I see after a quick look over the code you posted are:

  • You're using dynamic_cast inside your Render function; dynamic_cast has a runtime overhead and often "smells" of a design problem. You should look at re-engineering away the need for this cast. It seems to me that you can just pass a SpriteRenderDescription in here, since SpriteRenderer can't render anything else anyhow. If SpriteRenderer::Render takes the base type because it's part of some virtual interface, consider de-virtualizing that interface (unfortunately, I don't have enough context on your code to suggest how).

  • You're updating the constant buffer for every render of what appears to be every individual sprite. It also looks like you may have one vertex/index buffer (et cetera) per sprite. This generates a lot of overhead; instead, you could probably implement some kind of batching system where multiple sprites are contained in a single constant/vertex/index buffer (and thus a single draw call). This would reduce that overhead. The same goes for the management of textures, input layouts, shaders, et cetera.

  • You're calling CreateRasterizerState every time you render an individual sprite. You should create these states only once, at startup, and reference them as needed during rendering (especially since they are exactly the same for every sprite). The runtime should de-duplicate the underlying state objects, but there's still overhead in that lookup, which you can simply avoid by creating the state object once.
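Regarding the first point, one way to keep an abstract interface but drop the downcast, assuming the renderer is only ever handed sprite descriptions, is to type the virtual method on the concrete description. ISpriteRenderer, CountingSpriteRenderer and SubmitSprite are hypothetical names, and SpriteRenderDescription is cut down to a single field for illustration:

```cpp
#include <cstddef>

// Hypothetical, cut-down stand-in for the question's description type;
// the real one carries a texture, buffers and a world matrix.
struct SpriteRenderDescription {
    int textureId;
};

// Callers still depend only on an interface, not on a concrete renderer,
// but the parameter is already the concrete description type, so the
// implementation needs no dynamic_cast.
class ISpriteRenderer {
public:
    virtual ~ISpriteRenderer() = default;
    virtual void RenderSprite(const SpriteRenderDescription& description) = 0;
};

// One possible implementation (here a simple counting test double).
class CountingSpriteRenderer final : public ISpriteRenderer {
public:
    void RenderSprite(const SpriteRenderDescription& description) override {
        ++m_drawCount;
        m_lastTexture = description.textureId;
    }
    std::size_t DrawCount() const { return m_drawCount; }
    int LastTexture() const { return m_lastTexture; }

private:
    std::size_t m_drawCount = 0;
    int m_lastTexture = -1;
};

// Sprite-side code sees only the interface.
void SubmitSprite(ISpriteRenderer& renderer, int textureId) {
    renderer.RenderSprite(SpriteRenderDescription{textureId});
}
```

This still costs one virtual call per sprite, so the batching point in the list is likely to matter far more for performance.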

I'd venture to guess that batching as much as you can is what's going to solve most of your performance woes. If you can only tackle one thing from the list above, I'd tackle that.

@jjmcc 2019-01-07 18:19:53

Thanks for the detailed answer! I'll certainly look at all of them. The first issue you mentioned exists because it's part of an interface, and I didn't want to couple other classes to a specific renderer. As for the second issue, each sprite does have its own index/vertex buffer, which, looking at it again, doesn't really make sense, since they're all based on a quad with the same indices/vertices. Is there any more code I can provide to get further suggestions on the first problem, i.e. removing the dynamic_cast without removing the interface?

@Josh 2019-01-07 18:22:08

That's probably veering a bit off-topic for this specific question; if you posted a new question (or asked in the Game Development Chat) I'm sure you'd get some guidance. It would help to see the interface in question, and to have an answer ready to the initial question "why is this interface virtual at all?"

@jjmcc 2019-01-07 21:11:38

Can you expand a bit on the second point, where you talk about batching? Any brief examples of how I'd go about this? The only way I've really been taught is a single draw call per object. Right now I'm rendering 40 sprites to the screen; with my current implementation that's 40 draw calls, and each object has its own buffers, so 40 index/vertex buffer pairs.

@Chuck Walbourn 2019-01-07 22:18:07

Use SpriteBatch in the DirectX Tool Kit.

@jjmcc 2019-01-08 07:39:10

@ChuckWalbourn I will eventually move to that, but I need to learn what is actually bringing down the performance, as I can't use SpriteBatch for 3D rendering.

@Josh 2019-01-08 16:40:46

If you poke around the site, there are a few questions addressing sprite batching techniques that may help you. Perusing the source code linked by Chuck Walbourn may also help. The basic idea is to group the data for many sprites (that share a texture/shader "material") into one large buffer and draw that. Depending on the size you choose for the buffer, you may ultimately need more than one, but you still end up with far fewer buffers and draw calls than with a 1:1 approach.
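A sketch of the grouping step described here: sprites are sorted by texture, each run that shares a texture becomes one draw range, and a new range is started whenever an assumed fixed-capacity buffer fills up. QueuedSprite, DrawRange, PlanDraws and the maxSpritesPerBuffer parameter are hypothetical. Each range would correspond to binding one texture and issuing one DrawIndexed(indexCount, firstIndex, 0) against that fill of the buffer, assuming six indices per sprite laid out in sorted order; no D3D calls are made here.

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical queued sprite: only the texture ("material") matters here;
// position, size, etc. would also live in this struct.
struct QueuedSprite {
    int textureId;
};

// One draw call's worth of work.
struct DrawRange {
    std::size_t bufferFill;  // which fill of the shared buffer (0, 1, 2, ...)
    int textureId;           // texture bound once for the whole range
    std::size_t firstIndex;  // StartIndexLocation within that fill
    std::size_t indexCount;  // IndexCount (6 per sprite)
};

std::vector<DrawRange> PlanDraws(std::vector<QueuedSprite> sprites,
                                 std::size_t maxSpritesPerBuffer) {
    if (maxSpritesPerBuffer == 0) {
        throw std::invalid_argument("buffer must hold at least one sprite");
    }
    // Group sprites that share a texture; stable_sort keeps submission order
    // within each texture.
    std::stable_sort(sprites.begin(), sprites.end(),
                     [](const QueuedSprite& a, const QueuedSprite& b) {
                         return a.textureId < b.textureId;
                     });

    std::vector<DrawRange> draws;
    std::size_t fill = 0;
    std::size_t spritesInBuffer = 0;
    for (const QueuedSprite& s : sprites) {
        if (spritesInBuffer == maxSpritesPerBuffer) {
            // Buffer is full: it would be drawn, then refilled from the start.
            ++fill;
            spritesInBuffer = 0;
        }
        const bool extendsLast = !draws.empty() &&
                                 draws.back().bufferFill == fill &&
                                 draws.back().textureId == s.textureId;
        if (!extendsLast) {
            draws.push_back({fill, s.textureId, spritesInBuffer * 6, 0});
        }
        draws.back().indexCount += 6;
        ++spritesInBuffer;
    }
    return draws;
}
```

Sorting by texture changes the draw order, so overlapping alpha-blended sprites may need a depth-based order instead.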
