Experimental Renderer


So, I am building a renderer from scratch, mostly for fun, and I will post my progress here.

Before I try anything fancy, I will be testing how efficient my current renderer is. So, I will compare my renderer against a vanilla D3D8 version of the client.

To stress the render system, the test scene will contain 180 characters and 60 Metin stones, and the resolution will be set to 1600x900 for both clients. I will compare render times; lower is better.

Results:

Vanilla D3D8: 18 - 20 ms;

Experimental Renderer: 4 - 5 ms;

My renderer is roughly 4x faster. This is a good start.
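
For anyone who wants to redo the arithmetic, here is a minimal sketch that turns the render times above into a speedup factor and an equivalent frame rate. The helper names `FramesPerSecond` and `Speedup` are made up for this example and are not part of either client.

```cpp
#include <cassert>

// Frames per second for a given per-frame render time in milliseconds.
inline double FramesPerSecond(double frameMs) {
    assert(frameMs > 0.0);
    return 1000.0 / frameMs;
}

// How many times faster `newMs` is than `baselineMs`.
inline double Speedup(double baselineMs, double newMs) {
    assert(baselineMs > 0.0 && newMs > 0.0);
    return baselineMs / newMs;
}
```

With the measured ranges, `Speedup(18.0, 5.0)` is 3.6 and `Speedup(20.0, 4.0)` is 5.0, so "4x" sits in the middle of the range. In frame-rate terms, 20 ms corresponds to 50 fps and 4 ms to 250 fps, counting render time only.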

29 minutes ago, WeedHex said:

How do you generate the fake characters? I doubt you connected 180 clients 😂

You should make sure that your "clones" are as heavy as real players.

Very curious about it, keep it up 😄

Your question is interesting. When you have the source, you don't need players or servers to make things show up on screen. Both clients were changed to allow me to load maps, spawn entities, control the camera, etc., without needing a server or other players. All entities are as "heavy" as they normally would be if loaded/used in real gameplay.

11 minutes ago, Bubixon said:

It looks like the textures are swapped. Why is the texture in the middle of the city different?

Because each client is using slightly different files. The version I am using for my renderer is the one I use normally, but I downloaded another client just for the comparison. I didn't even realize some of the textures were different until I started testing. A couple of different textures have no effect on the results.

According to my analysis, two things are tanking the FPS:

mesh rendering (DrawIndexedPrimitive, called from CGrannyModelInstance::RenderMeshNodeListWithOneTexture)

and

Granny mesh deformation (GrannyDeformVertices, called from CGrannyMesh::DeformPNTVertices).

I made the deforming function run in parallel, which fixed that part of the FPS drop, but mesh rendering is still tanking my client's FPS. I get 100 fps with 180 player instances (thanks to sh*tty mesh rendering).

I am planning to upgrade mesh rendering with the "Hardware Instancing" method; maybe this will fix the FPS drop.

In reply to Denizeri24's post above:

That is really good performance. What hardware did you use for your tests? Maybe you could share your scene? Are you sure that in your tests you didn't have other things active (like shadows)? 100 fps is good performance. I don't think you need instancing, but implementing it is not a bad idea.

I am not using instancing.

Also, multithreading the mesh deformer would require a lot of syncing, since a vertex buffer lock is performed before deforming (and D3D9 functions cannot/shouldn't be called from multiple threads, unless you create the device with the D3DCREATE_MULTITHREADED flag, which causes the runtime to perform the syncing for you).
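
One way to sidestep most of that syncing, shown here only as a rough sketch and not as what either client does, is to split the work in two phases: worker threads deform into CPU-side staging buffers, and afterwards a single thread does the lock and copy. `DeformMesh` is a made-up stand-in for the real deformer, and the "vertex buffer" is a plain `std::vector`, so no D3D calls appear in the example.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

struct Vertex { float x, y, z; };

// Hypothetical stand-in for the real deformer (e.g. GrannyDeformVertices):
// it just offsets every vertex so the result is easy to check.
inline void DeformMesh(const std::vector<Vertex>& src, float offset,
                       std::vector<Vertex>& dst) {
    dst.resize(src.size());
    for (std::size_t i = 0; i < src.size(); ++i)
        dst[i] = { src[i].x + offset, src[i].y + offset, src[i].z + offset };
}

// Phase 1: deform all instances in parallel into CPU-side staging buffers.
// Each worker writes only to its own slots, so no locking is needed.
inline void DeformAllParallel(const std::vector<Vertex>& src,
                              const std::vector<float>& offsets,
                              std::vector<std::vector<Vertex>>& staging,
                              unsigned workerCount) {
    assert(workerCount > 0);
    staging.resize(offsets.size());
    std::vector<std::thread> workers;
    for (unsigned w = 0; w < workerCount; ++w) {
        workers.emplace_back([&, w] {
            // Strided split: worker w handles instances w, w + workerCount, ...
            for (std::size_t i = w; i < offsets.size(); i += workerCount)
                DeformMesh(src, offsets[i], staging[i]);
        });
    }
    for (std::thread& t : workers) t.join();
}

// Phase 2: a single thread copies the staged results into the "vertex buffer".
// In a real D3D9 client this is where the lock, copy and unlock would happen,
// on the thread that owns the device. Here the buffer is a plain vector.
inline std::vector<Vertex> UploadSingleThreaded(
        const std::vector<std::vector<Vertex>>& staging) {
    std::vector<Vertex> vertexBuffer;
    for (const std::vector<Vertex>& mesh : staging)
        vertexBuffer.insert(vertexBuffer.end(), mesh.begin(), mesh.end());
    return vertexBuffer;
}
```

The cost of this layout is an extra copy per instance, in exchange for keeping every device call on one thread.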

The best way to render is to avoid frequent state changes and frequent locks. Shader Model 3.0 supports vertex texture fetch, and you can use it to prepare one big texture containing data for many meshes, to efficiently render them in batches.
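
To illustrate the "avoid frequent state changes" part, here is a small sketch of sorting draw items by a packed state key before submission. The `DrawItem` fields and the bit layout of the key are assumptions made for this example, not the layout used by my renderer.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical draw item: the IDs stand for whatever state the renderer binds.
struct DrawItem {
    std::uint32_t shaderId;
    std::uint32_t textureId;
    std::uint32_t meshId;
};

// Pack the state into one sortable key, with the most expensive change
// (assumed here to be the shader) in the top bits.
inline std::uint64_t StateKey(const DrawItem& d) {
    return (std::uint64_t(d.shaderId & 0xFFFFu) << 48) |
           (std::uint64_t(d.textureId & 0xFFFFFFu) << 24) |
            std::uint64_t(d.meshId & 0xFFFFFFu);
}

// Order the items so that draws sharing shader and texture end up adjacent.
inline void SortByState(std::vector<DrawItem>& items) {
    std::sort(items.begin(), items.end(),
              [](const DrawItem& a, const DrawItem& b) {
                  return StateKey(a) < StateKey(b);
              });
}

// Count how many shader or texture binds are needed when the items are
// submitted in order (the first item always binds both).
inline std::size_t CountStateChanges(const std::vector<DrawItem>& items) {
    std::size_t changes = 0;
    for (std::size_t i = 0; i < items.size(); ++i) {
        if (i == 0 || items[i].shaderId != items[i - 1].shaderId) ++changes;
        if (i == 0 || items[i].textureId != items[i - 1].textureId) ++changes;
    }
    return changes;
}
```

For a list that alternates between two shader/texture pairs, sorting halves the number of binds, and the gap grows with the number of objects in the scene.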

In reply to TheEqualizer's post above:

8700K with an Arc A770; full-distance shadows and text on.

The renderer now has a D3D11 backend.

My initial intention was to have only a D3D11 backend, but I ran into some problems with D3D11, so I decided to have a D3D9Ex backend while I worked on the D3D11 backend. This week I finally got the D3D11 backend working, so the D3D9Ex backend will be deprecated, and all development will move to the D3D11 backend.

Now with D3D11, mesh deformation is performed once, in a compute shader.

Added anti-aliasing support.

"MSAA" is MSAAx4 (I chose this mode because all D3D11/D3D_FEATURE_LEVEL_11_0 GPUs are required to support it, so there is no need to check if it's supported).

FXAA does a decent job of removing aliasing but introduces some blurring.

MSAA+FXAA provides the best quality.

SMAA can be combined with MSAAx2 (a mode called "SMAA S2x" by the SMAA authors), but I chose not to implement this.

MSAA was used here only for comparison; only FXAA and SMAA will be supported. The reason is that MSAA will make supporting other features more difficult later.

9 hours ago, Denizeri24 said:

I'm using 8x and it drops performance, like ~30 fps.

MSAAx8 seems excessive to me.

I tested MSAAx8 (in my 180 characters test scene), at 2560x1440 resolution, and there was barely any performance difference relative to MSAAx4 or MSAA off. I think either you have a driver problem or Nvidia must have some special optimization for MSAA.

If you are using D3D9, then this could be a driver issue (since Intel does not have a good D3D9 driver, I think they use an emulation layer).

Also, I seem to remember Intel saying that Resizable BAR is necessary for good performance with Arc GPUs, so if you don't have that enabled, it could be the reason for the performance hit.

I decided to test the performance of the renderer when all the light slots are used. Right now the renderer supports a maximum of 17 lights (1 directional and 16 spot/point lights). In the test scene below, all 17 lights are active.

Performance was not affected. So, I will increase the maximum number of lights to 25 (1 directional, 24 spot/point lights). With some clever light management, it's possible to support many more than this, so I might revisit this later.
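
As a rough idea of what light management could look like (an assumed approach for illustration, not something the renderer does today), the scene can hold any number of lights while each object only receives the nearest ones that actually reach it, up to the slot limit. `PointLight`, `SelectLights` and the radius-based cutoff are all hypothetical names and choices.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Spot/point slots per the post above (the directional light is separate).
constexpr std::size_t kMaxLocalLights = 24;

struct PointLight {
    float x, y, z;  // world-space position
    float radius;   // distance at which the light stops contributing
};

// Squared distance from a light to a point.
inline float DistSq(const PointLight& l, float px, float py, float pz) {
    const float dx = l.x - px, dy = l.y - py, dz = l.z - pz;
    return dx * dx + dy * dy + dz * dz;
}

// Indices of at most `maxLights` lights that reach the point, nearest first.
inline std::vector<std::size_t> SelectLights(const std::vector<PointLight>& lights,
                                             float px, float py, float pz,
                                             std::size_t maxLights = kMaxLocalLights) {
    std::vector<std::size_t> picked;
    for (std::size_t i = 0; i < lights.size(); ++i)
        if (DistSq(lights[i], px, py, pz) <= lights[i].radius * lights[i].radius)
            picked.push_back(i);

    // Only the first `keep` entries need to be in sorted order.
    const std::size_t keep = std::min(maxLights, picked.size());
    std::partial_sort(picked.begin(),
                      picked.begin() + static_cast<std::ptrdiff_t>(keep),
                      picked.end(),
                      [&](std::size_t a, std::size_t b) {
                          return DistSq(lights[a], px, py, pz) <
                                 DistSq(lights[b], px, py, pz);
                      });
    picked.resize(keep);
    return picked;
}
```

Selecting by distance alone ignores brightness and spot direction; a real implementation would probably weight those in as well.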

9 hours ago, TheEqualizer said:

Implemented soft shadows using variance shadow mapping. Shadows add quite a bit of cost when many dynamic objects are visible, so this is a good candidate for multithreading.

There is decent self-shadowing as well.

Looks amazing
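
For readers who have not met the variance shadow mapping mentioned in the quoted post: the shadow map stores the mean depth and mean squared depth per texel, and the shader uses the one-tailed Chebyshev inequality to estimate how much of the filtered region is lit. Below is a CPU-side sketch of that visibility term. The `Moments` struct, the function name and the `minVariance` clamp value are assumptions for illustration, not code from the renderer.

```cpp
#include <algorithm>
#include <cassert>

// Moments stored per shadow-map texel: E[d] and E[d^2] of the occluder depth.
struct Moments { float mean; float meanSq; };

// Upper bound on the lit fraction of the filtered region at receiver depth
// `depth` (one-tailed Chebyshev inequality). Returns 1 for fully lit.
inline float VsmVisibility(Moments m, float depth, float minVariance = 1e-5f) {
    if (depth <= m.mean) return 1.0f;  // receiver is in front of the average occluder
    // Clamp the variance to avoid dividing by zero on flat regions.
    const float variance = std::max(m.meanSq - m.mean * m.mean, minVariance);
    const float d = depth - m.mean;
    return variance / (variance + d * d);
}
```

Because the moments can be blurred like any ordinary texture, the shadow edges come out soft, which is the main appeal of the technique.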

21 minutes ago, Helia01 said:

Please tell me, do your changes also affect how effects are processed? As far as I know, this is a big problem in the original game.

Yes. The renderer is responsible for everything; anything that doesn't go through the renderer is not rendered.

Effects/Particles are pre-processed before submission so they can be rendered as efficiently as possible. I have tested having many effects at once and the impact was minimal.

One of the problems with the way Ymir renders things is that it sends very little work to the GPU (per draw call). GPUs work better when a large amount of work is sent because it allows the GPU/driver to better hide the latencies involved.
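
To show the general idea of sending more work per draw call, here is a simplified sketch that merges particles sharing a texture into one vertex array, so each texture needs a single draw instead of one draw per particle. The `Particle`, `QuadVertex` and `Batch` types are invented for this example, and the quads are axis-aligned for brevity; this is not the actual pre-processing code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

struct Particle {
    float x, y, z;
    float size;
    std::uint32_t textureId;
};

struct QuadVertex { float x, y, z, u, v; };

// One batch = one draw call: all quads that share a texture, in one vertex array.
struct Batch {
    std::uint32_t textureId = 0;
    std::vector<QuadVertex> vertices;  // 4 vertices per particle
};

// Group particles by texture and expand each one into a quad.
inline std::vector<Batch> BuildBatches(const std::vector<Particle>& particles) {
    std::map<std::uint32_t, Batch> byTexture;
    for (const Particle& p : particles) {
        Batch& b = byTexture[p.textureId];
        b.textureId = p.textureId;
        const float h = p.size * 0.5f;
        // Axis-aligned quad in the XY plane; a real renderer would face the camera.
        b.vertices.push_back({p.x - h, p.y - h, p.z, 0.0f, 1.0f});
        b.vertices.push_back({p.x - h, p.y + h, p.z, 0.0f, 0.0f});
        b.vertices.push_back({p.x + h, p.y + h, p.z, 1.0f, 0.0f});
        b.vertices.push_back({p.x + h, p.y - h, p.z, 1.0f, 1.0f});
    }
    std::vector<Batch> batches;
    for (auto& entry : byTexture)
        batches.push_back(std::move(entry.second));
    return batches;
}
```

With five particles spread over two textures, this produces two batches, so two draw calls instead of five.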
