Sherer

October 24, 2019

Hello,

I wasn't sure whether to post this here or in the guides section, but since I'm going to share not only the mitigation but also my conclusions about the issue and its after-effects, I'm posting it here.
Only part of the code is attached below.
For method definitions and headers, see my GitHub repo.

1. Vulnerability overview.

Any kind of TCP application requires a server (and a client). Once a server is launched, it is bound to the appropriate socket and set to listen for incoming connections (that's the short version, navigate here for more precise info).
On the other side, the client is the one that is supposed to connect to the server. It does so by connecting to the server's socket and starting the handshake.
That is how it works in a nutshell, or rather, how it works when we deal with a normal peer.
In more modern apps the traffic is handled by efficient algorithms that can carry even heavy traffic or an occasional feeble attack.
But Metin2's core is not a state-of-the-art app. It simply runs on old C-style code (libthecore) without real concurrency support, which might lead us into some tricky issues.
So let's imagine what happens if someone tries to pull off an attack and floods the server with an enormous number of packets.
Since each connection needs to be validated first, it goes through the handshake.

The server catches it through the fdwatch function (jump to io_loop, main.cpp); if no desc exists for it yet, it is passed to AcceptDesc, where the connection is validated.
It is usually allowed unless the peer's IP is on the banlist. Then a desc is created and the connection is kept while the server waits for the handshake response.
Sounds reasonable, right?
Now imagine thousands of connections being accepted, validated, and expected to send a handshake response.
Does it still sound so optimistic?
Additionally, when a desc is created it allocates buffers for I/O traffic. Each connection. Each time. Each desc.

Now do the math and try to imagine how much memory is simply wasted.
So that's the point we've been heading to.

That's our vulnerability.
And by the way, if so many agents are in the queue, do you think anybody will be able to connect to your server?
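
To put a rough number on "do the math", here is a minimal sketch. The 64 KiB buffer sizes are made-up placeholders, not values taken from the Metin2 source, and PendingHandshakeBytes is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstdint>

// ASSUMPTION: placeholder per-connection buffer sizes, not the real ones.
// Check your own desc code for the actual input/output buffer sizes.
constexpr std::uint64_t kInputBufferBytes  = 64 * 1024;
constexpr std::uint64_t kOutputBufferBytes = 64 * 1024;

// Memory held by connections that were accepted but have not yet
// answered the handshake (hypothetical helper, for illustration only).
constexpr std::uint64_t PendingHandshakeBytes(std::uint64_t pending_connections)
{
    return pending_connections * (kInputBufferBytes + kOutputBufferBytes);
}
```

With these assumed sizes, 20000 half-open connections would hold 20000 * 131072 = 2621440000 bytes, roughly 2.4 GiB, before a single real player gets through.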

2. Four major problems.

Let's start with the handshake itself.
Imagine that someone approaches you and holds out his hand for you to shake.
And then a second one.
And then a third.
Doesn't make sense, does it?
The same applies to the handshake process. Put simply, only one handshake per host should be in progress until the host completes it.
So, let's jump to AcceptDesc in desc_manager.cpp and add this little piece of code above:

newd = M2_NEW DESC;

	// Let's check if a handshake from this host is already ongoing
	if (GetHostHandshake(peer))
	{
		sys_log(0, "Handshake from %s is not permitted!", host);
		socket_close(desc);
		return NULL;
	}

So let's consider this as solved for now.
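
The real GetHostHandshake lives in the repo and is not shown here. As a rough idea of the bookkeeping behind "one pending handshake per host", here is a standalone sketch. The class name PendingHandshakes and its methods are hypothetical and are not part of the Metin2 source.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>

// Hypothetical helper: remembers which hosts have an unfinished handshake.
class PendingHandshakes
{
public:
    // Returns false if this host already has a handshake in progress.
    bool TryBegin(const std::string& host)
    {
        return m_hosts.insert(host).second;
    }

    // Call when the handshake completes or the connection is dropped.
    void Finish(const std::string& host)
    {
        m_hosts.erase(host);
    }

    bool IsPending(const std::string& host) const
    {
        return m_hosts.count(host) != 0;
    }

private:
    std::unordered_set<std::string> m_hosts;
};
```

The important part is calling Finish on every exit path (success, timeout, disconnect), otherwise a host would stay locked out forever.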

Then the second issue.
Let's imagine yet another scenario.
Someone shakes your hand, but this time completes the handshake. But he does it again. And again. And again.
Sounds exhausting?
Let's add these two conditions below the code we just added:

	static const int HOST_CONNECTION_LIMIT = 3;
	// If the host has completed the handshake, check that it has not reached the connection limit
	if (GetHostConnectionCount(peer) >= HOST_CONNECTION_LIMIT)
	{
		sys_log(0, "Host %s connection limit has been reached!", host);
		socket_close(desc);
		return NULL;
	}

	// And block intrusive connections as well
	if (IsIntrusiveConnection(host))
	{
		sys_log(0, "Host %s is intrusive!", host);
		socket_close(desc);
		return NULL;
	}

The first if checks whether the host has reached the connection limit; if it has, the host is dropped.
The second if looks for intrusive peers.
That simply means that if someone tries to connect again and again within a defined time span, it is probably an attacking IP.
Let's jump for a moment to desc.cpp, the Initialize function, and add this variable initialization there:

tt_creation_time = get_global_time();

Yet another problem solved.
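
The actual IsIntrusiveConnection is in the repo and presumably works off the desc creation time set above. As an illustration of the general idea only, here is a standalone sliding-window counter. It is a different, self-contained technique, and the IntrusionDetector class, its parameters and thresholds are all assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <ctime>
#include <deque>
#include <string>
#include <unordered_map>

// Hypothetical sliding-window detector: a host is "intrusive" when it makes
// more than max_attempts connection attempts within window_seconds.
class IntrusionDetector
{
public:
    IntrusionDetector(std::size_t max_attempts, std::time_t window_seconds)
        : m_maxAttempts(max_attempts), m_window(window_seconds) {}

    // Records an attempt made at `now` and reports whether the host
    // has exceeded the allowed number of attempts inside the window.
    bool RegisterAttempt(const std::string& host, std::time_t now)
    {
        std::deque<std::time_t>& attempts = m_attempts[host];

        // Forget attempts that fell out of the window.
        while (!attempts.empty() && now - attempts.front() >= m_window)
            attempts.pop_front();

        attempts.push_back(now);
        return attempts.size() > m_maxAttempts;
    }

private:
    std::size_t m_maxAttempts;
    std::time_t m_window;
    std::unordered_map<std::string, std::deque<std::time_t>> m_attempts;
};
```

Note that this sketch never evicts idle hosts and keeps one timestamp per attempt, so a real version would need a cap or a periodic cleanup to avoid becoming a memory problem of its own.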

Still, all this code works more for the authentication server than for the game cores.
Why is that?
Imagine a person who is not supposed to be approached by just anyone, a movie star for example.
Usually, when a star walks the red carpet, a bunch of bodyguards seal him/her off from the crowd around.
The same should happen to the game cores, because why would anyone try to perform a handshake with a game core without even having logged in?
So that's what we are going to do: simply whitelist players who have successfully gone through the login process and obtained a login key.
First, let's jump back to desc_manager.cpp and add this little piece of code above our previous alterations:

	// If it's not an auth server - check for validation first
	if (!g_bAuthServer)
	{
		if (!IsOnHandshakeWhitelist(peer))
		{
			// sys_log(0, "Host %s has not validated through login!", host);
			socket_close(desc);
			return NULL;
		}
	}

Now open input_db.cpp, go to the AuthLogin function and add this at the end:

	// Validating handshake
	TPacketGGHandshakeValidate pack;
	pack.header = HEADER_GG_HANDSHAKE_VALIDATION;
	strlcpy(pack.sUserIP, d->GetHostName(), sizeof(pack.sUserIP));
	P2P_MANAGER::instance().Send(&pack, sizeof(pack));

Repeat this for AuthLoginOpenID if you use it.
Now let's jump to input_p2p.cpp, go to the Analyze function and add the following after the initial sys_log call:

	// Auth server is not allowed for p2p
	if (g_bAuthServer)
	{
		// Clearing buffers for dynamic packets
		switch (bHeader)
		{
			case HEADER_GG_RELAY:
			{
				TPacketGGRelay * p = (TPacketGGRelay *) c_pData;
				if (m_iBufferLeft < sizeof(TPacketGGRelay) + p->lSize)
					iExtraLen = -1;
				else
					iExtraLen = p->lSize;
			}
			break;
			case HEADER_GG_NOTICE:
			{
				TPacketGGNotice * p = (TPacketGGNotice *) c_pData;
				if (m_iBufferLeft < sizeof(TPacketGGNotice) + p->lSize)
					iExtraLen = -1;
				else
					iExtraLen = p->lSize;
			}
			break;
			case HEADER_GG_GUILD:
			{
				iExtraLen = m_iBufferLeft - sizeof(TPacketGGGuild);
			}
			break;
			case HEADER_GG_MONARCH_NOTICE:
			{
				TPacketGGMonarchNotice * p = (TPacketGGMonarchNotice *) c_pData;
				if (m_iBufferLeft < p->lSize + sizeof(TPacketGGMonarchNotice))
					iExtraLen = -1;
				else
					iExtraLen = p->lSize;
			}
			break;
		}

		return iExtraLen;
	}

Since some of the packets might be dynamic, we need to ensure that the data they hold is cleared properly.
If you have more dynamic packets bound, add them as above.
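
The length checks above mix a signed buffer counter with an unsigned sizeof. Assuming lSize is a signed integer (the l prefix suggests it, but I'm not quoting the struct here), a negative declared size could slip past such a comparison. A hedged sketch of a stricter check follows; DynamicPacketExtraLen is a hypothetical helper, not a function from the source.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: returns the number of extra payload bytes to skip,
// or -1 when the declared size is invalid or not fully buffered yet.
inline int DynamicPacketExtraLen(int buffer_left, std::size_t header_size, long declared_size)
{
    // Reject negative values before any unsigned arithmetic happens.
    if (declared_size < 0 || buffer_left < 0)
        return -1;

    const std::size_t available = static_cast<std::size_t>(buffer_left);
    if (available < header_size)
        return -1;

    // Compare against what remains after the fixed-size header.
    if (static_cast<unsigned long>(declared_size) > available - header_size)
        return -1;

    return static_cast<int>(declared_size);
}
```

In the HEADER_GG_RELAY case this would be used roughly as iExtraLen = DynamicPacketExtraLen(m_iBufferLeft, sizeof(TPacketGGRelay), p->lSize);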

Move to db.cpp, find the SendLoginPing function and replace it with the following:

void DBManager::SendLoginPing(const char * c_pszLogin)
{
	/*
	TPacketGGLoginPing ptog;

	ptog.bHeader = HEADER_GG_LOGIN_PING;
	strlcpy(ptog.szLogin, c_pszLogin, sizeof(ptog.szLogin));

	if (!g_pkAuthMasterDesc)  // If I am master, broadcast to others
	{
		P2P_MANAGER::instance().Send(&ptog, sizeof(TPacketGGLoginPing));
	}
	else // If I am slave send login ping to master
	{
		g_pkAuthMasterDesc->Packet(&ptog, sizeof(TPacketGGLoginPing));
	}
	*/
}

This avoids clearing billing that way (wtf is that btw, it shouldn't be executed at all).
Now move to packet_info.cpp and add this code to the constructor:

Set(HEADER_GG_HANDSHAKE_VALIDATION,		sizeof(TPacketGGHandshakeValidate),	"HandShakeValidation",		false);

Jump back to input_p2p.cpp and add this case to the Analyze function:

		case HEADER_GG_HANDSHAKE_VALIDATION:
			DESC_MANAGER::instance().AddToHandshakeWhiteList((const TPacketGGHandshakeValidate *) c_pData);
			break;
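
AddToHandshakeWhiteList and IsOnHandshakeWhitelist themselves are in the repo. To show what such a whitelist might look like, here is a minimal standalone sketch. The HandshakeWhitelist class and the idea of expiring entries after a TTL are my assumptions for illustration, not necessarily how the repo does it.

```cpp
#include <cassert>
#include <ctime>
#include <string>
#include <unordered_map>

// Hypothetical whitelist: an IP is allowed to handshake with a game core
// only for ttl_seconds after the auth server reported a successful login.
class HandshakeWhitelist
{
public:
    explicit HandshakeWhitelist(std::time_t ttl_seconds) : m_ttl(ttl_seconds) {}

    // Called when the validation packet for `ip` arrives over P2P.
    void Add(const std::string& ip, std::time_t now)
    {
        m_expiry[ip] = now + m_ttl;
    }

    // True while the entry exists and has not expired.
    // Expired entries are erased on lookup.
    bool IsAllowed(const std::string& ip, std::time_t now)
    {
        auto it = m_expiry.find(ip);
        if (it == m_expiry.end())
            return false;

        if (now >= it->second)
        {
            m_expiry.erase(it);
            return false;
        }
        return true;
    }

private:
    std::time_t m_ttl;
    std::unordered_map<std::string, std::time_t> m_expiry;
};
```

An expiry keeps the table from growing forever and stops an old login from whitelisting an IP indefinitely; the right TTL is something you would have to tune yourself.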

Finally, jump to ClientManager.cpp in DB. Find the QUERY_SETUP function, locate the if condition with bAuthServer, and add the following code there:

peer->SetChannel(1);

Since P2P communication is allowed only for peers with a channel number greater than zero, we set one.
Normally this practice should be forbidden, but since we restrict the traffic for the auth server (with the code above) it should be safe.

Beware that this might cause the first login to fail, because the validation packet may reach the cores only after the player connects.
Voilà, we're mostly done with coding!

Last but not least, we need a brief introduction to kqueue and a tedious tour of sockets and kernel variables.
Starting with kqueue: I could try to explain it, but you'd better jump to this link. FreeBSD documentation is always appreciated.
Since Metin2's kqueue wrapper has a size limit, you may try to increase it a bit and watch the results.
If you'd like to do so, jump to main.cpp, the start function, and edit this variable:

main_fdw = fdwatch_new(VALUE);

But keep in mind: do not try to over-optimize it! Experiment and try different values. If you somehow screw it up, it might drag you into checkpoint issues and eventually crash the whole app.
Now a few words about sockets and how the listening process works.
When a connection aimed at the appropriate port is detected, it is dropped into a queue where it waits for the application to pick it up.
So we can simply think of it as a waiting line, like in a grocery store.
The point is that this queue has its limit, and once the limit is reached, any new connection is declined on sight.
The listening limit for the Metin2 core is set by a constant called SOMAXCONN.
If you dive into the C socket documentation you can find something like this:

/*
 * Maximum queue length specifiable by listen
 */
#define SOMAXCONN        10

In my case it was 128.
Since it's a define, the value is simply embedded into the app and you cannot change it once the binary is built.
So let's change it to let more connections be queued.
You may ask, why?
When a player tries to log in, the client connects to the channel port.
If the channel is unavailable, you see a fade-out and the connection is terminated.
It happens because there is no room in the queue, so the connection is not queued at all.
But be careful! Do not set this value to some sky-high number!
Be aware that our io_loop function needs to iterate through all these events and handle them within the heartbeat.
If you try to over-optimize this value you can end up causing lag on the server, internal packet delays and more.
If you ask me, a value around 1024 is acceptable, but it's still better to do some reading and experiment a bit.
And one more thing: don't forget to set these kernel options on the machine where your server runs:

sysctl kern.ipc.soacceptqueue=1024
sysctl kern.ipc.somaxconn=1024
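
Why both the define and the sysctl matter: the kernel silently reduces a listen() backlog that exceeds its own limit, so raising only one of them changes nothing. The tiny sketch below just models that rule; EffectiveBacklog is a hypothetical helper, not a system call.

```cpp
#include <cassert>

// Models the queue length the kernel ends up using: a backlog requested
// via listen() that is above the kernel limit is cut down to that limit.
constexpr int EffectiveBacklog(int requested_backlog, int kernel_limit)
{
    return requested_backlog < kernel_limit ? requested_backlog : kernel_limit;
}
```

So a binary built with 1024 on a machine still limited to 128 gets 128, and a machine set to 1024 running a binary built with 10 gets 10.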

So we are done! Don't forget to add the code from my GitHub repo!

Epilogue
Metin2 is quite an old app and we should not forget about that.
The netcode is old, rubbish and cumbersome, so this issue might be only one of many we haven't found yet.
Keep in mind, though, that even this mitigation won't fully protect your server.
Actually, I doubt that even rewriting the code into a more modern shape would do that if you don't get yourselves good protection.
Protection, filters and external firewalls are the key, especially now that stressers and the like are back and harmful again.
I hope this little thread will help you in your future development.

Extra

I managed to write a little collector that gets rid of handshakes that never completed (outdated ones).
If you'd like to switch it on, jump to the desc_manager.cpp constructor and add this there:

	CEventFunctionHandler::instance().AddEvent([this](SArgumentSupportImpl *) {
		desc_manager_garbage_collector_info* info = AllocEventInfo<desc_manager_garbage_collector_info>();
		m_pkDescManagerGarbageCollector = event_create(desc_manager_garbage_collector_event, info, PASSES_PER_SEC(1));
	}, "DESC_MANAGER_COLLECTOR", 1);

Beware that you need this feature:

And don't forget to add this to the destructor:

event_cancel(&m_pkDescManagerGarbageCollector);
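
The event body itself (desc_manager_garbage_collector_event) is in the repo. As an illustration of what one sweep could do, here is a standalone sketch that drops handshakes older than a timeout. The function name, the map layout and the timeout are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <ctime>
#include <string>
#include <unordered_map>

// Hypothetical sweep: `started_at` maps a host to the time its handshake began.
// Entries older than `timeout` seconds are removed; returns how many were dropped.
inline std::size_t CollectStaleHandshakes(
    std::unordered_map<std::string, std::time_t>& started_at,
    std::time_t now,
    std::time_t timeout)
{
    std::size_t removed = 0;
    for (auto it = started_at.begin(); it != started_at.end(); )
    {
        if (now - it->second > timeout)
        {
            // erase() returns the next valid iterator, so the loop stays safe.
            it = started_at.erase(it);
            ++removed;
        }
        else
        {
            ++it;
        }
    }
    return removed;
}
```

In the real server the sweep would also have to close the socket and destroy the desc of each stale entry, which this sketch leaves out.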

Regards

Btw, credits to @Flourine for flooding my dev server with 20k packets per sec (I asked for 2 btw). That helped me to analyze the problem.

August 7, 2019

% is a format specifier.

Take a look at this:

http://docwiki.embarcadero.com/RADStudio/Rio/en/Format_Specifiers_in_C/C%2B%2B

August 7, 2019

The error lies here:

                    if chr.IsNPC(dstChrID):
                    if app.ENABLE_REFINE_RENEWAL:

Either use "or" or add tabs into code below.

August 7, 2019

What's the difference between server "A" and server "B"?

July 4, 2019

12 minutes ago, masodikbela wrote:

Not sure if you guys noticed, but its not only private servers. Its all type of content related to "Metin2". You cant upload video even from official servers if you are not an approved youtuber by GF. https://corporate.gameforge.com/games/letsplay/?lang=en

I just simply don't understand the logic behind it...

June 29, 2019

I don't think it's gonna fix anything but you can try..

June 29, 2019

On 28.06.2019 at 12:04, Cripplez wrote:
Thank you very much, it is great and works

There is just a small problem, the race warrior can use all the items, the race sura can use 3 items, assassin 2 items and shaman 1 items

I think the problem is here
			local id_to_job = {[52101] = 0, [52102] = 1, [52103] = 2, [52104] = 3}
			if pc.get_job() > id_to_job[item.get_vnum()] then
				say_title (item.get_name()) -- just print book's name
				say("")
				say("This book isn't for your race")
				say("")
				return
			end
I should change it to this
			if pc.get_job() ~= id_to_job[item.get_vnum()] then

Yes, you are right. It was already late and I didn't notice that mistake.

June 27, 2019

The problem is probably located somewhere else.
Send me a DM. I will try to help you with what you have; if we can't figure it out, we might switch to a paid service.

June 27, 2019

Try this one:

quest skill_book begin
    state start begin
		function GetSkillList2(min_level)
			local skill_list = special.active_skill_list[pc.get_job()+1][pc.get_skill_group()]
			local vnum_list = {}
			local name_list = {}
			for i = 1,table.getn(skill_list) do
				local skill_vnum = skill_list[i]
				local skill_level = pc.get_skill_level(skill_vnum)
				if skill_level >= min_level and skill_level < 30 then
					table.insert(vnum_list, skill_list[i])
					table.insert(name_list, locale.GM_SKILL_NAME_DICT[skill_vnum])
				end
			end
			return vnum_list, name_list
		end
		when 52101.use or 52102.use or 52103.use or 52104.use with pc.can_warp() begin -- don't forget to check the warp status, otherwise players can abuse it, e.g. with the trade glitch
			local id_to_job = {[52101] = 0, [52102] = 1, [52103] = 2, [52104] = 3}
			if pc.get_job() > id_to_job[item.get_vnum()] then
				say_title (item.get_name()) -- just print book's name
				say("")
				say("This book isn't for your race")
				say("")
				return
			end	
			if pc.get_skill_group() == 0 then
				say_title (item.get_name()) -- just print book's name
				say("")
				say("you have no class yet")
				say("")
				return
			end		
			local vnum_list, name_list = skill_book.GetSkillList2(20)
			if table.getn(vnum_list) == 0 then
				say_title (item.get_name()) -- just print book's name
				say("")
				say_reward ("No skill to upgrade")
				say("")
				return
			end
			say_title (item.get_name()) -- just print book's name
			say("")
			say("choose the skill to upgrade:")
			say("")
			table.insert(name_list, "Annulla") 
			local s = select_table(name_list)
			if s == table.getn(name_list) then
				return
			end		
			local skill_name = name_list[s]
			local skill_vnum = vnum_list[s]
			local skill_level = pc.get_skill_level(skill_vnum)
			say_title (item.get_name()) -- just print book's name
			say("")
			say("you choose: "..skill_name)
			say("are you sure to upgrade this?")
			local a = select("yes","No")
			if a == 2 then
				return
			end
			pc.set_skill_level (skill_vnum, skill_level+1)
			pc.remove_item(item.get_vnum(), 1)
		end
	end
end

June 18, 2019

Change optimization level in compilation settings.

June 16, 2019

Doesn't make a big difference tbh. If you want to switch to VS 2019 you can do so without hesitation; just keep using the current toolset.

Here is a manual for XAMPP installation:

https://docs.moodle.org/37/en/Windows_installation_using_XAMPP

There are not many changes to be made in conf.txt.

June 14, 2019

So that might be some exception call.

You can try using IDA or any other RE tool to trace the execution flow and find out what causes this issue.

June 13, 2019

But is there any freeze with "Program has stopped working" message?

June 13, 2019

Test it in Debug mode; it gives you the possibility to run the JIT debugger and pinpoint the crash reason.

June 11, 2019

8 hours ago, ElRenardo wrote:

Hi, thanks again !

Alright, so, I tried to install valgrind at first because of the easier usage and encountered this error at the start of the game processes with valgrind:


valgrind: I failed to allocate space for the application's stack.
valgrind: This may be the result of a very large --main-stacksize=
valgrind: setting.  Cannot continue.  Sorry.

I then tried to give the --main-stacksize argument with different values and it still gives me back this error.

Maybe some of you have a solution ?


# pkg info valgrind
valgrind-3.10.1.20160113_7,1
Name           : valgrind
Version        : 3.10.1.20160113_7,1
Installed on   : Tue Jun 11 08:41:47 2019 CEST
Origin         : devel/valgrind
Architecture   : FreeBSD:11:amd64
Prefix         : /usr/local
Categories     : devel
Licenses       : GPLv2
Maintainer     : [email protected]
WWW            : https://bitbucket.org/stass/valgrind-freebsd/overview
Comment        : Memory debugging and profiling tool
Options        :
        32BIT          : on
        DOCS           : on
        MANPAGES       : on
        MPI            : off
Annotations    :
        FreeBSD_version: 1102000
        repo_type      : binary
        repository     : FreeBSD

My system:


11.2-RELEASE-p9 FreeBSD 11.2-RELEASE-p9 #0: Tue Feb  5 15:30:36 UTC 2019     [email protected]:/usr/obj/usr/src/sys/GENERIC  amd64

Now I'm going to try with ASAN.

How much RAM have you got on your vps?

June 10, 2019

15 hours ago, ElRenardo wrote:
Hi Sherer, thank you very much for your detailed answer.

I'm on O2 flag by default, I remember changing that already but didn't get much better results in the diagnosis so I put it back to normal.

As it's a test server now, I'll try to keep it to default O0 for now on.

Here are the compiler flags I use then:
-m32 -g -Wall -O -pipe -fexceptions -std=gnu++17 -fno-strict-aliasing -pthread -D_THREAD_SAFE -DNDEBUG -fstack-protector-all
The second error, with tr1 lists was where the crashes first started.

I then did change all the tr1 lists in the sources to std lists, and did the same for every boost lists.

I then got erreor with affects, I remember removing the boost affect_pool in affect.cpp, to make it use the default M2 allocator as if DEBUG_ALLOC was declared in this file.

For the third error, I remember having some troubles with the memory usage of the server back then.

I'm not sure if it's exactly at that time, but the memory usage of my machine was nearly at 100% even with the game not started.

The machine had not been restarted for years. Until now after the restart the memory usage is fine.

At this time, I though that the memory corruption could be because of a memory problem on my server, so I rent a new machine but still got crashes on it.

But, the error seems to be part of the first one, that I got yesterday after 35 days without reboot and crashes (and very very few players on it so it's not really showing that it crashes less than before).

I did an update, so I restarted the game, let some 3-5 players try it and got a crash after a few hours while a player attempted the refinement of ores on a guild alchemist.

It worked 3 times, and then it crashed giving a backtrace to luaM_realloc because it couldn't allocate memory.

For the checkpointing, everything is as it is by default, I haven't touched anything about that.

To be sure, I'll only be able to check tomorrow.

Thanks for the proposal, I'll keep that in mind.

I don't think pooling is used by default (DEBUG_ALLOC should be disabled in release mode). Keeping std instead of TR1 is good, though.
If there wasn't any crash throughout those 35 days when there were no players on your server, that probably means the error is linked to some player-dependent code.

@masodikbela has come up with the right idea. You can try to diagnose memory leaks using ASAN or valgrind (depends on you):

https://github.com/google/sanitizers/wiki/AddressSanitizer
http://www.valgrind.org/docs/manual/quick-start.html

On the other hand, you can port your source to Windows and use Visual Studio's built-in profiler:

https://docs.microsoft.com/en-us/visualstudio/profiling/memory-usage?view=vs-2019

June 9, 2019

Hello,
I don't really know if someone else has pointed it out, but if not, here you go.
There is a really ugly yang bug in the guild building code.
Open cmd_gm.cpp, go to the do_build function and find this piece of code:

				if (test_server || GMLevel == GM_PLAYER)
					// Consume construction materials (on the test server GMs pay too)
				{
					// Consume the construction cost
					ch->PointChange(POINT_GOLD, -t->dwPrice);

Looks OK, right? Not really. dwPrice is typed as DWORD. Negating an unsigned value is never a good idea.
That will not cause any damage if your PointChange function takes an int as its argument, but once you decide to change it to, e.g., long long, the wrapped value stays a huge positive number and the player gains yang instead of paying. Here is a live example:
https://onlinegdb.com/HJF-CWsCE
Mitigation:

Just cast the value to int/long long:

				if (test_server || GMLevel == GM_PLAYER)
					// Consume construction materials (on the test server GMs pay too)
				{
					// Consume the construction cost
					int iPrice = static_cast<int>(t->dwPrice);
					ch->PointChange(POINT_GOLD, -iPrice);
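
For those who can't open the link, here is a minimal standalone reproduction of the same effect. ApplyChange is a hypothetical stand-in for a PointChange that takes a 64-bit amount, and std::uint32_t stands in for DWORD (assuming a 32-bit unsigned type, as on the usual targets).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a PointChange that takes a 64-bit amount.
inline long long ApplyChange(long long gold, long long amount)
{
    return gold + amount;
}

// Buggy variant: -price is evaluated in unsigned 32-bit arithmetic,
// so it wraps to 2^32 - price before being widened to long long.
inline long long PayUnsafe(long long gold, std::uint32_t price)
{
    return ApplyChange(gold, -price);
}

// Fixed variant: convert to a signed type first, then negate.
inline long long PaySafe(long long gold, std::uint32_t price)
{
    return ApplyChange(gold, -static_cast<long long>(price));
}
```

With 5000 yang and a price of 1000, PaySafe leaves 4000, while PayUnsafe ends up at 5000 + (4294967296 - 1000) = 4294971296.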

Regards

June 9, 2019

First of all, there might be more than one reason why your game crashes.

Your cores show 3 outcomes:

1. A malfunction at the Lua level. If you want to diagnose this more deeply, consider changing the optimization flag:
https://docs.oracle.com/cd/E37670_01/E52461/html/ch04s03.html

Then gdb should give you more details.

2. Don't use TR1. It's going to be deprecated soon (as far as I know), and since the C++11 standard has been released it's pointless to use TR features (the Technical Report was a bridge between C++03 and C++11). Update your gcc and switch from TR1 to the standard library. That will probably solve this error (it worked for me).

3. The third error is a bit tricky and might be tough to figure out. Your core shows that there was not enough memory to allocate, hence that strange abort. I would not treat that spot as the faulty code; it was probably just a random code area where the system ran out of memory. You should look for a leak somewhere else.

And an additional question: did you disable checkpointing? If the answer is 'yes', switch it back on immediately.
If you can't solve those crashes and are really eager to get it done, send me a PM. Keep in mind, though, that if a lot of diagnosis turns out to be needed I won't do it for free.
Good luck

June 6, 2019

FreeBSD 9.3 is no longer supported, so you can't use the pkg utility anymore (the URL is probably deprecated).
Consider updating your OS to a current RELEASE (11.2 or 12.0):

https://www.freebsd.org/doc/handbook/updating-upgrading-freebsdupdate.html

On the other hand, this might be caused by some network problem. Run some diagnostics and you should find the source of the issue.

June 5, 2019

1 minute ago, Ezequiel G. wrote:

Nice, thanks !

FreeBSD 12 is ready to compile source?

Sure. You can use either release (11.2 or 12.0); it won't make a big difference anyway (at least if you are not planning major changes in the code).

June 5, 2019

Check this part:

           # Sash refine effect
            if app.ENABLE_ACCE_SYSTEM:
                slotNumberChecked = 0
                if not constInfo.IS_AUTO_POTION(itemVnum):
                    self.wndItem.DeactivateSlot(i)

This if-clause is meaningless (you are already checking for auto potion a few lines above).
Get rid of it. That should do the trick for you.

June 5, 2019

Go to offline_shop.cpp, line 1010, and find code that probably looks like this:

ch->GetDesc()->...

and add an if guard before it:

if (ch->GetDesc())
    ch->GetDesc()->...

Btw, include your code next time. Nobody likes ghost hunting.

May 20, 2019

It's called ListBox.

May 18, 2019

If this problem persists only for a couple of types, jump into the AddItemData function and look for the item.BROKEN_TYPE if-statement.
It may be a good starting point.
If you still have no success, share your uitooltip.

May 17, 2019

Any core?


Posts posted by Sherer
