
Sherer


Posts posted by Sherer

  1. M2 Download Center


    Hello,

    I wasn't sure whether to post this here or under guides, but since I'm going to share not only the mitigation but also my conclusions about this issue and its after-effects, I put it here.
    Only part of the code is attached below.
    For method definitions and headers, jump here: (link hidden in the original post)

    1. Vulnerability overview.

    Any kind of TCP application needs a server (and a client). Once a server is launched, it is bound to the appropriate socket and set to listen for incoming connections (in a nutshell; navigate here for more precise info).
    On the other side, the client is the one that is supposed to connect to the server. It does so by connecting to the server's socket and starting the handshake.
    That is how it works in a nutshell, or rather, how it works when we deal with a normal peer.
    More modern apps handle traffic with efficient algorithms that can carry even heavy traffic or a feeble attack.
    But Metin2's core is not a state-of-the-art app; it simply runs on old C-style code (libthecore) without real concurrency support, and that can drag us into some tricky issues.
    So let's imagine what happens if someone tries to pull off an attack and floods the server with an enormous amount of packets.
    Since each connection needs to be validated first, it goes through the handshake.

    The server catches it through the fdwatch function (jump to io_loop, main.cpp); then, if no desc is present, it passes the connection to AcceptDesc, where it is validated.
    It is usually allowed unless the peer's IP is on a banlist. Then a desc is created and the connection is saved, awaiting the handshake response.
    Sounds reasonable, right?
    Now imagine thousands of connections being accepted, validated, and expected to send a handshake response.
    Does it still sound so optimistic?
    Additionally, when a desc is created it allocates buffers for I/O traffic. Each connection. Each time. Each desc.

    Now do the math and try to picture how much memory is simply wasted.
    So that's the point we've been heading to.

    That's our vulnerability.
    And by the way, if so many agents are in the queue, do you think anybody will be able to connect to your server?

    2. Four major problems.

    Let's start with the handshake itself.
    Imagine that someone approaches you and offers you his hand to shake.
    And then a second one.
    And then a third.
    Doesn't make sense, does it?
    The same applies to the handshake process. Simply put, only one handshake should be in progress per host until that host completes it.
    So, let's jump to AcceptDesc in desc_manager.cpp and add this little check right above the line that creates the desc (newd = M2_NEW DESC;):
     

    	// Let's check if a handshake from this host is already ongoing
    	if (GetHostHandshake(peer))
    	{
    		sys_log(0, "Handshake from %s is not permitted!", host);
    		socket_close(desc);
    		return NULL;
    	}
    	newd = M2_NEW DESC;

    So let's consider this as solved for now.
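
GetHostHandshake itself lives in the part of the code that is not shown in this post. Purely to illustrate the idea of "one pending handshake per host", a minimal sketch could look like the one below. The class HandshakeTracker and its method names are hypothetical and are not taken from the Metin2 source or from the author's repo.

```cpp
#include <string>
#include <unordered_set>

// Hypothetical helper: remembers which hosts have a handshake in progress.
class HandshakeTracker
{
public:
    // Returns true if the host may start a handshake (none pending yet).
    bool Begin(const std::string& host)
    {
        return m_pending.insert(host).second;
    }

    // Call when the handshake completes or the connection is dropped.
    void End(const std::string& host)
    {
        m_pending.erase(host);
    }

    // True while a handshake from this host has been started but not ended.
    bool IsPending(const std::string& host) const
    {
        return m_pending.count(host) != 0;
    }

private:
    std::unordered_set<std::string> m_pending;
};
```

In such a design the accept path would refuse the connection whenever Begin returns false, and End has to be called on every exit path, otherwise a host could stay locked out.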

    Then the second issue.
    Let's imagine yet another scenario.
    Someone shakes your hand and this time completes the handshake. But then he does it again. And again. And again.
    Sounds exhausting?
    Let's add these two conditions below the code we just added:

    	static const int HOST_CONNECTION_LIMIT = 3;
    	// In case if host completed handshake process let's check if it doesn't reach the limit
    	if (GetHostConnectionCount(peer) >= HOST_CONNECTION_LIMIT)
    	{
    		sys_log(0, "Host %s connection limit has been reached!", host);
    		socket_close(desc);
    		return NULL;
    	}
    
    	// And block intrusive connections as well
    	if (IsIntrusiveConnection(host))
    	{
    		sys_log(0, "Host %s is intrusive!", host);
    		socket_close(desc);
    		return NULL;
    	}

    The first if checks whether the host has reached the connection limit; if it has, the host is dropped.
    The second if looks for intrusive peers.
    That simply means that if one tries to connect again, and again, and again within a defined time span, it probably is an attacking IP.
    Let's jump for a moment to desc.cpp, Initialize, and add this variable initialization there:

    tt_creation_time = get_global_time();

    Yet another problem solved.
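
The definition of IsIntrusiveConnection is not part of this post. As a rough sketch of what "too many attempts in a defined time span" could mean, here is a sliding-window counter. ConnectionRateGuard, its parameters and the chosen limits are assumptions made for illustration only, not the author's implementation.

```cpp
#include <cstddef>
#include <ctime>
#include <deque>
#include <string>
#include <unordered_map>

// Hypothetical sketch: flags a host that connects too often within a time window.
class ConnectionRateGuard
{
public:
    ConnectionRateGuard(std::size_t maxAttempts, std::time_t windowSeconds)
        : m_maxAttempts(maxAttempts), m_window(windowSeconds) {}

    // Records an attempt made at time `now` and returns true if the host made
    // more than maxAttempts attempts within the last windowSeconds.
    bool RegisterAttempt(const std::string& host, std::time_t now)
    {
        std::deque<std::time_t>& attempts = m_attempts[host];

        // Forget attempts that fell out of the window.
        while (!attempts.empty() && now - attempts.front() >= m_window)
            attempts.pop_front();

        attempts.push_back(now);
        return attempts.size() > m_maxAttempts;
    }

private:
    std::size_t m_maxAttempts;
    std::time_t m_window;
    std::unordered_map<std::string, std::deque<std::time_t>> m_attempts;
};
```

A real version would also have to cap the per-host history and drop idle hosts, otherwise the tracker itself becomes a memory sink under a flood.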


    Still though, all this code works more for the authentication server than for the game cores.
    Why is that?
    Imagine a person who is not supposed to be approached by just anyone, a movie star for example.
    Usually when one walks the red carpet there is a bunch of bodyguards sealing him/her off from the crowd around.
    The same should happen with game cores, because why would anyone try to perform a handshake with a game core without having logged in first?
    So that's what we are going to do: simply whitelist players who successfully went through the login process and obtained a login key.
    First, let's jump back to desc_manager.cpp and add this little code above our previous alterations:

    	// If it's not an auth server - check for validation first
    	if (!g_bAuthServer)
    	{
    		if (!IsOnHandshakeWhitelist(peer))
    		{
    			// sys_log(0, "Host %s has not validated through login!", host);
    			socket_close(desc);
    			return NULL;
    		}
    	}

    Now open input_db.cpp, move to AuthLogin function and add at the end:

    	// Validating handshake
    	TPacketGGHandshakeValidate pack;
    	pack.header = HEADER_GG_HANDSHAKE_VALIDATION;
    	strlcpy(pack.sUserIP, d->GetHostName(), sizeof(pack.sUserIP));
    	P2P_MANAGER::instance().Send(&pack, sizeof(pack));

    Repeat it for AuthLoginOpenID if you use it.
    Now let's jump to input_p2p.cpp, move to the Analyze function and add this after the initial sys_log call:

    	// Auth server is not allowed for p2p
    	if (g_bAuthServer)
    	{
    		// Clearing buffers for dynamic packets
    		switch (bHeader)
    		{
    			case HEADER_GG_RELAY:
    			{
    				TPacketGGRelay * p = (TPacketGGRelay *) c_pData;
    				if (m_iBufferLeft < sizeof(TPacketGGRelay) + p->lSize)
    					iExtraLen = -1;
    				else
    					iExtraLen = p->lSize;
    			}
    			break;
    			case HEADER_GG_NOTICE:
    			{
    				TPacketGGNotice * p = (TPacketGGNotice *) c_pData;
    				if (m_iBufferLeft < sizeof(TPacketGGNotice) + p->lSize)
    					iExtraLen = -1;
    				else
    					iExtraLen = p->lSize;
    			}
    			break;
    			case HEADER_GG_GUILD:
    			{
    				iExtraLen = m_iBufferLeft - sizeof(TPacketGGGuild);
    			}
    			break;
    			case HEADER_GG_MONARCH_NOTICE:
    			{
    				TPacketGGMonarchNotice * p = (TPacketGGMonarchNotice *) c_pData;
    				if (m_iBufferLeft < p->lSize + sizeof(TPacketGGMonarchNotice))
    					iExtraLen = -1;
    				else
    					iExtraLen = p->lSize;
    			}
    			break;
    		}
    
    		return iExtraLen;
    	}

    Since some of the packets might be dynamic, we need to make sure that the data they hold is properly cleared from the buffer.
    If you have more dynamic packets in use, add them as above.
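
The length checks above compare a buffer size against sizeof plus a size field read from the packet. Assuming m_iBufferLeft is a signed int and lSize is a signed long (as the names suggest), that mixes signed and unsigned arithmetic. The sketch below shows the same check with explicit guards. DynamicPacketHeader and ExtraLength are hypothetical stand-ins, not the real TPacketGG structures.

```cpp
#include <cstddef>

// Hypothetical stand-in for a variable-length P2P packet header.
struct DynamicPacketHeader
{
    unsigned char header;
    long lSize; // number of payload bytes that follow the fixed part
};

// Returns the payload length, or -1 if the size field is invalid or the
// buffer does not yet hold the whole packet.
int ExtraLength(int bufferLeft, const DynamicPacketHeader& p)
{
    if (bufferLeft < 0 || p.lSize < 0)
        return -1;

    // Do the arithmetic in a wide signed type to avoid wraparound.
    const long long needed =
        static_cast<long long>(sizeof(DynamicPacketHeader)) + p.lSize;

    if (static_cast<long long>(bufferLeft) < needed)
        return -1;

    return static_cast<int>(p.lSize); // fits: lSize <= bufferLeft here
}
```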

    Move to db.cpp, find function SendLoginPing and replace with following:

    void DBManager::SendLoginPing(const char * c_pszLogin)
    {
    	/*
    	TPacketGGLoginPing ptog;
    
    	ptog.bHeader = HEADER_GG_LOGIN_PING;
    	strlcpy(ptog.szLogin, c_pszLogin, sizeof(ptog.szLogin));
    
    	if (!g_pkAuthMasterDesc)  // If I am master, broadcast to others
    	{
    		P2P_MANAGER::instance().Send(&ptog, sizeof(TPacketGGLoginPing));
    	}
    	else // If I am slave send login ping to master
    	{
    		g_pkAuthMasterDesc->Packet(&ptog, sizeof(TPacketGGLoginPing));
    	}
    	*/
    }

    This avoids clearing billing like that (wtf is that btw, it shouldn't be executed at all).
    Now move to packet_info.cpp and add this code in the constructor:

    Set(HEADER_GG_HANDSHAKE_VALIDATION,		sizeof(TPacketGGHandshakeValidate),	"HandShakeValidation",		false);

    Jump back to input_p2p.cpp and add this case in Analyze function:
     

    		case HEADER_GG_HANDSHAKE_VALIDATION:
    			DESC_MANAGER::instance().AddToHandshakeWhiteList((const TPacketGGHandshakeValidate *) c_pData);
    			break;
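
AddToHandshakeWhiteList and IsOnHandshakeWhitelist are defined in the code that is not shown here. One possible shape for such a whitelist is sketched below. Note the assumptions: the class name HandshakeWhitelist is made up, and the expiry time (TTL) is my own addition, since the post does not say whether whitelist entries ever expire.

```cpp
#include <ctime>
#include <string>
#include <unordered_map>

// Hypothetical sketch: IPs that passed login are allowed to handshake
// with a game core for a limited time.
class HandshakeWhitelist
{
public:
    explicit HandshakeWhitelist(std::time_t ttlSeconds) : m_ttl(ttlSeconds) {}

    // Called when the auth server reports a successful login from `ip`.
    void Add(const std::string& ip, std::time_t now)
    {
        m_expiry[ip] = now + m_ttl;
    }

    // Returns true if `ip` logged in recently enough; expired entries are dropped.
    bool IsAllowed(const std::string& ip, std::time_t now)
    {
        auto it = m_expiry.find(ip);
        if (it == m_expiry.end())
            return false;

        if (now >= it->second)
        {
            m_expiry.erase(it);
            return false;
        }
        return true;
    }

private:
    std::time_t m_ttl;
    std::unordered_map<std::string, std::time_t> m_expiry;
};
```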

    Finally, jump to ClientManager.cpp in DB. Find the QUERY_SETUP function, locate the if condition with bAuthServer, and add the following code there:

    peer->SetChannel(1);

    Since P2P communication is allowed only for peers with a channel number greater than zero, we set one.
    Usually this practice should be forbidden, but since we restrain the traffic for the auth server (with the code above) it should be safe.

    Beware that this might cause the first login to fail, because the validation packet can reach the cores after the player connects.
    Voilà, we're mostly done with coding!

    Last but not least, we need a brief introduction to kqueue and a tedious tour through sockets and kernel variables.
    Starting with kqueue. I would try to explain this but you'd better jump to this link. FreeBSD documentation is always appreciated.
    Since the Metin2 implementation of the kqueue wrapper has a size limit, you may try to increase it a bit and watch for feedback.
    If you'd like to do so, jump to main.cpp, the start function, and edit this value:
     

    main_fdw = fdwatch_new(VALUE);

    Yet keep in mind: do not try to over-optimize it! Experiment and try different values. If you somehow screw it up, it might drag you into checkpoint issues and eventually crash the whole app.
    So now a few words about sockets and how the listening process works.
    When a connection aimed at the appropriate port is detected, it is dropped into a queue where it waits for the application to pick it up.
    So we can simply think of it as a waiting queue, like in a grocery store.
    The point is that this queue has its limit, and once the limit is reached, any new connection is declined on sight.
    The listening limit for the Metin2 core is taken from a define called SOMAXCONN.
    If you dive into the C socket documentation you can find something like this:

    /*
     * Maximum queue length specifiable by listen
     */
    #define SOMAXCONN        10

    As for me it was 128.
    Since it's a define, the value is simply embedded into the app and you cannot change it once the binary is built.
    So let's change it to let more connections be queued.
    You may ask, why?
    When a player tries to log in, the client connects to the channel port.
    If the channel is unavailable, you see a fadeout and the connection is terminated.
    It happens because there is no room in the queue, so the connection is not queued at all.
    But be careful! Do not set this value to some sky-high number!
    Be aware that our io_loop function needs to iterate through all these events and manage to handle them during the heartbeat.

    If you over-optimize this value you can end up causing lags on the server, internal packet delays and more.
    If you ask me, a value around 1024 is acceptable, but it's still better to do some reading and experiment a bit.
    And one more thing, don't forget to set these kernel options on the machine where your server runs:

    sysctl kern.ipc.soacceptqueue=1024
    sysctl kern.ipc.somaxconn=1024
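
The grocery-store queue described above can be pictured with a toy model. This is only a simulation written for illustration (the function SimulateBacklog and all of its numbers are made up); it does not touch real sockets and says nothing about how the kernel actually schedules connections.

```cpp
#include <algorithm>
#include <cstddef>

struct BacklogResult
{
    std::size_t accepted; // connections picked up by the application
    std::size_t refused;  // connections rejected because the queue was full
};

// Toy model: every tick `arrivalsPerTick` connections arrive, the queue holds
// at most `backlog` of them, and the application accepts up to `acceptsPerTick`.
BacklogResult SimulateBacklog(std::size_t backlog, std::size_t arrivalsPerTick,
                              std::size_t acceptsPerTick, std::size_t ticks)
{
    BacklogResult result{0, 0};
    std::size_t queued = 0;

    for (std::size_t t = 0; t < ticks; ++t)
    {
        // New arrivals enter the queue only while there is room.
        const std::size_t room = backlog - queued;
        const std::size_t admitted = std::min(room, arrivalsPerTick);
        result.refused += arrivalsPerTick - admitted;
        queued += admitted;

        // The application drains the queue at its own pace.
        const std::size_t taken = std::min(queued, acceptsPerTick);
        queued -= taken;
        result.accepted += taken;
    }
    return result;
}
```

With a backlog of 128, 200 arrivals and 100 accepts per tick, the model refuses 172 connections over two ticks; with a backlog of 1024 it refuses none in the same period. The number of accepted connections is 200 in both cases, which matches the warning above: a longer queue absorbs bursts, but the core still has to keep up with it.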

    So we are done! Don't forget to add the code from my github repo!

    Epilogue
    Metin2 is quite an old app and we should not forget about that.
    The netcode is old, rubbish and cumbersome, so this issue might be only one of many we haven't found yet.
    Keep in mind though that even this mitigation alone won't protect your server.
    Actually, I doubt that even rewriting the code into a more modern shape would do that if you don't get yourselves good protection.
    Protection, filters and external firewalls are the key, especially now that stressers and all this stuff are back and harmful again.
    I hope this little thread will help you in your future development.

    Extra

    I managed to write a little collector for getting rid of outdated handshakes that never completed the process.
    If you'd like to switch it on, jump to the desc_manager.cpp constructor and add there:

    	CEventFunctionHandler::instance().AddEvent([this](SArgumentSupportImpl *) {
    		desc_manager_garbage_collector_info* info = AllocEventInfo<desc_manager_garbage_collector_info>();
    		m_pkDescManagerGarbageCollector = event_create(desc_manager_garbage_collector_event, info, PASSES_PER_SEC(1));
    	}, "DESC_MANAGER_COLLECTOR", 1);

    Beware that you need this feature:

    And don't forget to add this to destructor:

    event_cancel(&m_pkDescManagerGarbageCollector);
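
The body of desc_manager_garbage_collector_event is not included in this post. As a sketch of what such a periodic sweep could do with a stored creation time, consider the function below. PendingHandshakes and CollectStaleHandshakes are hypothetical names, and the timeout is an assumed parameter.

```cpp
#include <cstddef>
#include <ctime>
#include <string>
#include <unordered_map>

// Hypothetical sketch: host -> time its handshake started.
using PendingHandshakes = std::unordered_map<std::string, std::time_t>;

// Removes handshakes older than `timeoutSeconds` and returns how many were dropped.
std::size_t CollectStaleHandshakes(PendingHandshakes& pending,
                                   std::time_t now,
                                   std::time_t timeoutSeconds)
{
    std::size_t dropped = 0;
    for (auto it = pending.begin(); it != pending.end();)
    {
        if (now - it->second >= timeoutSeconds)
        {
            it = pending.erase(it); // erase returns the next valid iterator
            ++dropped;
        }
        else
        {
            ++it;
        }
    }
    return dropped;
}
```

In the real server the sweep would also have to close the socket and free the desc of each dropped entry.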


    Regards

    Btw, credits to @Flourine for flooding my dev server with 20k packets per sec (I asked for 2, btw). That helped me to analyze the problem.

  2. On 28.06.2019 at 12:04, Cripplez wrote:

    Thank you very much, it is great and works :)

    There is just a small problem, the race warrior can use all the items, the race sura can use 3 items, assassin 2 items and shaman 1 items

    I think the problem is here

    
    			local id_to_job = {[52101] = 0, [52102] = 1, [52103] = 2, [52104] = 3}
    			if pc.get_job() > id_to_job[item.get_vnum()] then
    				say_title (item.get_name()) -- just print book's name
    				say("")
    				say("This book isn't for your race")
    				say("")
    				return
    			end

     

    I should change it to this

    
    			if pc.get_job() ~= id_to_job[item.get_vnum()] then

     

    Yes, you are right. It was already late and I didn't notice that mistake :P

  3. Try this one:

    quest skill_book begin
        state start begin
    		function GetSkillList2(min_level)
    			local skill_list = special.active_skill_list[pc.get_job()+1][pc.get_skill_group()]
    			local vnum_list = {}
    			local name_list = {}
    			for i = 1,table.getn(skill_list) do
    				local skill_vnum = skill_list[i]
    				local skill_level = pc.get_skill_level(skill_vnum)
    				if skill_level >= min_level and skill_level < 30 then
    					table.insert(vnum_list, skill_list[i])
    					table.insert(name_list, locale.GM_SKILL_NAME_DICT[skill_vnum])
    				end
    			end
    			return vnum_list, name_list
    		end
    		when 52101.use or 52102.use or 52103.use or 52104.use with pc.can_warp() begin -- don't forget to check the warp status, otherwise one can abuse this with e.g. the trade glitch
    			local id_to_job = {[52101] = 0, [52102] = 1, [52103] = 2, [52104] = 3}
    			if pc.get_job() > id_to_job[item.get_vnum()] then
    				say_title (item.get_name()) -- just print book's name
    				say("")
    				say("This book isn't for your race")
    				say("")
    				return
    			end	
    			if pc.get_skill_group() == 0 then
    				say_title (item.get_name()) -- just print book's name
    				say("")
    				say("you have no class yet")
    				say("")
    				return
    			end		
    			local vnum_list, name_list = skill_book.GetSkillList2(20)
    			if table.getn(vnum_list) == 0 then
    				say_title (item.get_name()) -- just print book's name
    				say("")
    				say_reward ("No skill to upgrade")
    				say("")
    				return
    			end
    			say_title (item.get_name()) -- just print book's name
    			say("")
    			say("choose the skill to upgrade:")
    			say("")
    			table.insert(name_list, "Cancel")
    			local s = select_table(name_list)
    			if s == table.getn(name_list) then
    				return
    			end		
    			local skill_name = name_list[s]
    			local skill_vnum = vnum_list[s]
    			local skill_level = pc.get_skill_level(skill_vnum)
    			say_title (item.get_name()) -- just print book's name
    			say("")
    			say("you choose: "..skill_name)
    			say("are you sure to upgrade this?")
    			local a = select("yes","No")
    			if a == 2 then
    				return
    			end
    			pc.set_skill_level (skill_vnum, skill_level+1)
    			pc.remove_item(item.get_vnum(), 1)
    		end
    	end
    end

     

  4. 8 hours ago, ElRenardo wrote:

    Hi, thanks again !

     

    Alright, so, I tried to install valgrind at first because of the easier usage and encountered this error at the start of the game processes with valgrind:

    
    valgrind: I failed to allocate space for the application's stack.
    valgrind: This may be the result of a very large --main-stacksize=
    valgrind: setting.  Cannot continue.  Sorry.
    

    I then tried to give the --main-stacksize argument with different values and it still gives me back this error.

    Maybe some of you have a solution ?

    
    # pkg info valgrind
    valgrind-3.10.1.20160113_7,1
    Name           : valgrind
    Version        : 3.10.1.20160113_7,1
    Installed on   : Tue Jun 11 08:41:47 2019 CEST
    Origin         : devel/valgrind
    Architecture   : FreeBSD:11:amd64
    Prefix         : /usr/local
    Categories     : devel
    Licenses       : GPLv2
    Maintainer     : [email protected]
    WWW            : https://bitbucket.org/stass/valgrind-freebsd/overview
    Comment        : Memory debugging and profiling tool
    Options        :
            32BIT          : on
            DOCS           : on
            MANPAGES       : on
            MPI            : off
    Annotations    :
            FreeBSD_version: 1102000
            repo_type      : binary
            repository     : FreeBSD

    My system:

    
    11.2-RELEASE-p9 FreeBSD 11.2-RELEASE-p9 #0: Tue Feb  5 15:30:36 UTC 2019     [email protected]:/usr/obj/usr/src/sys/GENERIC  amd64

     

    Now I'm going to try with ASAN.

    How much RAM have you got on your vps?

  5. 15 hours ago, ElRenardo wrote:

    Hi Sherer, thank you very much for your detailed answer.

     

    I'm on O2 flag by default, I remember changing that already but didn't get much better results in the diagnosis so I put it back to normal.

    As it's a test server now, I'll try to keep it to default O0 for now on.

    Here are the compiler flags I use then:

    
    -m32 -g -Wall -O -pipe -fexceptions -std=gnu++17 -fno-strict-aliasing -pthread -D_THREAD_SAFE -DNDEBUG -fstack-protector-all

     

    The second error, with tr1 lists was where the crashes first started.

    I then did change all the tr1 lists in the sources to std lists, and did the same for every boost lists.

    I then got an error with affects. I remember removing the boost affect_pool in affect.cpp to make it use the default M2 allocator, as if DEBUG_ALLOC was declared in this file.

     

     

    For the third error, I remember having some troubles with the memory usage of the server back then.

    I'm not sure if it's exactly at that time, but the memory usage of my machine was nearly at 100% even with the game not started.

    The machine had not been restarted for years. Until now after the restart the memory usage is fine.

    At that time, I thought that the memory corruption could be caused by a memory problem on my server, so I rented a new machine but still got crashes on it.

     

    But, the error seems to be part of the first one, that I got yesterday after 35 days without reboot and crashes (and very very few players on it so it's not really showing that it crashes less than before).

    I did an update, so I restarted the game, let some 3-5 players try it and got a crash after a few hours while a player attempted the refinement of ores on a guild alchemist.

    It worked 3 times, and then it crashed giving a backtrace to luaM_realloc because it couldn't allocate memory.

     

     

    For the checkpointing, everything is as it is by default, I haven't touched anything about that.

    To be sure, I'll only be able to check tomorrow.

     

    Thanks for the proposal, I'll keep that in mind.

    I don't think pooling is used by default (DEBUG_ALLOC should be disabled in release mode). Keeping std instead of TR1 is good though.
    If there wasn't any crash throughout those 35 days when there was no player on your server, that probably means the error is linked to some player-dependent stuff.

    @masodikbela came up with the right idea. You can try to diagnose memory leaks using ASAN or valgrind (up to you):

    https://github.com/google/sanitizers/wiki/AddressSanitizer
    http://www.valgrind.org/docs/manual/quick-start.html

    On the other hand, you can port your source to Windows and use Visual Studio's built-in profiler:
     

    https://docs.microsoft.com/en-us/visualstudio/profiling/memory-usage?view=vs-2019

     

  6. Hello,
    I don't really know if someone else has pointed it out, but if not, there you go.
    There is a really ugly yang bug in the guild building code.
    Open cmd_gm.cpp, go to the do_build function and find this piece of code:
     

    				if (test_server || GMLevel == GM_PLAYER)
    					// Consume building materials (on the test server GMs consume them too)
    				{
    					// Consume the building cost
    					ch->PointChange(POINT_GOLD, -t->dwPrice);

    Looks ok, right? Not really. dwPrice is typed as DWORD, and it's never a good idea to negate an unsigned value.
    It does no damage as long as your PointChange function takes an int argument, but once you decide to change it to, for example, long long, the negated value turns into a huge positive number. Here is a live example:
    https://onlinegdb.com/HJF-CWsCE
    Mitigation:

    Just cast the value to int/long long before negating it:

    				if (test_server || GMLevel == GM_PLAYER)
    					// Consume building materials (on the test server GMs consume them too)
    				{
    					// Consume the building cost
    					int iPrice = static_cast<int>(t->dwPrice);
    					ch->PointChange(POINT_GOLD, -iPrice);
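
The wraparound can be reproduced in isolation. The sketch below assumes DWORD is a 32-bit unsigned integer and long long is 64 bits wide; NegateUnsigned and NegateSigned are hypothetical helpers that only mimic passing the negated price to a long long parameter.

```cpp
#include <cstdint>

using DWORD = std::uint32_t; // assumption: DWORD is a 32-bit unsigned integer

// Mimics passing -dwPrice to a function whose parameter is long long.
// The negation happens in unsigned arithmetic and wraps around first.
long long NegateUnsigned(DWORD price)
{
    return -price;
}

// Converts to a signed type first, then negates.
long long NegateSigned(DWORD price)
{
    return -static_cast<long long>(price);
}
```

For a price of 1000, NegateUnsigned yields 4294966296 (the player gains gold instead of paying), while NegateSigned yields -1000. Casting to long long rather than int also keeps prices above INT_MAX intact, if your PointChange accepts that type.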

    Regards

  7. First of all, there might be more than one reason why your game crashes.

    Your cores show 3 outcomes:

    1. A malfunction at the Lua level. If you want to put this under deeper diagnosis, consider changing the optimization flag:
    https://docs.oracle.com/cd/E37670_01/E52461/html/ch04s03.html

    Then gdb should give you more details.

    2. Don't use TR1. It's going to be deprecated soon (as far as I know), and since the C++11 standard has been released it's pointless to use TR1 features (the Technical Report was a bridge between C++03 and C++11). Update your gcc and switch from TR1 to the STL. That will probably solve this error (worked for me).

    3. The third error is a bit tricky and might be tough to figure out. Your core shows that there was not enough memory to allocate, hence that strange abort. I would not consider that part error-prone; it was probably just a random code area where the system ran out of memory. You should look for a leak somewhere else.

    One additional question: did you disable checkpointing? If the answer is 'yes', switch it back on immediately.
    If you aren't able to solve those crashes and are really eager to get it done, send me a PM. Keep in mind though that if a lot of diagnosis is involved I won't do it for free.
    Good luck

  8. FreeBSD 9.3 is no longer supported, so you can't use the pkg utility anymore (probably the URL is deprecated).
    Consider updating your OS to a current RELEASE (11.2 or 12.0):
     

    https://www.freebsd.org/doc/handbook/updating-upgrading-freebsdupdate.html

    On the other hand, this might be caused by some network problem. Run some diagnostics and you should find the source of the issue.

  9. Check this part:

                # Sash refine effect
                if app.ENABLE_ACCE_SYSTEM:
                    slotNumberChecked = 0
                    if not constInfo.IS_AUTO_POTION(itemVnum):
                        self.wndItem.DeactivateSlot(i)

    This if-clause is meaningless (you are already checking for auto potion a few lines above).
    Get rid of it. That should do the trick for you.
