Premium Abel(Tiger) 1347 Posted August 14 Premium Share Posted August 14 There's a duplication bug when improving the strenght of a dragon soul item. How to reproduce: teleport directly after your dragon soul improvement succeded (eg: succes from +3 to +4). (players can use dungeon warp out to trigger this) This bug CAN'T happen on a regular server if you don't have remote dragon soul refinement because you can't warp after the refinement. The logic behind it is this: 1. RemoveFromCharacter is called on the item. The item is set to null on the player inventory so the instance of the item is never destroyed (M2_DESTROY_ITEM). ClearItem() usually handles that. 2. SetCount is called on the item but the destroy is never called because the pointer to the owner is null because of RemoveFromCharacter 3. The save is delayed so if you warp fast enough after the refine (before the SaveSingleItem is called) the db will send that item back to your inventory Fix: // DragonSoul.cpp // Function bool DSManager::DoRefineStrength // Search and comment: pDragonSoul->RemoveFromCharacter(); 1 10 Link to comment Share on other sites More sharing options...
Premium Intel 804 Posted August 14 Premium Share Posted August 14 (edited) Aren't dragonsoul items usuallly not stackable? Dafuq is that SetCount for? (I don't use dragon soul and I would have suspected to be like the original refine, as in, DESTROY the refined item, decrease upgrade item, give result vnum) Good catch though Edited August 14 by Intel Link to comment Share on other sites More sharing options...
Premium masodikbela 1464 Posted August 14 Premium Share Posted August 14 (edited) Rarely can say this but an actual good catch! Besides the dupe it also causes memory leak as the item will never be destroyed. Very similar to the item move dupe. Might be also a good idea to put an extra log to SaveSingleItem when its executed on items without owner, as its already sus and I can't really think about a good scenario where you would intentionally like to save an item without an owner. Also not so sure about the "this can't happen on a regular server" part, as nothing is guarding against this besides the delay which you can easily bypass by closing the connection instantly and logging back in. Also the delayed save might be different on each server and if its longer (I think by default it is) than 10 seconds (the delay you probably have after such actions) this can easily happen. Like I said all depends on the timing. Edited August 14 by masodikbela 1 1 The one and only UI programming guideline Link to comment Share on other sites More sharing options...
Honorable Member martysama0134 7720 Posted August 14 Honorable Member Share Posted August 14 (edited) 8 hours ago, Abel(Tiger) said: This bug CAN'T happen on a regular server if you don't have remote dragon soul refinement because you can't warp after the refinement. You can bypass ch->CanWarp() by switching channels client-side though, so technically it can happen on regular servers as well. The best fix is inside CItem::SetCount by deleting non-owned items (with count 0 obviously) immediately, but you may have to solve some item ptr danglings afterwards. They often used the item ptr (like for logging) after using item->SetCount(0). Edited August 14 by martysama0134 2 Check out my GitHub Link to comment Share on other sites More sharing options...
Premium masodikbela 1464 Posted August 14 Premium Share Posted August 14 (edited) 31 minutes ago, martysama0134 said: The best fix is inside CItem::SetCount by deleting non-owned items immediately, but you may have to solve some item ptr danglings afterwards. They often used the item ptr (like for logging) after using item->SetCount(0). I would avoid deleting the item instantly in SetCount in every case as it would be a valid scenario to remove the item from the character for some reason, change the count (to something larger than 0) and then give it back to the character. Deleting it if its count reached zero however regardless if it has an owner or not could be a good idea and if care was not taken in functions that disrespects SetCount's return value (it returns false if the item has been deleted) then sooner or later these parts will reveal themselves in a form of a crash, and probably in most cases it is already happening regardless of the change (by default there are some parts with use-after-free cases for logging like you said). 1 hour ago, Intel said: Aren't dragonsoul items usuallly not stackable? Dafuq is that SetCount for? I use SetCount(GetCount()-1) in most cases even if I may know that the item is not stackable, its more future proof when someone wakes up with an idea to make that item stackable. Edited August 14 by masodikbela 1 The one and only UI programming guideline Link to comment Share on other sites More sharing options...
Honorable Member martysama0134 7720 Posted August 14 Honorable Member Share Posted August 14 39 minutes ago, masodikbela said: I would avoid deleting the item instantly in SetCount in every case as it would be a valid scenario to remove the item from the character for some reason, change the count (to something larger than 0) and then give it back to the character. Obviously the ones with count 0 2 Check out my GitHub Link to comment Share on other sites More sharing options...
Honorable Member Owsap 9385 Posted August 14 Honorable Member Share Posted August 14 (edited) First of all, good job and thanks for sharing the fix with everyone in the community. It's worth mentioning that this bug only occurs when you switch to another core. When I realized this before I actually ended up changing the item update function which saves all items with delayed save, it eventually fixed the issue. Sanity checks for ITEM_MANAGER::Update Spoiler void ITEM_MANAGER::Update() { std::unordered_set<LPITEM>::iterator it = m_set_pkItemForDelayedSave.begin(); std::unordered_set<LPITEM>::iterator this_it; while (it != m_set_pkItemForDelayedSave.end()) { this_it = it++; LPITEM item = *this_it; // NOTE : Check NULL item if (NULL == item) { sys_err("NULL item, erasing from delayed save."); m_set_pkItemForDelayedSave.erase(item); continue; } // NOTE : Check if this item still exists if (FindByVID(item->GetVID()) == NULL) { sys_err("Invalid item, erasing from delayed save."); m_set_pkItemForDelayedSave.erase(item); continue; } // SLOW_QUERY Ç÷¡±×°¡ ÀÖ´Â °ÍÀº Á¾·á½Ã¿¡¸¸ ÀúÀåÇÑ´Ù. if (item->GetOwner() && IS_SET(item->GetFlag(), ITEM_FLAG_SLOW_QUERY)) continue; SaveSingleItem(item); m_set_pkItemForDelayedSave.erase(this_it); } } This is also how I changed my DSManager::DoRefineStrength Spoiler /// @ DSManager::DoRefineStrength if (fnumber(0.f, 100.f) <= fProb) { pResult = ITEM_MANAGER::instance().CreateItem(MakeDragonSoulVnum(bType, bGrade, bStep, bStrength + 1)); if (NULL == pResult) { sys_err("INVALID DRAGON SOUL(%d)", MakeDragonSoulVnum(bType, bGrade, bStep, bStrength + 1)); return false; } char buf[128]; sprintf(buf, "STRENGTH : %d -> %d", bStrength, bStrength + 1); LogManager::instance().ItemLog(ch, pDragonSoul, "DS_STRENGTH_REFINE_SUCCESS", buf); pDragonSoul->CopyAttributeTo(pResult); RefreshItemAttributes(pResult); ITEM_MANAGER::instance().RemoveItem(pDragonSoul, "REMOVE (DS_STRENGTH_REFINE_SUCCESS)"); pRefineStone->SetCount(pRefineStone->GetCount() - 1); ch->ChatPacket(CHAT_TYPE_INFO, LC_TEXT("°È¿¡ ¼º°øÇß½À´Ï´Ù.")); ch->AutoGiveItem(pResult, true); bSubHeader = DS_SUB_HEADER_REFINE_SUCCEED; } else { if (bStrength != 0) { pResult = ITEM_MANAGER::instance().CreateItem(MakeDragonSoulVnum(bType, bGrade, bStep, bStrength - 1)); if (NULL == pResult) { sys_err("INVALID DRAGON SOUL(%d)", MakeDragonSoulVnum(bType, bGrade, bStep, bStrength - 1)); return false; } pDragonSoul->CopyAttributeTo(pResult); RefreshItemAttributes(pResult); } bSubHeader = DS_SUB_HEADER_REFINE_FAIL; char buf[128]; sprintf(buf, "STRENGTH : %d -> %d", bStrength, bStrength - 1); // strength°È´Â ½ÇÆнà ±úÁú ¼öµµ ÀÖ¾î, ¿øº» ¾ÆÀÌÅÛÀ» ¹ÙÅÁÀ¸·Î ·Î±×¸¦ ³²±è. LogManager::instance().ItemLog(ch, pDragonSoul, "DS_STRENGTH_REFINE_FAIL", buf); ITEM_MANAGER::instance().RemoveItem(pDragonSoul, "REMOVE (DS_STRENGTH_REFINE_FAIL)"); pRefineStone->SetCount(pRefineStone->GetCount() - 1); ch->ChatPacket(CHAT_TYPE_INFO, LC_TEXT("°È¿¡ ½ÇÆÐÇß½À´Ï´Ù.")); if (NULL != pResult) ch->AutoGiveItem(pResult, true); } Edited August 14 by Owsap 1 1 https://owsap.dev/ / https://osf.owsap.dev/ Link to comment Share on other sites More sharing options...
Developer Mitachi 1795 Posted August 14 Developer Share Posted August 14 (edited) 1 hour ago, Owsap said: First of all, good job and thanks for sharing the fix with everyone in the community. It's worth mentioning that this bug only occurs when you switch to another core. When I realized this before I actually ended up changing the item update function which saves all items with delayed save, it eventually fixed the issue. Sanity checks for ITEM_MANAGER::Update Hide contents void ITEM_MANAGER::Update() { std::unordered_set<LPITEM>::iterator it = m_set_pkItemForDelayedSave.begin(); std::unordered_set<LPITEM>::iterator this_it; while (it != m_set_pkItemForDelayedSave.end()) { this_it = it++; LPITEM item = *this_it; // NOTE : Check NULL item if (NULL == item) { sys_err("NULL item, erasing from delayed save."); m_set_pkItemForDelayedSave.erase(item); continue; } // NOTE : Check if this item still exists if (FindByVID(item->GetVID()) == NULL) { sys_err("Invalid item, erasing from delayed save."); m_set_pkItemForDelayedSave.erase(item); continue; } // SLOW_QUERY Ç÷¡±×°¡ ÀÖ´Â °ÍÀº Á¾·á½Ã¿¡¸¸ ÀúÀåÇÑ´Ù. if (item->GetOwner() && IS_SET(item->GetFlag(), ITEM_FLAG_SLOW_QUERY)) continue; SaveSingleItem(item); m_set_pkItemForDelayedSave.erase(this_it); } } This is also how I changed my DSManager::DoRefineStrength Hide contents /// @ DSManager::DoRefineStrength if (fnumber(0.f, 100.f) <= fProb) { pResult = ITEM_MANAGER::instance().CreateItem(MakeDragonSoulVnum(bType, bGrade, bStep, bStrength + 1)); if (NULL == pResult) { sys_err("INVALID DRAGON SOUL(%d)", MakeDragonSoulVnum(bType, bGrade, bStep, bStrength + 1)); return false; } char buf[128]; sprintf(buf, "STRENGTH : %d -> %d", bStrength, bStrength + 1); LogManager::instance().ItemLog(ch, pDragonSoul, "DS_STRENGTH_REFINE_SUCCESS", buf); pDragonSoul->CopyAttributeTo(pResult); RefreshItemAttributes(pResult); ITEM_MANAGER::instance().RemoveItem(pDragonSoul, "REMOVE (DS_STRENGTH_REFINE_SUCCESS)"); pRefineStone->SetCount(pRefineStone->GetCount() - 1); ch->ChatPacket(CHAT_TYPE_INFO, LC_TEXT("°È¿¡ ¼º°øÇß½À´Ï´Ù.")); ch->AutoGiveItem(pResult, true); bSubHeader = DS_SUB_HEADER_REFINE_SUCCEED; } else { if (bStrength != 0) { pResult = ITEM_MANAGER::instance().CreateItem(MakeDragonSoulVnum(bType, bGrade, bStep, bStrength - 1)); if (NULL == pResult) { sys_err("INVALID DRAGON SOUL(%d)", MakeDragonSoulVnum(bType, bGrade, bStep, bStrength - 1)); return false; } pDragonSoul->CopyAttributeTo(pResult); RefreshItemAttributes(pResult); } bSubHeader = DS_SUB_HEADER_REFINE_FAIL; char buf[128]; sprintf(buf, "STRENGTH : %d -> %d", bStrength, bStrength - 1); // strength°È´Â ½ÇÆнà ±úÁú ¼öµµ ÀÖ¾î, ¿øº» ¾ÆÀÌÅÛÀ» ¹ÙÅÁÀ¸·Î ·Î±×¸¦ ³²±è. LogManager::instance().ItemLog(ch, pDragonSoul, "DS_STRENGTH_REFINE_FAIL", buf); ITEM_MANAGER::instance().RemoveItem(pDragonSoul, "REMOVE (DS_STRENGTH_REFINE_FAIL)"); pRefineStone->SetCount(pRefineStone->GetCount() - 1); ch->ChatPacket(CHAT_TYPE_INFO, LC_TEXT("°È¿¡ ½ÇÆÐÇß½À´Ï´Ù.")); if (NULL != pResult) ch->AutoGiveItem(pResult, true); } Checking if the item exists by vid to delete it is something you should not do in ITEM_MANAGER::Update rather, you should make sure to also delete it from the map of delayed item directly upon deletion of the item, at least it would make more sense Edited August 14 by Mitachi 1 Link to comment Share on other sites More sharing options...
Premium Abel(Tiger) 1347 Posted August 15 Author Premium Share Posted August 15 13 hours ago, Intel said: Aren't dragonsoul items usuallly not stackable? Dafuq is that SetCount for? You can use SetCount(0) for every item destroy because it does all the remove from character, sync quickslots, destroy item. 10 hours ago, Owsap said: Sanity checks for ITEM_MANAGER::Update You are adding too much complexity to that function and it's not a function you should play with. 12 hours ago, martysama0134 said: The best fix is inside CItem::SetCount by deleting non-owned items (with count 0 obviously) immediately I will not recomend to do this imediatly because some weird problems might appear. But I recomend using something like this and solving all the issues from the source of the problem. if (count == 0 && m_pOwner) { ... } else if(count == 0 && !m_pOwner) { sys_err("Set count 0 with null owner. Id %u - Vnum %d", GetID(), GetVnum()); } Link to comment Share on other sites More sharing options...
Honorable Member martysama0134 7720 Posted August 15 Honorable Member Share Posted August 15 (edited) 3 hours ago, Abel(Tiger) said: I will not recommend to do this immediately because some weird problems might appear. You will get the dangling item pointer straight away (instead of dangling after calling M2_DESTROY_ITEM), but it won't cause a crash even if you use the item ptr immediately after SetCount(0), so you will not notice it (until the next tic). I mean, it can happen, but since the memory section doesn't get overwritten, it is very unlikely that it will happen: item->SetCount(0); // if non-owner count 0 items are internally deleted sys_err("You're not even getting a crash from %d %s even if deleted", item->GetID(), item->GetName()); Even ::UseItemEx suffers of this bug, but nobody noticed. LOG_MANAGER logs using dangling pointers for already 20 years. Only people using the public battlepass system actually get a crash there once in a while. ----------------------------------------------------------- Alternatively, you can call SaveSingleItem: (untested) else if(count == 0 && !m_pOwner) ITEM_MANAGER::instance().SaveSingleItem(this); It will delete the item immediately on the DB, so the other cores won't work on old cached items anymore. I expect that, since we're using TCP, and they can't login to another core until the previous one sends the logout packet, they can't easily bypass it via "lag" (also via client-side channel switch) as before. (tl;dr they must receive the item_destroy packet before the logout one) ----------------------------------------------------------- Otherwise, if you want to totally get rid of the dangling pointers that you could ever get, you need a quite long fix by using a smart pointer: Library: This is the hidden content, please Sign In or Sign Up (the v3 is called cake::owner_ptr) Already tested on Live and 64bit. No lags, no instability. Used on CHARACTER, CItem, CObject, SECTREE, SECTREE_MAP, CDungeon, and so on. Example: LPITEM weapon = GetWear(WEAR_WEAPON); weapon->RemoveFromCharacter(); M2_DESTROY_ITEM(weapon); // if you use proxy_ptr, now weapon is nullptr instead of being a dangling item ptr sys_err("weapon ptr %p", weapon.get()); // 0 Edited August 15 by martysama0134 Core X - External 2 Internal 25 1 2 1 5 Check out my GitHub Link to comment Share on other sites More sharing options...
Honorable Member Owsap 9385 Posted August 15 Honorable Member Share Posted August 15 5 hours ago, Abel(Tiger) said: You are adding too much complexity to that function and it's not a function you should play with. "adding too much complexity" isn’t really accurate. The NULL check and FindByVID check are both simple and lightweight. `FindByVID` function only performs a map lookup (which is efficient). They don’t add complexity or performance issues. Instead, they provide essential safeguards that prevent crashes and keep invalid data out of the database. Without them, you risk saving garbage items or dealing with crashes, which is far worse than the minimal overhead they introduce. The SaveSingleItem function either deletes an item with no owner or saves it to the database if it has an owner. If the VID of an item doesn’t exist, there’s nothing to save or delete. This is why invalid items are erased from m_set_pkItemForDelayedSave. 1 https://owsap.dev/ / https://osf.owsap.dev/ Link to comment Share on other sites More sharing options...
Premium Intel 804 Posted August 15 Premium Share Posted August 15 (edited) I remember screwing up the item pointers with stacking during sort inventory. It took a complete overhaul of the stacking functions for that. We also left this check afterwards anyway (destroyitem, addtocharacter and savesingleitem with a secondary ITEM_VID_MAP) https://metin2.download/picture/2wV7252Vkk6qOeYYEhE8taB9Pp2hB6Ke/.png nice images don't work: // CHECK THE ITEM EXISTED AND HAS NOT BEEN DESTROYED BEFORE if (!ItemVIDExists(item->GetVID())) { sys_err("Trying to SaveSingleItem not existing item: %p %u", item, item->GetVID()); return; } Edited August 15 by Intel Link to comment Share on other sites More sharing options...
Premium masodikbela 1464 Posted August 15 Premium Share Posted August 15 (edited) 11 hours ago, Owsap said: "adding too much complexity" isn’t really accurate. The NULL check and FindByVID check are both simple and lightweight. `FindByVID` function only performs a map lookup (which is efficient). They don’t add complexity or performance issues. Instead, they provide essential safeguards that prevent crashes and keep invalid data out of the database. Without them, you risk saving garbage items or dealing with crashes, which is far worse than the minimal overhead they introduce. The SaveSingleItem function either deletes an item with no owner or saves it to the database if it has an owner. If the VID of an item doesn’t exist, there’s nothing to save or delete. This is why invalid items are erased from m_set_pkItemForDelayedSave. It is a good solution as a first step, using it to discover what functions misuse the item ownership/saving methods this way (like printing some useful stuff like what type of item is this might give some clue where to look or adding an extra help string to the class where you store function/file/line info when you unset the ownership or something like that) or checking for items in the global item map on shutdown after players have been removed might also be a good idea to discover if there are any other kind of memory leaks or possible dupe bugs because of that. But when something is violating the written or unwritten concept (like here you precisely want to forbid items with no owner in the delayed saving queue) the correct final solution is not duct-taping together the wrong parts but finding the whys, the actual reason why is something not working as it should work by that concept. (Like in this case even if its complexity wise not heavy, it would be normally still an unnecessary check IF the rules of this concept is followed everywhere as it should be.) So when at some point some function is not used correctly to me that behaviour is equivalent to a code that doesn't comply. Both is braking a concept, one the compiler's (or language's) and the other is a more virtual, logical or design concept. Its like you can use a screwdriver's handle to bash a nail into its place but that tool is not for that purpose. Thats why I don't like random timers for random actions like you can't warp bla-bla, the cache was designed in a way that whenever a player disconnects from the core if everything is working correctly no dupe and weird stuff should happen as that player cannot join to any other core until every kind of data is flushed to the db, and then only after this can other cores start loading the most up to date data of the player. All I want to say that it is really not a bad idea as a start, but you have to go further than this if you want to correct the underlying mistake. (Not just here, but in situations like this, and I think this mindset is what helped me the most in my programming journey, just wanted to tell this as someone might find this useful.) Edited August 15 by masodikbela 1 1 6 The one and only UI programming guideline Link to comment Share on other sites More sharing options...
Recommended Posts