XDSoft Posted May 23 (edited)

Use a hash table to quickly delete completely overlapping entities
=============

What is a Hash Table?

A hash table (or hash map) is a data structure that stores key-value pairs and allows fast data retrieval. Each key is processed through a hash function, which computes an index (or hash code) into an array of buckets or slots, and the key-value pair is stored in the corresponding bucket. The fundamental operations of a hash table, such as insertion, deletion, and search, typically have an average time complexity of O(1), making them very efficient.

Why is Using a Hash Table to Delete Identical Entities in ARX Highly Efficient?

1. Fast lookups: The primary strength of a hash table is its ability to perform quick lookups. When deleting duplicate entities, you compute a hash for each entity and check whether that hash already exists in the table. If it does, the entity is a duplicate and can be marked for deletion; if it does not, the entity is added to the table.

2. Efficient handling of collisions: In a hash table, a collision occurs when two different entities produce the same hash code. Good hash table implementations handle collisions efficiently using techniques such as chaining (where each bucket is a linked list of entries) or open addressing (where a collision triggers a search for the next available slot). Proper collision handling keeps overall performance close to optimal even when many entities hash alike.

3. Scalability: Hash tables scale well with the number of entities. As more entities are added, a well-implemented hash table resizes itself to maintain performance, keeping operations O(1) on average. This scalability is crucial when dealing with large datasets or complex drawings in ARX.
ARX and Hash Tables

In the context of ARX (AutoCAD Runtime Extension), using a hash table to manage and delete duplicate entities is beneficial for the following reasons:

1. Unique identification: Each entity can be identified by a hash value that combines various properties (such as geometry, layer, color, etc.). This ensures that only truly identical entities are considered duplicates.

2. Performance: AutoCAD drawings can contain thousands or even millions of entities. Using a hash table keeps the de-duplication process efficient, which is especially important in complex CAD environments where users require quick and responsive operations.

Sample Code for Removing Duplicate Entities Using a Hash Table

Here is a corrected and complete ARX code sample for removing duplicate entities using a hash table:

```cpp
/**
 * @brief Removes duplicate entities, keeping only unique ones.
 *
 * @param objectIdArray Array of entity object IDs
 * @param param Bitmask parameter used for selectively calculating the hash value:
 *              - 1:   Use the entity's dxfName
 *              - 2:   Use the entity's layerName
 *              - 4:   Use the entity's colorIndex
 *              - 8:   Use the entity's linetypeName
 *              - 16:  Use the entity's linetypeScale
 *              - 32:  Use the entity's lineWidth
 *              - 64:  Use the entity's GRIP point array (nStretchPts)
 *              - 128: Use the entity's bounding box (box)
 *              - 256: Use the entity's centroid
 *              - 512: Use the entity's description (entity->desc())
 */
void XdDbUtils::RemoveDuplicateEntities(AcDbObjectIdArray& objectIdArray, int param)
{
    std::unordered_map<size_t, AcDbObjectId> entityHashmap;
    AcDbObjectIdArray remainingObjects;

    // Iterate through the entity object array
    for (int i = 0; i < objectIdArray.length(); ++i) {
        AcDbObjectId objectId = objectIdArray[i];
        AcDbEntity* pEntity = nullptr;
        if (acdbOpenAcDbEntity(pEntity, objectId, AcDb::kForRead) == Acad::eOk && pEntity) {
            // Calculate the hash value of the entity
            size_t hash = XdDbUtils::CalculateHash(pEntity, param);

            // Check if the hash value already exists in the hash table
            if (entityHashmap.find(hash) == entityHashmap.end()) {
                // Not seen yet: record it in the hash table and keep the entity
                entityHashmap[hash] = objectId;
                remainingObjects.append(objectId);
            }

            // Close the entity
            pEntity->close();
        }
    }

    // Replace the original objectIdArray with the remainingObjects array
    objectIdArray = remainingObjects;
}
```

Conclusion

Using a hash table to delete duplicate entities in ARX is highly efficient thanks to the hash table's fast lookup, insertion, and deletion operations. This efficiency is crucial for maintaining performance in complex AutoCAD environments, ensuring that even large datasets are processed quickly and accurately.

======================

The following is the LISP implementation. The call (xdrx-entity-removeduplicates ss 129) uses mask 128 + 1 = 129, i.e. the hash value is computed from the entity type (dxfName) and the bounding box (geometric extents):

```lisp
(defun c:xdtb_removeovents (/ ss len ret)
  (xdrx-begin)
  (if (setq ss (xdrx-ssget
                 (xdrx-string-multilanguage
                   "\n选择要处理的对象<退出>:"
                   "\nSelect objects to process <Exit>:"
                 )
               )
      )
    (progn
      (setq len (sslength ss))
      (setq ret (xdrx-entity-removeduplicates ss 129))
      (xdrx-prompt
        (xdrx-string-formatex
          (xdrx-string-multilanguage
            "\n共删除了 %d 个重复的实体."
            "\nA total of %d duplicate entities have been removed."
          )
          (- len ret)
        )
      )
    )
  )
  (xdrx-end)
  (princ)
)
```

The above code uses the XDrx API; download links:
https://github.com/xdcad/XDrx-API-zip
https://sourceforge.net/projects/xdrx-api-zip/
Dual version link: https://github.com/xdcad

Edited May 23 by XDSoft
Steven P Posted May 23

Does this improve on overkill?
XDSoft Posted May 24 (Author)

11 hours ago, Steven P said: Does this improve on overkill?

Speaking purely about algorithms: a normal implementation for eliminating duplicate entities requires a two-level loop, since each entity must be compared against every other entity to check for overlap, so the time complexity is O(n²). With a hash table, each lookup and insertion is O(1) on average, and the entities only need to be traversed once, so the overall time complexity is O(n). When processing a large number of entities, the efficiency improvement is orders of magnitude.

I do not know whether AutoCAD's OVERKILL uses an efficient algorithm such as a hash table. If you are interested, you can test the efficiency of OVERKILL against the (xdrx-entity-removeduplicates ...) function.
XDSoft Posted May 24 (Author)

In addition to the xdrx-entity-removeduplicates function shown above, the XDrx API also provides a function for calculating the hash value of an entity: xdrx-entity-hashstring. It takes the same bitmask parameter:

;| @param param Bitmask parameter used for selectively calculating the hash value:
   - 1:   Use the entity's dxfName
   - 2:   Use the entity's layerName
   - 4:   Use the entity's colorIndex
   - 8:   Use the entity's linetypeName
   - 16:  Use the entity's linetypeScale
   - 32:  Use the entity's lineWidth
   - 64:  Use the entity's GRIP point array (nStretchPts)
   - 128: Use the entity's bounding box (box)
   - 256: Use the entity's centroid
   - 512: Use the entity's description (entity->desc())
|;

Command: (xdrx-entity-hashstring (entlast) 1)
"1078249248256508798"
Command: (xdrx-entity-hashstring (entlast) 3)
"11643239585821548497"
Command: (xdrx-entity-hashstring (entlast) 129)
"3826409512706688743"
Steven P Posted May 24

More interested if you have tried a side-by-side test of your code and the one supplied in the software to see if yours is an improvement... if no improvement or minimal then I'll use Overkill you see
XDSoft Posted May 25 (edited)

Efficiency test: 510 entities, including 505 duplicate entities. The hash table's efficiency far exceeds OVERKILL's.

===============================
Command: tt
Select objects: Specify opposite corner: 510 found
Select objects:
===============================
Command: (tt1)
CPU: (1x) Intel(R) Xeon(R) E3-1575M v5 @ 3.00GHz, 4 cores / Memory: 64G / OS: WIN10 Professional Workstation
Benchmarking ......
Current settings: Tolerance=0.000001, Ignore=None, Optimize polylines=Yes, Combine partial overlap=Yes, Combine end-to-end=Yes
505 duplicate(s) deleted
0 overlapping object(s) or segment(s) deleted.
done for 16 iterations. Sorted from fastest.

Statement                        Increment   Time(ms)   Normalize   Relative
-------------------------------------------------------------------------------
(xdrx-entity-removeduplic...)    16          1078       1078        28.07   <fastest>
(COMMAND ._-overkill SS )        1           1891       30256       1.00    <slowest>
-------------------------------------------------------------------------------

```lisp
(defun c:tt ()
  (if (setq ss (ssget))
    (progn
      (setq ss1 (xdrx-entity-copy ss))
    )
  )
  (princ)
)

(defun tt1 ()
  (xd::quickbench
    '((command "._-overkill" ss "" "")
      (xdrx-entity-removeduplicates ss1 129)
     )
  )
  (princ)
)
```

Edited May 25 by XDSoft
XDSoft Posted May 25 (Author)

On 5/24/2024 at 4:14 PM, Steven P said: More interested if you have tried a side-by-side test of your code and the one supplied in the software to see if yours is an improvement... if no improvement or minimal then I'll use Overkill you see

After testing it, you can see the results above.