Jump to content

Efficiently delete completely overlapping entities using a hash table


XDSoft

Recommended Posts

Posted (edited)

Use a hash table to quickly delete completely overlapping entities

=============

What is a Hash Table?

A hash table (or hash map) is a data structure that provides a mechanism for storing key-value pairs, allowing for fast data retrieval. Each key is processed through a hash function, which computes an index (or hash code) into an array of buckets or slots. The key-value pair is then stored in the corresponding bucket. The fundamental operations of a hash table, such as insertion, deletion, and search, typically have an average time complexity of O(1), making them very efficient.

Why is Using a Hash Table to Delete Identical Entities in ARX Highly Efficient?

1. Fast Lookups:

  • The primary strength of a hash table is its ability to perform quick lookups. When deleting duplicate entities, you can compute a hash for each entity and check if that hash already exists in the table.
  • If the hash already exists, it means the entity is a duplicate and can be marked for deletion.
  • If the hash does not exist, the entity is added to the table.


2. Efficient Handling of Collisions:

  • In a hash table, collisions occur when two different entities produce the same hash code. Advanced hash table implementations efficiently handle collisions using techniques like chaining (where each bucket is a linked list of entries) or open addressing (where a collision triggers a search for the next available slot).
  • Proper handling of collisions ensures that even with duplicate entities, the overall performance remains optimal.


3. Scalability:

  • Hash tables scale well with the number of entities. As more entities are added, a well-implemented hash table can resize itself to maintain performance, ensuring that operations remain O(1) on average.
  • This scalability is crucial when dealing with large datasets or complex drawings in ARX.


ARX and Hash Tables

In the context of ARX (AutoCAD Runtime Extension), using a hash table to manage and delete duplicate entities is beneficial due to the following reasons:

1.Unique Identification:

  • Each entity can be uniquely identified using a hash value that combines various properties (such as geometry, layer, color, etc.).
  • This unique identification ensures that only truly identical entities are considered duplicates.


2.Performance:

  • AutoCAD drawings can contain thousands or even millions of entities. Using a hash table to manage duplicates ensures that the process remains efficient and does not degrade performance.
  • This efficiency is especially important in complex CAD environments where users require quick and responsive operations.


Sample Code for Removing Duplicate Entities Using a Hash Table
Here’s a corrected and complete code sample in ARX for removing duplicate entities using a hash table:

 

/**
 * @brief Removes duplicate entities, keeping only unique ones.
 * 
 * @param objectIdArray Array of entity object IDs
 * @param param Bitmask parameter used for selectively calculating the hash value:
 *  - 1: Use the entity's dxfName
 *  - 2: Use the entity's layerName
 *  - 4: Use the entity's colorIndex
 *  - 8: Use the entity's linetypeName
 *  - 16: Use the entity's linetypeScale
 *  - 32: Use the entity's lineWidth
 *  - 64: Use the entity's GRIP Point array (nStretchPts)
 *  - 128: Use the entity's bounding box (box)
 *  - 256: Use the entity's centroid
 *  - 512: Use the entity's description (entity->desc())
 */
void XdDbUtils::RemoveDuplicateEntities(AcDbObjectIdArray& objectIdArray, int param)
{
    std::unordered_map<size_t, AcDbObjectId> entityHashmap;
    AcDbObjectIdArray remainingObjects;
 
    // Iterate through the entity object array
    for (int i = 0; i < objectIdArray.length(); ++i) {
        AcDbObjectId objectId = objectIdArray[i];
        AcDbEntity* pEntity = nullptr;
        if (acdbOpenAcDbEntity(pEntity, objectId, AcDb::kForRead) == Acad::eOk && pEntity) {
            // Calculate the hash value of the entity
            size_t hash = XdDbUtils::CalculateHash(pEntity, param);
 
            // Check if the hash value already exists in the hash table
            if (entityHashmap.find(hash) == entityHashmap.end()) {
                // The hash value does not exist, add to the hash table and remainingObjects array
                entityHashmap[hash] = objectId;
                remainingObjects.append(objectId);
            }
            // Close the entity
            pEntity->close();
        }
    }
 
    // Replace the original objectIdArray with the remainingObjects array
    objectIdArray = remainingObjects;
}
 

 

Conclusion

Using a hash table to delete duplicate entities in ARX is highly efficient due to the hash table's fast lookup, insertion, and deletion operations. This efficiency is crucial for maintaining performance in complex AutoCAD environments, ensuring that even large datasets are processed quickly and accurately.
 

======================

The following is the LISP implementation code:

 

(xdrx-entity-removeduplicates ss 129))

Mask 128+1 = 129 used,Compute hash value with entity type and bounding box (geometric extent)

 

(defun c:xdtb_removeovents (/ ss len ret)
  (xdrx-begin)
  (if (setq ss (xdrx-ssget
		 (xdrx-string-multilanguage
		   "\n选择要处理的对象<退出>:"
		   "\nSelect objects to process <Exit>:"
		 )
	       )
      )
    (progn
      (setq len (sslength ss))
      (setq ret (xdrx-entity-removeduplicates ss 129))
      (xdrx-prompt
	(xdrx-string-formatex
	  (xdrx-string-multilanguage
	    "\n共删除了 %d 个重复的实体."
	    "\nA total of %d duplicate entities have been removed."
	  )
	  (- len ret)
	)
      )
    )
  )
  (xdrx-end)
  (princ)
)

 

Video_2024-05-23_085739.gif.5284a4eae479f8099c45422f330eb252.gif

 

=====================================

 

The above code uses XDrx API,

download link:

https://github.com/xdcad/XDrx-API-zip

https://sourceforge.net/projects/xdrx-api-zip/

Dual version link:

https://github.com/xdcad

Edited by XDSoft
Link to comment
Share on other sites

11 hours ago, Steven P said:

Does this improve on overkill?

 

Just talk about algorithms


A normal implementation to eliminate duplicate entities requires a two-level loop. Each entity must be traversed to check whether it overlaps with other entities, so the algorithm time complexity is O(n2).

 

Using a hash table, the time complexity of the hash table itself is O(1), but considering that the entities have to be traversed once, the time complexity is O(n). When processing a large number of entities, the efficiency improvement is orders of magnitude.

 

It is not known whether AUTOCAD's OVERKILL uses efficient algorithms such as hash tables.

 

If you are interested, you can test the efficiency comparison between overkill and (xdrx-entity-removeduplicates...) function

Link to comment
Share on other sites

;|
@param param Bitmask parameter used for selectively calculating the hash value:
 *  - 1: Use the entity's dxfName
 *  - 2: Use the entity's layerName
 *  - 4: Use the entity's colorIndex
 *  - 8: Use the entity's linetypeName
 *  - 16: Use the entity's linetypeScale
 *  - 32: Use the entity's lineWidth
 *  - 64: Use the entity's GRIP Point array (nStretchPts)
 *  - 128: Use the entity's bounding box (box)
 *  - 256: Use the entity's centroid
 *  - 512: Use the entity's description (entity->desc())
|;
Command: (xdrx-entity-hashstring (entlast) 1)
"1078249248256508798"
Command: (xdrx-entity-hashstring (entlast) 3)
"11643239585821548497"
Command: (xdrx-entity-hashstring (entlast) 129)
"3826409512706688743"

 

In addition to the xdrx-entity-removeduplicates provided above, the XDRX API
Functions for calculating the hash value of an entity are also provided:
xdrx-entity-hashstring

Link to comment
Share on other sites

More interested if you have tried a side-by-side test of your code and the one supplied in the software to see if yours is an improvement... if no improvement or minimal then I'll use Overkill you see

Link to comment
Share on other sites

Posted (edited)

Efficiency test, 510 entities, including 505 duplicate entities

Hash table efficiency far exceeds overkill

 

===============================

Command: tt
Select objects: Specify opposite corner: 510 found

Select objects:

===============================

Command: (tt1)

CPU:(1x)Intel(R) Xeon(R) E3-1575M v5 @ 3.00GHz 4Cores  / Memory:64G / OS: WIN10 professional workstation version
Benchmarking ......
Current settings: Tolerance=0.000001, Ignore=None, Optimize polylines=Yes, Combine partial overlap=Yes, Combine end-to-end=Yes
505 duplicate(s) deleted
0 overlapping object(s) or segment(s) deleted.

done for 16 iterations. Sorted from fastest.
Statement                    Increment  Time(ms)  Normalize  Relative
-------------------------------------------------------------------------------
(xdrx-entity-removeduplic...)       16      1078         1078     28.07 <fastest>
(COMMAND ._-overkill SS  )          1       1891      30256       1.00 <slowest>
-------------------------------------------------------------------------------

 

 

(defun c:tt ()
  (if (setq ss (ssget))
    (progn
     (setq ss1 (xdrx-entity-copy ss))
    )
  )
  (princ)
)
(defun tt1 ()
  (xd::quickbench
    '((command "._-overkill" ss "" "")
      (xdrx-entity-removeduplicates ss1 129)
     )
  )
  (princ)
)

 

Edited by XDSoft
Link to comment
Share on other sites

On 5/24/2024 at 4:14 PM, Steven P said:

More interested if you have tried a side-by-side test of your code and the one supplied in the software to see if yours is an improvement... if no improvement or minimal then I'll use Overkill you see

 

After testing it, you can see the results above.

Link to comment
Share on other sites

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.
Note: Your post will require moderator approval before it will be visible.

Guest
Unfortunately, your content contains terms that we do not allow. Please edit your content to remove the highlighted words below.
Reply to this topic...

×   Pasted as rich text.   Restore formatting

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

×
×
  • Create New...