XDSoft Posted May 23 (edited)

Use a hash table to quickly delete completely overlapping entities
=============

What is a Hash Table?

A hash table (or hash map) is a data structure that stores key-value pairs and allows fast data retrieval. Each key is processed through a hash function, which computes an index (or hash code) into an array of buckets or slots, and the key-value pair is stored in the corresponding bucket. The fundamental operations of a hash table, such as insertion, deletion, and search, typically have an average time complexity of O(1), making them very efficient.

Why is Using a Hash Table to Delete Identical Entities in ARX Highly Efficient?

1. Fast lookups: The primary strength of a hash table is its ability to perform quick lookups. When deleting duplicate entities, you compute a hash for each entity and check whether that hash already exists in the table. If it does, the entity is a duplicate and can be marked for deletion; if it does not, the entity is added to the table.

2. Efficient handling of collisions: In a hash table, a collision occurs when two different entities produce the same hash code. Good hash table implementations handle collisions efficiently using techniques such as chaining (where each bucket is a linked list of entries) or open addressing (where a collision triggers a search for the next available slot). Proper collision handling keeps overall performance close to optimal even when many entities hash alike.

3. Scalability: Hash tables scale well with the number of entities. As more entities are added, a well-implemented hash table resizes itself to maintain performance, keeping operations O(1) on average. This scalability is crucial when dealing with large datasets or complex drawings in ARX.
ARX and Hash Tables

In the context of ARX (AutoCAD Runtime Extension), using a hash table to manage and delete duplicate entities is beneficial for the following reasons:

1. Unique identification: Each entity can be identified by a hash value that combines various properties (such as geometry, layer, color, etc.). This ensures that only truly identical entities are considered duplicates.

2. Performance: AutoCAD drawings can contain thousands or even millions of entities. Using a hash table keeps the de-duplication process efficient, which is especially important in complex CAD environments where users require quick and responsive operations.

Sample Code for Removing Duplicate Entities Using a Hash Table

Here is a corrected and complete ARX code sample for removing duplicate entities using a hash table:

```cpp
/**
 * @brief Removes duplicate entities, keeping only unique ones.
 *
 * @param objectIdArray Array of entity object IDs
 * @param param Bitmask parameter used for selectively calculating the hash value:
 *              - 1:   Use the entity's dxfName
 *              - 2:   Use the entity's layerName
 *              - 4:   Use the entity's colorIndex
 *              - 8:   Use the entity's linetypeName
 *              - 16:  Use the entity's linetypeScale
 *              - 32:  Use the entity's lineWidth
 *              - 64:  Use the entity's GRIP point array (nStretchPts)
 *              - 128: Use the entity's bounding box (box)
 *              - 256: Use the entity's centroid
 *              - 512: Use the entity's description (entity->desc())
 */
void XdDbUtils::RemoveDuplicateEntities(AcDbObjectIdArray& objectIdArray, int param)
{
    std::unordered_map<size_t, AcDbObjectId> entityHashmap;
    AcDbObjectIdArray remainingObjects;

    // Iterate through the entity object array
    for (int i = 0; i < objectIdArray.length(); ++i) {
        AcDbObjectId objectId = objectIdArray[i];
        AcDbEntity* pEntity = nullptr;
        if (acdbOpenAcDbEntity(pEntity, objectId, AcDb::kForRead) == Acad::eOk && pEntity) {
            // Calculate the hash value of the entity
            size_t hash = XdDbUtils::CalculateHash(pEntity, param);

            // Check if the hash value already exists in the hash table
            if (entityHashmap.find(hash) == entityHashmap.end()) {
                // Not seen yet: record it in the hash table and keep the entity
                entityHashmap[hash] = objectId;
                remainingObjects.append(objectId);
            }

            // Close the entity
            pEntity->close();
        }
    }

    // Replace the original objectIdArray with the remainingObjects array
    objectIdArray = remainingObjects;
}
```

Conclusion

Using a hash table to delete duplicate entities in ARX is highly efficient thanks to the hash table's fast lookup, insertion, and deletion operations. This efficiency is crucial for maintaining performance in complex AutoCAD environments, ensuring that even large datasets are processed quickly and accurately.

======================

The following is the LISP implementation. The call (xdrx-entity-removeduplicates ss 129) uses mask 128 + 1 = 129, i.e. the hash value is computed from the entity type (dxfName) and the bounding box (geometric extents):

```lisp
(defun c:xdtb_removeovents (/ ss len ret)
  (xdrx-begin)
  (if (setq ss (xdrx-ssget
                 (xdrx-string-multilanguage
                   "\n选择要处理的对象<退出>:"
                   "\nSelect objects to process <Exit>:"
                 )
               )
      )
    (progn
      (setq len (sslength ss))
      (setq ret (xdrx-entity-removeduplicates ss 129))
      (xdrx-prompt
        (xdrx-string-formatex
          (xdrx-string-multilanguage
            "\n共删除了 %d 个重复的实体."
            "\nA total of %d duplicate entities have been removed."
          )
          (- len ret)
        )
      )
    )
  )
  (xdrx-end)
  (princ)
)
```

The above code uses the XDrx API; download links:
https://github.com/xdcad/XDrx-API-zip
https://sourceforge.net/projects/xdrx-api-zip/
Dual version link: https://github.com/xdcad

Edited May 23 by XDSoft
Steven P Posted May 23

Does this improve on overkill?
XDSoft Posted May 24 (Author)

11 hours ago, Steven P said: Does this improve on overkill?

Speaking purely about algorithms: a normal implementation for eliminating duplicate entities requires a two-level loop, since each entity must be compared against every other entity to check for overlap, so the time complexity is O(n²). With a hash table, each lookup and insertion is O(1) on average, and the entities only need to be traversed once, so the overall time complexity is O(n). When processing a large number of entities, the efficiency improvement is orders of magnitude.

I do not know whether AutoCAD's OVERKILL uses an efficient algorithm such as a hash table. If you are interested, you can test the efficiency of OVERKILL against the (xdrx-entity-removeduplicates ...) function.
XDSoft Posted May 24 (Author)

In addition to the xdrx-entity-removeduplicates function shown above, the XDrx API also provides a function for calculating the hash value of an entity: xdrx-entity-hashstring. It takes the same bitmask parameter:

;| @param param Bitmask parameter used for selectively calculating the hash value:
   - 1:   Use the entity's dxfName
   - 2:   Use the entity's layerName
   - 4:   Use the entity's colorIndex
   - 8:   Use the entity's linetypeName
   - 16:  Use the entity's linetypeScale
   - 32:  Use the entity's lineWidth
   - 64:  Use the entity's GRIP point array (nStretchPts)
   - 128: Use the entity's bounding box (box)
   - 256: Use the entity's centroid
   - 512: Use the entity's description (entity->desc())
|;

Command: (xdrx-entity-hashstring (entlast) 1)
"1078249248256508798"
Command: (xdrx-entity-hashstring (entlast) 3)
"11643239585821548497"
Command: (xdrx-entity-hashstring (entlast) 129)
"3826409512706688743"
Steven P Posted May 24

More interested if you have tried a side-by-side test of your code and the one supplied in the software to see if yours is an improvement... if no improvement or minimal then I'll use Overkill you see
XDSoft Posted May 25 (edited)

Efficiency test: 510 entities, including 505 duplicate entities. The hash table's efficiency far exceeds OVERKILL's.

===============================
Command: tt
Select objects: Specify opposite corner: 510 found
Select objects:
===============================
Command: (tt1)
CPU: (1x) Intel(R) Xeon(R) E3-1575M v5 @ 3.00GHz, 4 cores / Memory: 64G / OS: WIN10 Professional Workstation
Benchmarking ......
Current settings: Tolerance=0.000001, Ignore=None, Optimize polylines=Yes, Combine partial overlap=Yes, Combine end-to-end=Yes
505 duplicate(s) deleted
0 overlapping object(s) or segment(s) deleted.
done for 16 iterations. Sorted from fastest.

Statement                        Increment   Time(ms)   Normalize   Relative
-------------------------------------------------------------------------------
(xdrx-entity-removeduplic...)    16          1078       1078        28.07   <fastest>
(COMMAND ._-overkill SS )        1           1891       30256       1.00    <slowest>
-------------------------------------------------------------------------------

```lisp
(defun c:tt ()
  (if (setq ss (ssget))
    (progn
      (setq ss1 (xdrx-entity-copy ss))
    )
  )
  (princ)
)

(defun tt1 ()
  (xd::quickbench
    '((command "._-overkill" ss "" "")
      (xdrx-entity-removeduplicates ss1 129)
     )
  )
  (princ)
)
```

Edited May 25 by XDSoft
XDSoft Posted May 25 (Author)

On 5/24/2024 at 4:14 PM, Steven P said: More interested if you have tried a side-by-side test of your code and the one supplied in the software to see if yours is an improvement... if no improvement or minimal then I'll use Overkill you see

After testing it, you can see the results above.