Nikon, posted May 15, 2023 (edited May 15, 2023):
Good afternoon. After converting a PDF to DWG, some of the text (letters) comes in as polylines. Is it possible to replace the polylines with text programmatically? Thanks.
Attachment: PL - текст.dwg
Steven P, posted May 15, 2023:
With standalone AutoCAD and no paid add-ons? The simple answer is no. I would like to be wrong, of course, and have someone say "You're wrong, here is a nice LISP", but there are so many variables to look at (fonts, text sizes, italics, bold, number of line segments, alphabets, and so on) that I don't think it is out there. You'd need Optical Character Recognition software. I think there are line-to-text converters out there that work on PDFs. Could the originator supply the original CAD file?
Nikon, posted May 15, 2023:
Only .shx texts turn into polylines. Express Tools has a function for splitting text into lines; is there really no inverse function?
lrm, posted May 15, 2023:
You could use OCR (Optical Character Recognition) software, either when the PDF is created (Adobe Acrobat includes this capability) or through AutoCAD Raster Design, which has an OCR option.
Steven P, posted May 15, 2023:
Nikon said: "Express Tools has a function for splitting text into lines, is there really no inverse function?"
Coming from a PDF, you'd need to identify which lines are text. By the time the text has been converted to lines, plotted to PDF, imported into CAD and the PDF 'exploded', they are just lines. Optical character recognition is powerful enough to do this, but you'll struggle to find a free piece of code, such as a LISP, that does.
SLW210, posted May 15, 2023 (edited May 17, 2023: added information):
Newer versions of AutoCAD have this function, from around the 2018 version I think. Are you on AutoCAD 2015, as shown in your profile information? If you have a newer version, there is PDFSHXTEXT. As far as I know, the OCR in Raster Design only handles text in an image.
P.S. For future reference, PDFSHXTEXT started with AutoCAD 2017.1, from what I found.
Steven P, posted May 15, 2023:
But that isn't brilliant. A quick test on 13 characters returned 13 letters, but 2 of those were the dots above the letter i and 2 characters were missed. At an angle of 32 degrees it didn't convert any. The help suggests converting the characters one by one, then checking and recombining them, so it's not brilliant yet.
A LISP I part-built a while ago asked the user to select the lines that made up a word or sentence, then gave a pop-up box for the user to retype the word. The original lines were deleted and the typed text inserted, angled and sized to match the longest lines, on the assumption that these were the uprights. A couple of tweaks snapped the text to a standard size (so a height of 2.4876 would become 2.5) and set the angle to the mean angle of the uprights. It didn't work well, so I left it on the 'come back later' pile.
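The sizing and angle logic described in the post above can be sketched roughly as follows. This is not Steven P's LISP, which was not posted; it is a minimal Python illustration under stated assumptions: the selected lines are available as pairs of 2D endpoints, the longest lines are the uprights, and standard heights are multiples of 0.5. The names `snap_height`, `estimate_text_params`, `tolerance` and `step` are hypothetical.

```python
import math

def snap_height(raw, step=0.5):
    """Round a measured height to the nearest multiple of step (assumed standard sizes)."""
    return round(raw / step) * step

def estimate_text_params(segments, tolerance=0.05, step=0.5):
    """Estimate (text_height, rotation_degrees) from line segments.

    segments: list of ((x1, y1), (x2, y2)) endpoint pairs.
    Assumption: the longest segments are the uprights of the letters.
    """
    lengths = [math.dist(a, b) for a, b in segments]
    longest = max(lengths)
    # Treat every segment within `tolerance` of the longest as an upright.
    uprights = [s for s, ln in zip(segments, lengths) if ln >= longest * (1 - tolerance)]

    # A line has no direction, so average the doubled angles (axial mean).
    # This avoids wrap-around errors when uprights are drawn in opposite directions.
    sin_sum = cos_sum = 0.0
    for (x1, y1), (x2, y2) in uprights:
        theta = math.atan2(y2 - y1, x2 - x1)
        sin_sum += math.sin(2 * theta)
        cos_sum += math.cos(2 * theta)
    mean_upright = math.degrees(0.5 * math.atan2(sin_sum, cos_sum))

    # Uprights are perpendicular to the baseline; normalise rotation to [-90, 90).
    rotation = (mean_upright - 90.0 + 90.0) % 180.0 - 90.0
    return snap_height(longest, step), rotation
```

Because the lines alone cannot show whether the text is upside down, the rotation is only resolved to within 180 degrees; a real tool would still need the user to confirm the reading direction.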
SLW210, posted May 16, 2023:
I never once said anything about it being brilliant.
I do not remember the steps, but you can flatten the PDF and use Acrobat's OCR before bringing it into AutoCAD. I used to use Acrobat and Illustrator to create vectors from PDFs, and before that Ghostscript, pdftotext and ImageMagick. There is a learning curve, but up until a few years ago I did all of this on a Linux distro and used the terminal a lot.
Links: pdftotext(1) (xpdfreader.com); ImageMagick – Download
With a few tweaks I usually get pretty good results. There are some settings, and you can add more fonts to match against. It is a pain in the *** that it only does horizontal text. Though if you could get a LISP to rotate the view so the text is horizontal, it would speed things up.
I never tried the OCR in Raster Design, but you could try making the lines in AutoCAD into an image and running different OCR programs on it; most do reasonably well on black letters on a white background.
Fortunately for me, these days I only occasionally have to turn them into actual text. It also helps that a lot more people are using TTF fonts.
So here are the settings for setting up PDFSHXTEXT.
SLW210, posted May 16, 2023:
Your drawing converted pretty well, in my opinion.
Attachment: PL - text.dwg
Nikon, posted May 17, 2023:
Thanks, SLW210! It's a bit of a long process when there are a lot of drawings and texts... And is there really no way, using LISP, to match a set of text lines to a .shx font after converting PDF to DWG?
SLW210, posted May 17, 2023:
There is PDFSHXTEXT, which should run in a script, so I suppose it could run in a batch process. The only problems I see are that the more geometry is selected, the less accurate the results, and that if the text is not horizontal the drawing needs to be rotated. Running from a script with no user input would mean selecting everything and hoping for the best.
You should be able to use a LISP to run the command and maybe automatically select smaller areas in the drawing. There have been some efforts to create a LISP:
Need help with finding PDFSHXTEXT variable - Autodesk Community - AutoCAD
This is way past my LISP level, plus I have a lot of work going on right now. It might be a good opportunity for you to give LISP a shot.
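One way to picture "automatically select smaller areas" is to group nearby polylines into separate selection windows before running the recognition on each window. The sketch below is only an illustration of that grouping step, written in Python rather than LISP. It assumes the bounding box of each polyline has already been extracted as `(xmin, ymin, xmax, ymax)`; how to get those boxes out of AutoCAD and how to feed the windows back into PDFSHXTEXT is not covered. The function names and the `gap` parameter are hypothetical.

```python
def boxes_close(a, b, gap):
    """True if boxes a and b, each (xmin, ymin, xmax, ymax), lie within `gap` of each other."""
    return (a[0] <= b[2] + gap and b[0] <= a[2] + gap and
            a[1] <= b[3] + gap and b[1] <= a[3] + gap)

def merge_boxes(a, b):
    """Smallest box that contains both a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))

def selection_windows(boxes, gap):
    """Merge bounding boxes that are within `gap` of each other.

    Returns a sorted list of merged windows, one per cluster of nearby geometry.
    """
    windows = list(boxes)
    merged = True
    while merged:
        merged = False
        for i in range(len(windows)):
            for j in range(i + 1, len(windows)):
                if boxes_close(windows[i], windows[j], gap):
                    # Grow window i to cover window j, then rescan from the start.
                    windows[i] = merge_boxes(windows[i], windows[j])
                    del windows[j]
                    merged = True
                    break
            if merged:
                break
    return sorted(windows)
```

A `gap` of roughly one character width would tend to keep a word or line together while separating distant callouts, but the right value depends on the drawing and would need tuning.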
SLW210, posted May 17, 2023:
If all of the PDFs are simple like your example, it may work okay using AutoCAD. But I would concentrate on fixing the text in the PDF before importing into AutoCAD. Maybe check some Adobe Acrobat fora and/or research pdftoedit, ImageMagick, Ghostscript, etc. Overall, if you have a lot of them to do, you might be happier with the results.
On that note, I have seen PDFIMPORT scripts, LISPs, etc. So fix the text in the PDFs, then batch-create the .dwg files from them.
Nikon, posted May 17, 2023:
No, the drawings are not simple. The example shows just part of the text made of polylines. The drawing has a large number of callouts and specifications in .shx fonts. Your advice is clear: fix the PDF. Maybe, given such difficulties with fonts, it is worth abandoning .shx? For all users? (I understand that this is impossible...)
Steven P, posted May 17, 2023:
Nikon said: "Maybe because of such difficulties with fonts, it is worth abandoning .shx? To all users?"
I'd agree with that, but there might be times when the originator doesn't want a conversion from PDF to be easy. If you have a lot to do, can you go back to the originator and ask for a DWG?
Nikon, posted May 17, 2023:
There is no way to request a DWG, and yes, often the creator does not want his drawings to be used by others...
SLW210, posted May 18, 2023:
When I say simple, I mean the font used. If everything is horizontal and pretty much all simplex, that would be an easy conversion to run as a script over a batch of drawings.
One other thing, you might try this: VectPDF download | SourceForge.net. I used it before AutoCAD had the PDF import function. I'm not sure if it has had any updates in a while.
As for SHX fonts, you can make them comments when plotted to PDF. Acrobat can plot them as searchable with PDFMaker. (I am not sure how that comes back into AutoCAD, though.) TrueType fonts can also be made non-searchable, so they are not foolproof either. (I am not sure how that comes back into AutoCAD, either.)
Quote from "How to create selectable and searchable text in a PDF from AutoCAD" (autodesk.com):
"For TrueType fonts, do not alter the text from the original font, such as changing width (must be 1.0) or other style options. Make sure that the Z coordinate value of the text object is zero. If SHX fonts are used, set the PDFSHX variable to 1 (for AutoCAD 2017 and later; EPDFSHX for AutoCAD 2016). There is no AutoCAD option or feature to make SHX searchable in a PDF in AutoCAD 2015 and earlier."
Unfortunately, OCR has been used much more for turning image text into editable text, so I would surmise there has been much more development in that area. I had some very good OCR software that came with a scanner way back in the 80s.
I have found very little on batch converting, whether in AutoCAD, Adobe or other tools. Acrobat may be able to batch convert; I am not sure about that.
I would just suggest picking a method and getting to work on them manually.
Nikon, posted May 18, 2023:
All the text is horizontal, in the simplex.shx font. SLW210, thanks for the detailed explanations!
SLW210, posted May 18, 2023:
I have done more conversions than I want to think about. The sad thing is that most of the PDFs I converted should have been supplied as .dwg, but the powers that be either didn't ensure compliance or never stipulated it. Fortunately, these days we can usually make a call or send an email and get a .dwg (if the company is still in business).
Try redoing a batch of raster PDFs and raster images.
In that case, no matter how much is in the drawing, I would wager you will be fine using a script and batch file on them.