Monday, June 10, 2024

decomet - remove comments and multiple blank lines from code with a fast command-line C executable


decomet is a very fast Windows command-line tool that minifies source code. By default it removes all comments that start with // or are enclosed in /* */.

Features:
  1. Remove all blank/empty lines. A blank line is one that contains only whitespace*.
  2. Remove all Unicode control characters, except whitespace controls such as tab, form feed and carriage return (U+0009..U+000D).
  3. Reduce runs of duplicate empty lines to a single line, for readability of the code.
  4. Remove indent whitespace*.
  5. Minify and normalize whitespace* to a single space.
  6. Prefix each line with its line number: specifically the line number, a tab, and then the line.
  7. Recurse into sub-directories.
  8. Funnel output to a single output directory.
  9. Add a datetime stamp to the output filename (update as of Mon 10-Jun-24 2:57pm EDT).

*ISO 30112 defines POSIX whitespace characters for function iswspace() for locale 'en_US.UTF8' as Unicode characters U+0009..U+000D, U+0020, U+1680, U+180E, U+2000..U+2006, U+2008..U+200A, U+2028, U+2029, U+205F, and U+3000
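
To make the character classes above concrete, here is a minimal Python sketch of blank-line detection, empty-line collapsing, whitespace normalization and control-character removal. It is not decomet's source (which is C/C++); the function names are made up for illustration, the whitespace set is copied from the footnote above, and the control-character ranges are the ones listed for the -c switch in the help text. It also assumes that normalization applies to the whole line, including any text inside string literals, which the real tool may treat differently.

```python
# Whitespace code points copied from the ISO 30112 / iswspace() footnote.
WS_SET = frozenset(
    [chr(c) for c in range(0x0009, 0x000D + 1)]
    + [" ", "\u1680", "\u180e"]
    + [chr(c) for c in range(0x2000, 0x2006 + 1)]
    + [chr(c) for c in range(0x2008, 0x200A + 1)]
    + ["\u2028", "\u2029", "\u205f", "\u3000"]
)


def is_blank(line):
    """A line is blank if it is empty or holds only whitespace characters."""
    return all(ch in WS_SET for ch in line)


def collapse_blank_runs(lines):
    """Keep the first blank line of every run of blank lines (like -e)."""
    out = []
    prev_blank = False
    for line in lines:
        blank = is_blank(line)
        if blank and prev_blank:
            continue  # drop the 2nd, 3rd, ... blank line of a run
        out.append(line)
        prev_blank = blank
    return out


def normalize_whitespace(line):
    """Replace every run of whitespace with a single space (like -m)."""
    out = []
    in_run = False
    for ch in line:
        if ch in WS_SET:
            if not in_run:
                out.append(" ")
            in_run = True
        else:
            out.append(ch)
            in_run = False
    return "".join(out)


def is_removed_control(ch):
    """Ranges from the -c help text; U+0009..U+000D are kept."""
    cp = ord(ch)
    return 0x01 <= cp <= 0x08 or 0x0E <= cp <= 0x1F or 0x7F <= cp <= 0x9F


def strip_controls(text):
    """Drop the control characters listed for the -c switch."""
    return "".join(ch for ch in text if not is_removed_control(ch))
```

Note that U+00A0 (no-break space) and U+2007 (figure space) are not in the listed set, so under this definition a line holding only those characters does not count as blank.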
  
It is super fast and written in a mixture of C and C++.
It reads and writes UTF-8 source code.
File names may contain Unicode characters.
It reports elapsed time in a human-readable format.

This project is based on the code at http://code.google.com/p/cpp-decomment/, which has been greatly improved to handle Unicode spaces, control characters, UTF-8 files and UTF-8 filenames. The state machine has also been optimized and fixed.
The code has also been improved to make sure all the switches actually work.
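
For readers who have not seen one, a comment-stripping state machine can be sketched in a few lines of Python. This is only an illustration of the general technique, not decomet's actual state machine: the state names and the function name are invented here, and replacing each /* */ comment with a single space (so that neighbouring tokens do not fuse) is an assumption, not confirmed behaviour of the tool.

```python
def strip_comments(src):
    """Remove // and /* */ comments, leaving string and char literals intact."""
    out = []
    state = "code"   # one of: code, literal, line, block
    quote = ""       # the quote character that opened the current literal
    i = 0
    n = len(src)
    while i < n:
        ch = src[i]
        nxt = src[i + 1] if i + 1 < n else ""
        if state == "code":
            if ch == "/" and nxt == "/":
                state = "line"
                i += 2
                continue
            if ch == "/" and nxt == "*":
                state = "block"
                out.append(" ")  # assumption: a block comment becomes one space
                i += 2
                continue
            if ch in "\"'":
                state = "literal"
                quote = ch
            out.append(ch)
        elif state == "literal":
            out.append(ch)
            if ch == "\\" and nxt:
                out.append(nxt)  # keep the escaped character verbatim
                i += 2
                continue
            if ch == quote:
                state = "code"
        elif state == "line":
            if ch == "\n":       # the newline ends the comment and is kept
                out.append(ch)
                state = "code"
        elif state == "block":
            if ch == "*" and nxt == "/":
                state = "code"
                i += 2
                continue
        i += 1
    return "".join(out)
```

Tracking the opening quote character is what lets a lone double quote inside a character literal, such as '"', pass through without being mistaken for the start of a string.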

Download decomet.zip. It is for personal use only and opens this page after each run.
Contact metadataconsult@gmail.com for a license; $10.00 USD removes the page opening.
As with all my software - 100% no malware or spyware. I am trying to sell this and that would be a bad idea.



decomet -h 2> help.txt - to redirect the help message to a file 'help.txt'

Usage:  decomet -[bcehimnprsv] [-d<DIR>] file1.c file2.js ...
Outputs (adds extension .dec.{org ext})  file1.c.dec.c file2.js.dec.js ...
Decomment source files, optionally remove whitespace, control characters and duplicate 
empty lines 

  -b         remove all whitespace* blank/empty lines
  -c         preprocess & remove control characters in ASCII and UNicode range
             U+0001..U+0008, U+000E..U+001F and U+007F..U+009F, respectively.
             NOTE: U+001A 'SUB' Substitute character will terminate reading a text file
             unexpectedly.
  -e         Removes duplicate Unicode whitespace* entire lines aka 'empty lines', leaving
             1 line.
             *ISO 30112 defines POSIX whitespace characters for function iswspace() for
             locale 'en_US.UTF8' as Unicode characters
             U+0009..U+000D, U+0020, U+1680, U+180E, U+2000..U+2006, U+2008..U+200A,
             U+2028, U+2029, U+205F, and U+3000
  -h         display help message
  -i         remove indent whitespace*
  -m         minify && normalize whitespace* to a single space
  -n         prefix with line number
  -p         preview files matching wildcard for recursive search
  -r         recursive search sub-dirs under the input-file's folder - file wildcard needed
  -s         output to stdout, instead of output-files (infile1.c.dec.c)
  -t         add datetime to output. Example: infile1.c.dec_10Jun24_1347PM.c
  -v         switch off verbose - default on

  -d<DIR>    output funnel directory, no space after -d

  file[*?].c input-files, file wildcard [?*] allowed. The output-file is 'filexxx.c.dec.c'

Features:

        Fast, written in mainly C, C++ for Unicode support
        Read and writes UTF-8 text files
        Implements a state machine for parsing to remove comments, enforce min. spaces, etc.
        Implements a stack for file/folder traversal

Limitations:

        Each line length is a max of 100,000 characters wide
        Does not handle long file paths (>260)

Notes:

        org src code - http://code.google.com/p/cpp-decomment/
        improved to handle Unicode, UTF-8 files && remove duplicate lines, Unicode
        whitespace
        fixed stack imp (org. failed if single double quote found with -m switch)
        improved to assure all switches work correctly, etc.

decomet version 2.0.2.4
Copyright © 2024 metadataconsulting.ca, Mark Pahulje 

THE SOFTWARE IS PROVIDED "AS IS", EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE 
WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO 
EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN 
CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
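
The output naming described in the help text (file1.c becomes file1.c.dec.c, -t adds a datetime, -d funnels everything into one folder) can be sketched as follows. This is a guess at the rule from the two examples shown, not the tool's code: the function name and parameters are hypothetical, the datetime layout is inferred from the single example infile1.c.dec_10Jun24_1347PM.c, and the behaviour for files without an extension is not documented.

```python
import ntpath  # Windows path rules, regardless of the host OS
from datetime import datetime

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def output_name(path, out_dir=None, stamp=None):
    """Build the output path for one input file (hypothetical helper)."""
    folder, name = ntpath.split(path)
    _, ext = ntpath.splitext(name)            # "file1.c" -> ".c"
    suffix = ".dec"
    if stamp is not None:                     # like -t
        suffix += "_{:02d}{}{:02d}_{:02d}{:02d}{}".format(
            stamp.day, MONTHS[stamp.month - 1], stamp.year % 100,
            stamp.hour, stamp.minute, "PM" if stamp.hour >= 12 else "AM")
    new_name = name + suffix + ext            # original name, then .dec, then ext
    target = out_dir if out_dir is not None else folder   # like -d<DIR>
    return ntpath.join(target, new_name)
```

For example, output_name("src\\file2.js", out_dir="out") gives "out\\file2.js.dec.js" under these assumptions.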

Speed test on a 100-line C++ file.

1. input2.cpp
Input   100 lines.
Output  100 lines.
Removed 0 lines.
Elapsed 3ms.

Speed test on a 1 GB text file.

I:\WORK-CODE\Visual Studio Projects\decomment\Debug>decomet -e 1gb.txt
Input   42949674 lines.
Output  42949670 lines.
Removed 4 lines.
Elapsed 6min 29s 170ms.
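
To put the 1 GB run in perspective, the throughput can be worked out from the numbers above. The short Python sketch below does the arithmetic; it assumes the file is exactly 1 GiB (2**30 bytes), which the post does not state, and the elapsed-time formatter is only a guess at how the tool renders its timings. The prompt path ends in \Debug, so these figures may come from an unoptimized build.

```python
def format_elapsed(ms):
    """Render milliseconds like the tool's output, e.g. '6min 29s 170ms'."""
    minutes, rem = divmod(ms, 60_000)
    seconds, millis = divmod(rem, 1000)
    parts = []
    if minutes:
        parts.append(f"{minutes}min")
    if seconds:
        parts.append(f"{seconds}s")
    if millis or not parts:
        parts.append(f"{millis}ms")
    return " ".join(parts)


elapsed_ms = (6 * 60 + 29) * 1000 + 170      # 6min 29s 170ms
elapsed_s = elapsed_ms / 1000                # 389.17 seconds
lines_in = 42_949_674                        # "Input" line count from the run

lines_per_second = lines_in / elapsed_s      # roughly 110 thousand lines/s
mib_per_second = 2**30 / elapsed_s / 2**20   # assumes a file of exactly 1 GiB

print(format_elapsed(elapsed_ms))
print(f"{lines_per_second:,.0f} lines/s, {mib_per_second:.2f} MiB/s")
```

Under these assumptions the run processes about 110,000 lines per second, or roughly 2.6 MiB per second.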
