Friday, December 12, 2025

.NET regular expressions that break Perl Compatible Regular Expressions (PCRE) standard


Examples of patterns that work in PCRE but break in .NET

The Perl Compatible Regular Expressions (PCRE) library is a set of functions that implement regular expression pattern matching using the same syntax and semantics as Perl 5. PCRE has its own native API, as well as a set of wrapper functions that correspond to the POSIX regular expression API.
The current version of PCRE2 is 10.47, released Oct 21, 2025.


Good news: a clever programmer created a .NET wrapper for the PCRE2 library, supporting 10.47: https://github.com/ltrzesniewski/pcre-net/. The first version, v0.1.0, requires .NET Framework 4.5.


Pattern type                   PCRE   .NET   Why it breaks
Possessive quantifiers         ✅     ❌     Not implemented
Variable‑length lookbehind     ❌     ✅     Reverse case: PCRE limits lookbehind length, .NET does not
Subroutine calls / recursion   ✅     ❌     Feature not supported
Python‑style named groups      ✅     ❌     .NET uses different syntax
Ungreedy mode ((?U))           ✅     ❌     No equivalent in .NET
Recursive named groups         ✅     ❌     Not implemented
Modern Unicode properties      ✅     ❌     .NET supports only general categories and named blocks
\R newline escape              ✅     ❌     Not a documented .NET escape
Lookbehind with alternation    ✅     ✅     Works in both; each alternative has a known length


1. Possessive quantifiers (++, *+, ?+, {m,n}+)

PCRE supports:

^\w++$

Meaning: match a word string with no backtracking allowed.

✅ PCRE: Works ❌ .NET: Throws an ArgumentException (“Nested quantifier”).

Why it breaks: .NET simply does not implement possessive quantifiers. An atomic group gives the same no‑backtracking behavior and works in both engines: ^(?>\w+)$


2. Variable‑length lookbehind

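PCRE rejects:

(?<=\w+)\d+

Meaning: match digits preceded by one or more word characters.

❌ PCRE: Compile error ✅ .NET: Works

Why it differs: this case runs the other way round. .NET matches lookbehind right to left and accepts any pattern inside it. PCRE needs to know the length of each lookbehind alternative; recent PCRE2 releases relax this to a bounded maximum, which an unbounded \w+ still exceeds.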


3. Subroutine calls / recursion

PCRE supports recursive patterns:

^(\w+|(\((?1)\)))+$

Used for matching nested parentheses.

✅ PCRE: Works ❌ .NET: “Unrecognized grouping construct”

Why it breaks: .NET does not support (?1), (?&name), or recursive patterns at all. Its own mechanism for nested constructs is balancing groups, written (?<-name>...).


4. Python‑style named groups ((?P<name>...))

PCRE supports:

(?P<word>\w+)-(?P=word)

✅ PCRE: Works ❌ .NET: “Unrecognized grouping construct”

Why it breaks: .NET only supports (?<name>...) and (?'name'...).
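
The difference here is purely syntactic, so a pattern can often be converted mechanically. Below is a minimal sketch of such a conversion using std::regex. The function name ToDotNetNamedGroups is hypothetical, and the rewrite is textual only: it does not parse the pattern, so an escaped "\(?P<" or the same text inside a character class would be rewritten too.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: rewrite Python-style named-group syntax into the
// form .NET accepts. This is a plain text substitution, not a regex parser.
std::string ToDotNetNamedGroups(const std::string& pattern)
{
    // (?P<name>...  becomes  (?<name>...
    static const std::regex openGroup(R"(\(\?P<)");
    // (?P=name)     becomes  \k<name>
    static const std::regex backRef(R"(\(\?P=(\w+)\))");

    std::string out = std::regex_replace(pattern, openGroup, "(?<");
    out = std::regex_replace(out, backRef, "\\k<$1>");
    return out;
}
```

Applied to the pattern above, (?P<word>\w+)-(?P=word), this returns (?<word>\w+)-\k<word>, which .NET accepts.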


5. Ungreedy mode modifier ((?U))

PCRE supports:

(?U)^.+$

Meaning: make all quantifiers lazy by default.

✅ PCRE: Works ❌ .NET: “Unrecognized inline modifier”

Why it breaks: .NET has no equivalent to PCRE’s global ungreedy mode.


6. Recursive named groups ((?&name))

PCRE:

(?<paren>\((?:[^()]++|(?&paren))*\))

✅ PCRE: Works ❌ .NET: Fails immediately

Why it breaks: .NET has no syntax for recursive named groups.
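
When neither recursion nor balancing groups are an option, the same check can be done without a regex. The sketch below is a plain depth counter, not a regex technique: it measures the balanced group that the recursive pattern above would match at a given position. The name BalancedGroupLength is hypothetical.

```cpp
#include <cstddef>
#include <string>

// Returns the length of the balanced parenthesised group that starts at
// text[start], or 0 if text[start] is not '(' or the group never closes.
std::size_t BalancedGroupLength(const std::string& text, std::size_t start)
{
    if (start >= text.size() || text[start] != '(')
        return 0;

    std::size_t depth = 0;
    for (std::size_t i = start; i < text.size(); ++i)
    {
        if (text[i] == '(')
            ++depth;                        // one level deeper
        else if (text[i] == ')' && --depth == 0)
            return i - start + 1;           // outermost group just closed
    }
    return 0;                               // input ended while still nested
}
```

For "(a(b)c)d" starting at index 0 this returns 7, the length of "(a(b)c)".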


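7. Newline escape \R

PCRE:

^\R$

✅ PCRE: Works ❌ .NET: “Unrecognized escape sequence”

Why it breaks: \R is not among the character escapes documented for .NET regular expressions; check your target runtime before relying on it. In .NET, spell the alternatives out instead: (?>\r\n|[\n\v\f\r\x85\u2028\u2029])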


8. Lookbehind with alternation of different lengths

(?<=abc|z)\d+

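✅ PCRE: Works ✅ .NET: Works

Why it is listed: some regex engines reject this, but neither of these two does. PCRE allows the top‑level alternatives of a lookbehind to differ in length as long as each one has a known length, and .NET places no length restriction on lookbehind.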


9. Unicode property escapes beyond Unicode 4.0.1

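PCRE supports Unicode script properties:

\p{Script=Greek}

✅ PCRE: Works ❌ .NET: “Unknown property”

Why it breaks: .NET’s \p{...} accepts only Unicode general categories (such as \p{Lu}) and named blocks (such as \p{IsGreek}), not script properties. A block is a code point range, so \p{IsGreek} is close to, but not the same set as, the Greek script.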


Thursday, December 11, 2025

How to run CrystalDiskInfo on Windows Start-up - Boot Silently



CrystalDiskInfo is effective, reliable, and widely used for monitoring drive health, including SSDs.
However, memory‑chip based solid state drives (SSDs) do not fail gradually the way old spindle hard drives do. SSDs have a finite number of write cycles before the memory cells degrade, although modern SSDs use techniques like wear leveling to prolong their lifespan.

Download  https://sourceforge.net/projects/crystaldiskinfo/files/9.7.2/CrystalDiskInfo9_7_2.exe/download

⚠️ What CDI does not do

  • It cannot predict exact SSD failure dates. No SMART tool can; SSDs fail unpredictably once cells wear out.

  • It may not show proprietary vendor‑specific metrics. Some SSD makers expose extra data only through their own tools (e.g., Samsung Magician, WD Dashboard). Tom’s Hardware notes that manufacturers often provide deeper diagnostics.

  • It doesn’t test performance or speed. That’s CrystalDiskMark, a separate tool.

Most SSDs expose standardized SMART attributes that let you estimate remaining life. These include:


Key SSD SMART Attributes

  • Total Bytes Written (TBW): How much data has been written to the drive so far.

  • Percentage Used / Wear Leveling Count: A controller‑calculated estimate of how much of the drive’s write endurance has been consumed.

  • Media Wearout Indicator (MWI): Enterprise SSDs often report this as a value from 100 (new) to 0 (end of life).

  • Reallocated Sector Count: Shows if the drive has started remapping worn‑out blocks.


Below is a table of the attributes listed above and whether CrystalDiskInfo reports them.

SSD attribute                           Reported by CrystalDiskInfo?   Notes
Total Bytes Written (TBW)               Yes                            Usually shown as “Total Host Writes” or “NAND Writes” depending on drive.
Percentage Used / Wear Leveling Count   Yes                            NVMe drives show “Percentage Used”; SATA drives show “Wear Leveling Count”.
Media Wearout Indicator (MWI)           Often                          Common on enterprise SSDs; may not appear on consumer drives.
Reallocated Sector Count                Yes                            Standard SMART attribute for both HDDs and SSDs.


Running CrystalDiskInfo requires admin rights to read SMART data, so when you schedule it through Task Scheduler, Windows triggers UAC unless the task is configured to bypass it. 


CrystalDiskInfo Startup Setting

To enable CrystalDiskInfo to run on start-up choose Function -> Startup and Keep in System Tray (to keep minimized on boot). 





CrystalDiskInfo actually creates a scheduled task, which you can edit, or you can create your own as follows.


1. Open Task Scheduler

  • Press Win + R

  • Type: taskschd.msc

  • Press Enter

2. Create a new Task (not a Basic Task)

  • In the right panel, click Create Task… (Do not choose “Create Basic Task” — it hides the options you need.)

3. Configure the General tab

  • Name: CrystalDiskInfo

  • Description: optional

  • Security options:

    • Run whether user is logged on or not

    • Run with highest privileges

  • “Configure for”: pick the entry that names your Windows version (Windows 7/10/11); the entry may end with Server....

This is the part that allows the task to elevate without a UAC prompt.

4. Configure the Triggers tab

  • Click New…

  • Choose when you want it to run (daily, at logon, etc.)

  • Click OK

Each trigger you add will run the task silently.

5. Configure the Actions tab

  • Click New…

  • Action: Start a program

  • Program/script: Browse to your CrystalDiskInfo folder and select:

"C:\Program Files (x86)\CrystalDiskInfo\DiskInfo64.exe"
  • Add arguments: \Startup but \Silent works better

This suppresses the UI completely.

  • Click OK

6. Configure the Conditions tab (optional)

Adjust as needed. Most people leave these alone.

7. Configure the Settings tab

Recommended:

  • Allow task to be run on demand (you can then test it by choosing Run in the Actions panel on the right)

  • Run task as soon as possible after a scheduled start is missed

8. Save the task

  • Click OK

  • Windows will prompt for your account password

    • This is required so the task can run elevated without UAC.



Tuesday, December 9, 2025

Enumerating the average size of Office and PDF documents - C++ code and tool


Here's some 'vibe coding' that produced fast C++ code to calculate the average size of Office documents (.docx, .xlsx, .pptx) and PDFs for your organization. It scans a drive letter; here's a command-line example.

c:\>calcavgsizedocs.exe c:\

This code features: 
  • Long path support (\\?\)
  • Skip reparse points / junctions to avoid infinite loops
  • Unicode-safe console output
  • Multi-threaded scanning + progress + current folder
  • Total docs scanned with thousands separators
  • Ctrl-C gracefully exits
  • VS2010-compatible, no C4996 warnings
5$ USD for a compiled pro version with help exe, email at 

Pro Version

Usage: CalcAvgSizeDocs <drive_letter>
Example: CalcAvgSizeDocs C:

Options:
  -h, --help    Show this help message
  -v, --version Show version info
  -s, --silent  Silence current progress output


This program is provided "AS-IS", without warranty of
merchantability or fitness for a particular purpose.


C++ Code basic version

// CalcAvgSizeDocs.cpp : Defines the entry point for the console application.
//

#include "stdafx.h" // required for a Visual Studio 2010 Win32 console app with precompiled headers; remove otherwise
#include <windows.h>
#include <string>
#include <iostream>
#include <vector>
#include <io.h>
#include <fcntl.h>

// ------------------------------------------------------------
// Global Data Structures
// ------------------------------------------------------------

struct Stats
{
    volatile unsigned __int64 count;
    volatile unsigned __int64 totalBytes;

    Stats() : count(0), totalBytes(0) {}
};

Stats docxStats, xlsxStats, pptxStats, pdfStats;

// Work queue
std::vector<std::wstring> dirQueue;
CRITICAL_SECTION queueLock;

// Progress & control
volatile LONG totalDirsQueued = 0;
volatile LONG totalDirsProcessed = 0;
volatile bool scanningFinished = false;
volatile bool cancelRequested = false;

// Current folder display
std::wstring currentFolder;
CRITICAL_SECTION currentFolderLock;

// Total documents scanned
volatile unsigned __int64 totalDocsScanned = 0;

// ------------------------------------------------------------
// Utility
// ------------------------------------------------------------

bool EndsWith(const std::wstring& s, const std::wstring& ext)
{
    if (s.length() < ext.length()) return false;
    return _wcsicmp(s.c_str() + (s.length() - ext.length()), ext.c_str()) == 0;
}

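void Accumulate(Stats &stat, unsigned __int64 size)
{
    // count is a 64-bit value, so use the 64-bit interlocked increment;
    // the 32-bit InterlockedIncrement would update only its low 32 bits.
    InterlockedIncrement64((LONGLONG*)&stat.count);
    InterlockedExchangeAdd64((LONGLONG*)&stat.totalBytes, size);
    InterlockedExchangeAdd64((LONGLONG*)&totalDocsScanned, 1);
}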

void ProcessFile(const std::wstring& path)
{
    WIN32_FILE_ATTRIBUTE_DATA fad;
    if (!GetFileAttributesExW(path.c_str(), GetFileExInfoStandard, &fad))
        return;

    unsigned __int64 size =
        (((unsigned __int64)fad.nFileSizeHigh) << 32) | fad.nFileSizeLow;

    if (EndsWith(path, L".docx"))
        Accumulate(docxStats, size);
    else if (EndsWith(path, L".xlsx"))
        Accumulate(xlsxStats, size);
    else if (EndsWith(path, L".pptx"))
        Accumulate(pptxStats, size);
    else if (EndsWith(path, L".pdf"))
        Accumulate(pdfStats, size);
}

// Format time in HH:MM:SS.mmm
std::wstring FormatDuration(DWORD ms)
{
    DWORD seconds = ms / 1000;
    DWORD minutes = seconds / 60;
    DWORD hours   = minutes / 60;

    seconds %= 60;
    minutes %= 60;

    wchar_t buf[64];
    swprintf(buf, 64, L"%02u:%02u:%02u.%03u",
             hours, minutes, seconds, ms % 1000);

    return buf;
}

// Format numbers with thousands separators
std::wstring FormatWithThousandsSeparator(unsigned __int64 value)
{
    wchar_t buf[64];
    _ui64tow_s(value, buf, 64, 10);
    std::wstring s(buf);

    int insertPosition = (int)s.length() - 3;
    while (insertPosition > 0)
    {
        s.insert(insertPosition, L",");
        insertPosition -= 3;
    }
    return s;
}

// ------------------------------------------------------------
// CTRL-C Handler
// ------------------------------------------------------------

BOOL WINAPI CtrlHandler(DWORD ctrlType)
{
    if (ctrlType == CTRL_C_EVENT ||
        ctrlType == CTRL_BREAK_EVENT ||
        ctrlType == CTRL_CLOSE_EVENT)
    {
        cancelRequested = true;
        scanningFinished = true;
        std::wcout << L"\n\nCTRL-C detected... stopping scan...\n";
        return TRUE;
    }
    return FALSE;
}

// ------------------------------------------------------------
// Worker Thread
// ------------------------------------------------------------

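// Number of workers currently scanning a folder (guarded by queueLock).
// A worker may exit only when the queue is empty AND no other worker is
// still scanning; otherwise it would quit while folders are about to be
// queued, leaving one thread to do all the work.
LONG activeWorkers = 0;

DWORD WINAPI WorkerProc(LPVOID)
{
    for (;;)
    {
        if (cancelRequested)
            break;

        std::wstring folder;
        bool haveWork = false;
        bool allDone = false;

        EnterCriticalSection(&queueLock);
        if (!dirQueue.empty())
        {
            folder = dirQueue.back();
            dirQueue.pop_back();
            ++activeWorkers;
            haveWork = true;
        }
        else if (activeWorkers == 0)
        {
            allDone = true;     // nothing queued and nobody producing
        }
        LeaveCriticalSection(&queueLock);

        if (allDone)
            break;
        if (!haveWork)
        {
            Sleep(1);           // another worker may still queue folders
            continue;
        }

        InterlockedIncrement(&totalDirsProcessed);

        EnterCriticalSection(&currentFolderLock);
        currentFolder = folder;
        LeaveCriticalSection(&currentFolderLock);

        // Paths with the \\?\ prefix are not normalized, so a doubled
        // backslash is not collapsed. Add a separator only if the folder
        // does not already end with one (the root "C:\" does).
        std::wstring base = folder;
        if (base[base.length() - 1] != L'\\')
            base += L"\\";

        // Prepend \\?\ for long path support
        std::wstring longBase = L"\\\\?\\" + base;
        std::wstring searchPath = longBase + L"*";

        WIN32_FIND_DATAW fd;
        HANDLE hFind = FindFirstFileW(searchPath.c_str(), &fd);
        if (hFind != INVALID_HANDLE_VALUE)
        {
            do
            {
                if (cancelRequested)
                    break;

                const wchar_t* name = fd.cFileName;
                if (wcscmp(name, L".") == 0 || wcscmp(name, L"..") == 0)
                    continue;

                // Skip junctions / reparse points to avoid loops
                if ((fd.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY) &&
                    !(fd.dwFileAttributes & FILE_ATTRIBUTE_REPARSE_POINT))
                {
                    EnterCriticalSection(&queueLock);
                    dirQueue.push_back(base + name);
                    LeaveCriticalSection(&queueLock);
                    InterlockedIncrement(&totalDirsQueued);
                }
                else if (!(fd.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY))
                {
                    // Use the prefixed path so long file paths work too
                    ProcessFile(longBase + name);
                }

            } while (FindNextFileW(hFind, &fd));

            FindClose(hFind);
        }

        // Finished with this folder; its subfolders are already queued.
        EnterCriticalSection(&queueLock);
        --activeWorkers;
        LeaveCriticalSection(&queueLock);
    }

    return 0;
}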

// ------------------------------------------------------------
// Progress Thread
// ------------------------------------------------------------

DWORD WINAPI ProgressProc(LPVOID)
{
    while (!scanningFinished && !cancelRequested)
    {
        LONG done = totalDirsProcessed;
        LONG total = totalDirsQueued;
        double pct = (total > 0) ? (done * 100.0 / total) : 0.0;

        EnterCriticalSection(&currentFolderLock);
        std::wcout << L"\rProcessed: " << done
                   << L"/" << total
                   << L" (" << (int)pct << L"%) "
                   << L"Current: " << currentFolder
                   << std::flush;
        LeaveCriticalSection(&currentFolderLock);

        Sleep(500);
    }
    return 0;
}

// ------------------------------------------------------------
// Print Stats
// ------------------------------------------------------------

void PrintStats(const wchar_t* label, const Stats& s)
{
    if (s.count == 0)
    {
        std::wcout << label << L": No files found\n";
        return;
    }

    double avg = (double)s.totalBytes / (double)s.count;
    double kb = avg / 1024.0;

    std::wcout << label << L": " << kb
               << L" KB average (" << s.count << L" files)\n";
}

// ------------------------------------------------------------
// MAIN
// ------------------------------------------------------------

int wmain(int argc, wchar_t* argv[])
{
    // Enable UTF-8 output
    SetConsoleOutputCP(CP_UTF8);
    _setmode(_fileno(stdout), _O_U8TEXT);

    SetConsoleCtrlHandler(CtrlHandler, TRUE);

    DWORD startTime = GetTickCount();

    // Determine drive
    std::wstring root;
    if (argc >= 2)
    {
        std::wstring arg = argv[1];

        if (arg.length() == 1)
            arg += L":\\";
        else if (arg.length() == 2 && arg[1] == L':')
            arg += L"\\";
        else if (arg[arg.length() - 1] != L'\\')
            arg += L"\\";

        root = arg;
    }
    else
    {
        root = L"C:\\";
    }

    std::wcout << L"Scanning drive: " << root << L"\n";

    InitializeCriticalSection(&queueLock);
    InitializeCriticalSection(&currentFolderLock);

    dirQueue.push_back(root);
    InterlockedIncrement(&totalDirsQueued);

    HANDLE hProgress = CreateThread(NULL, 0, ProgressProc, NULL, 0, NULL);

    SYSTEM_INFO si;
    GetSystemInfo(&si);
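    // WaitForMultipleObjects accepts at most MAXIMUM_WAIT_OBJECTS handles
    int numThreads = (int)si.dwNumberOfProcessors;
    if (numThreads > MAXIMUM_WAIT_OBJECTS)
        numThreads = MAXIMUM_WAIT_OBJECTS;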

    std::vector<HANDLE> threads;
    for (int i = 0; i < numThreads; ++i)
    {
        HANDLE h = CreateThread(NULL, 0, WorkerProc, NULL, 0, NULL);
        threads.push_back(h);
    }

    WaitForMultipleObjects((DWORD)threads.size(), &threads[0], TRUE, INFINITE);
    scanningFinished = true;
    WaitForSingleObject(hProgress, INFINITE);

    DeleteCriticalSection(&queueLock);
    DeleteCriticalSection(&currentFolderLock);

    DWORD endTime = GetTickCount();
    DWORD elapsed = endTime - startTime;

    std::wcout << L"\n\nResults (partial if cancelled):\n";
    PrintStats(L"DOCX", docxStats);
    PrintStats(L"XLSX", xlsxStats);
    PrintStats(L"PPTX", pptxStats);
    PrintStats(L"PDF ", pdfStats);

    std::wcout << L"\nTotal documents scanned: "
               << FormatWithThousandsSeparator(totalDocsScanned)
               << L"\n";

    std::wcout << L"\nTotal execution time: "
               << FormatDuration(elapsed) << L"\n";

    if (cancelRequested)
        std::wcout << L"\nScan cancelled by user.\n";

    return 0;
}
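
The basic listing above prints every average in KB, while the sample outputs further down mix KB and MB. A minimal sketch of a unit-picking helper could look like the following. FormatSize is a hypothetical name and is not part of the posted code; it uses only standard C++ so it can be tried outside Visual Studio, and it assumes binary units (1 KB = 1024 bytes), as the listing does.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: render a byte count with two decimals in the largest
// binary unit (1 KB = 1024 bytes) that keeps the value at or above 1.
std::string FormatSize(double bytes)
{
    static const char* const units[] = { "bytes", "KB", "MB", "GB", "TB" };
    int unit = 0;
    while (bytes >= 1024.0 && unit < 4)
    {
        bytes /= 1024.0;    // step up one unit
        ++unit;
    }

    std::ostringstream out;
    out << std::fixed << std::setprecision(2) << bytes << " " << units[unit];
    return out.str();
}
```

For example, FormatSize(1536.0) yields "1.50 KB". In PrintStats the average in bytes would be passed in directly instead of being divided by 1024 first.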

D:\ Drive Results

C:\Users\Markus\Documents\Visual Studio 2010\Projects\CalcAvgSizeDocs\Release>calcavgsizedocs -s d
Scanning drive: d:\

Drive size: 1.09 TB

.DOCX: 125.60 KB average (230 files)
.XLSX: 321.00 KB average (47 files)
.PPTX: 11.63 MB average (11 files)
.PDF : 2.82 MB average (1625 files)

Total documents scanned: 1,913

Total execution time: 00:01:14.537

C:\ Drive Results (pro code output)

C:\Users\Markus\Documents\Visual Studio 2010\Projects\CalcAvgSizeDocs\Release>calcavgsizedocs -s c
Scanning drive: c:\

Drive size: 931.41 GB

Total Folders Processed: 221515


Complete Results:

.DOCX: 364.55 KB average (1782 files)
.XLSX: 109.95 KB average (565 files)
.PPTX: 2.05 MB average (205 files)
.PDF : 1.54 MB average (8179 files)

Total documents scanned: 10,731

Total execution time: 01:00:03.951


Vibe coded new solution 100x faster
C:\Users\Markus\Documents\Visual Studio 2010\Projects\avgsizedocs\Debug>avgsizedocs d:\
Scanning: d:\

Folders: 70013 | Files: 606999
----------------------------------------
Total Folders: 70,751
Total Files  : 607,842
Elapsed Time : 0h 1m 40s
----------------------------------------

.DOCX: Count=137, Total=47.00 MB, Avg=351.33 KB
.XLSX: Count=92, Total=53.95 MB, Avg=600.50 KB
.PPTX: Count=24, Total=23.31 MB, Avg=994.40 KB
.PDF : Count=1476, Total=2.85 GB, Avg=1.97 MB




Please post your results in comments.