Showing posts with label .NET. Show all posts

Tuesday, January 31, 2023

Fix Paths Too Long Automatically


Path Too Long Auto Fixer v5.1 is the first tool that discovers, reports, and automatically corrects, in bulk, filenames and paths that are too long to fit under the Windows MAX_PATH limit of 260 characters.

Be sure all your backup files are copied. Other tools make you go through hundreds of files one by one, renaming them and copying them somewhere else. Who has time for that now-avoidable activity? This tool saves IT administrators a huge amount of time.




The most common way to get the path-too-long error is to have too many sub-directories. For example, the path below has 20 sub-directories! This is known as a deep directory and is very common in many workplace shared drives, downloads, and files shared from other operating systems (Mac/Linux). The other common cause is a filename that is simply too long.
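As a rough illustration of the limit itself (not how Path Too Long Auto Fixer works internally), the check can be sketched in a few lines of C#. The class and method names here are hypothetical, and the sketch assumes the classic MAX_PATH value of 260, which includes the terminating null character, so a usable path holds at most 259 characters.

```csharp
using System;

public static class PathLengthDemo
{
    // Classic Windows MAX_PATH value; it counts the terminating null character.
    public const int MaxPath = 260;

    // Hypothetical helper: true when the full path cannot fit under MAX_PATH.
    public static bool IsTooLong(string fullPath)
    {
        return fullPath.Length >= MaxPath;
    }

    // Hypothetical helper: counts path separators as a rough measure of depth.
    public static int CountSeparators(string fullPath)
    {
        int count = 0;
        foreach (char c in fullPath)
        {
            if (c == '\\' || c == '/')
                count++;
        }
        return count;
    }
}
```

A deep directory tree pushes the total length up quickly: twenty 12-character folder names already use more than 240 characters before the filename is added.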

Tuesday, January 24, 2023

C# .NET How to remove ANSI Control Characters fast and reduce to ANSI set




Here's how to reduce a string to ANSI and remove control characters from a string fast in C#. But be careful: accented characters such as é are not replaced with their base letter (e). To do that, you need to normalize the string; see the UnicodeToANSI function.

ASCII (American Standard Code for Information Interchange) is a 7-bit character set that contains characters from 0 to 127.

The generic term ANSI (American National Standards Institute) is used for 8-bit character sets. These character sets contain the unchanged ASCII character set. In addition, they contain further characters from 128 to 255.
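The difference between dropping an accented character and folding it to its base letter can be sketched as follows. This is a minimal sketch of one common approach, not the UnicodeToANSI function from this post: it decomposes the string with NormalizationForm.FormD and skips characters whose Unicode category is NonSpacingMark. The names AccentDemo and StripAccents are hypothetical.

```csharp
using System;
using System.Globalization;
using System.Text;

public static class AccentDemo
{
    // Hypothetical helper: folds accented Latin letters to their base letters.
    public static string StripAccents(string s)
    {
        var sb = new StringBuilder(s.Length);

        // FormD splits "é" into "e" followed by a combining acute accent (U+0301).
        foreach (char c in s.Normalize(NormalizationForm.FormD))
        {
            // Skip the combining marks, keep everything else.
            if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
                sb.Append(c);
        }

        // Recompose whatever is left.
        return sb.ToString().Normalize(NormalizationForm.FormC);
    }
}
```

With this helper, "cortège élancé" becomes "cortege elance". Characters that have no decomposition, such as the typographic apostrophe ’, pass through unchanged and still need a range filter afterwards.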


Here's a list of control characters. https://unicode-table.com/en/blocks/general-punctuation/


using System; using System.Text; using System.Linq; using System.Diagnostics; 
					
public static class Program
{
	// Based on http://www.codeproject.com/Articles/13503/Stripping-Accents-from-Latin-Characters-A-Foray-in
	// Proper Normalization
	public static string UnicodeToANSI(this string inString)
	{
		var newStringBuilder = new StringBuilder();
		newStringBuilder.Append(inString.Normalize(NormalizationForm.FormKD)
								.Where(x => (x >= 32 && x <= 255)) // keep the printable 8-bit range; drops control characters below 32 and combining marks
								.ToArray());
		return newStringBuilder.ToString();
	}
	
	// Despite its name, this keeps the 8-bit ANSI range (32 to 255).
	// ANSI characters 32 to 127 correspond to those in the 7-bit ASCII character set.
	public static string ReducetoASCII(this string s)
    {
        StringBuilder sb = new StringBuilder(s.Length);
        foreach (char c in s)
        {
            if ((int)c > 255) // remove chars > 255 (outside the 8-bit range)
                continue;
            if ((int)c < 32)  // remove  control characters 
                continue;
            sb.Append(c);
        }
        return sb.ToString();
    }
	
	public static void Main()
	{
		Stopwatch sw = new Stopwatch(); 
		string french = "A Paris, le cortège parisien s’était élancé à 14 heures.\r\n\tFace à l’affluence, un «itinéraire bis» a été mis en place. D’importants rassemblements ont lieu à Bordeaux, Marseille, Rennes ou Lyon. Suivez la journée avec nos journalistes dans toute la France.";
		string ftemp = string.Empty; 
		string german = "ޘ Trump\t\r\nverwechselt Klägerin Carroll auf Foto mit Ex-Frau – das könnte Folgen haben"; 
		string gtemp = string.Empty; 
		Console.WriteLine(french); 
		
		sw.Start();
		ftemp = french.ReducetoASCII(); 
		sw.Stop(); 
		
		Console.WriteLine("Ansi reduced\r\n" + ftemp + " in " + sw.ElapsedTicks); 
		
		sw.Reset(); 
		sw.Start();
		ftemp = french.UnicodeToANSI(); 
		sw.Stop(); 
		
		Console.WriteLine("Proper Normalization\r\n" + ftemp + " in " + sw.ElapsedTicks); 
				
		Console.WriteLine();
		Console.WriteLine();
		Console.WriteLine(german); 
		
		sw.Reset();
		sw.Start();
		gtemp = german.ReducetoASCII(); 
		sw.Stop(); 
		
		Console.WriteLine("Ansi reduced\r\n" + gtemp + " in " + sw.ElapsedTicks); 
		
		sw.Reset(); 
		sw.Start();
		gtemp = german.UnicodeToANSI(); 
		sw.Stop(); 
		
		Console.WriteLine("Proper Normalization\r\n" + gtemp + " in " + sw.ElapsedTicks); 
		
	}
}

Monday, November 16, 2020

C# .NET How to remove blank lines from a string faster, dealing with null '\0' character confusion

Typically, in C# code you would use

str.Split(TrimNewLineChars, StringSplitOptions.RemoveEmptyEntries);

to remove empty lines, and it is effective. But a faster way is to use StringReader and process each line.

But there are gotchas when processing the null character ('\0'), which you might deduce from the above to be a white-space character. In fact, in C# '\u0000' is a null character with no special meaning; it is just another character in a string. It is considered a control character, not white-space. Nor is it treated as null or as a string terminator, as it is in C. To view control characters in VS Code, see my blog post, and in Notepad++ see post.

In fact, you can look at the .NET internal storage of a string using the following; see my next post on this.

using (var writer = new StringWriter())
{
    using (var provider = CodeDomProvider.CreateProvider("CSharp"))
    {
        provider.GenerateCodeFromExpression(new CodePrimitiveExpression(input), writer, null);

        return writer.ToString();
    }
}

TL;DR / Lesson Learned:

Counter-intuitively, a line containing only the null character ('\0') is not removed by StringSplitOptions.RemoveEmptyEntries or String.IsNullOrWhiteSpace, because '\0' is a control character, not white-space.
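How .NET classifies '\0' can be checked directly. The following is a minimal sketch with hypothetical helper names (NullCharDemo, IsBlankLine, CountSplitEntries); it uses only char.IsWhiteSpace, char.IsControl and String.Split, and treats a line as blank when it contains nothing but white-space or control characters.

```csharp
using System;

public static class NullCharDemo
{
    // Hypothetical helper: true when the line holds nothing but
    // white-space and/or control characters (including '\0').
    public static bool IsBlankLine(string line)
    {
        foreach (char c in line)
        {
            if (!char.IsWhiteSpace(c) && !char.IsControl(c))
                return false;
        }
        return true;
    }

    // Hypothetical helper: how many entries survive a Split with RemoveEmptyEntries.
    public static int CountSplitEntries(string text)
    {
        return text.Split(new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries).Length;
    }
}
```

For the input "a\n\0\n\nb", CountSplitEntries returns 3: the zero-length entry is dropped, but the entry holding only '\0' has length 1 and survives. IsBlankLine("\0") returns true, while string.IsNullOrWhiteSpace("\0") returns false.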




Source Code

using System;
using System.Text; 
using System.IO;
using System.Diagnostics;
using System.Text.RegularExpressions;
					
    public static class Program
    {

        public static string RemoveAllWhitespace(this string str)
        {
            var len = str.Length;
            var src = str.ToCharArray();
            var dstIdx = 0;
            for (var i = 0; i < len; i++)
            {
                char ch = src[i];
                if (!char.IsWhiteSpace(ch) && ch != '\0')
                    src[dstIdx++] = ch;
            }
            return new string(src, 0, dstIdx);
        }


        // Trims leading white-space and control characters (including '\0').
        // Returns an empty string when the line contains nothing else.
        public static string TrimStartUnicode(this string str)
        {
            for (var i = 0; i < str.Length; i++)
            {
                char ch = str[i];
                if (!char.IsWhiteSpace(ch) && !char.IsControl(ch))
                    return str.Substring(i);
            }
            return string.Empty;
        }

        private static readonly char[] TrimNewLineChars = Environment.NewLine.ToCharArray();
        public static string RemoveEmptyLines(this string str)
        {
            if (str == null)
            {
                return null;
            }
            var lines = str.Split(TrimNewLineChars, StringSplitOptions.RemoveEmptyEntries);

            var sb = new StringBuilder(str.Length);
            foreach (var line in lines)
            {
                if (!String.IsNullOrWhiteSpace(line))
                    sb.AppendLine(line);
            }

            return sb.ToString();
        }

        //Tue 12-May-20 2:08am  - 
        public static String RemoveAllBlankLinesIssue(this string value)
        {

            StringBuilder output = new StringBuilder(value.Length);
            using (StringReader sr = new StringReader(value))
            {
                string line;
                while ((line = sr.ReadLine()) != null)
                {
					//\0 has no special meaning in c# it's just a null character contained in a string.
                    if (line == '\0'.ToString()) //is line length of 1
                    {

                        Console.WriteLine("is char '\\0' empty or null = " + string.IsNullOrEmpty('\0'.ToString()));
                        Console.WriteLine("is char '\\0' whitespace or null = " + String.IsNullOrWhiteSpace('\0'.ToString()));
						Console.WriteLine("is char '\\0' char.IsWhiteSpace = " + char.IsWhiteSpace('\0')); 
						Console.WriteLine("is char '\\0' char.IsControl = " + char.IsControl('\0')); 
						
                    }
                    if (line.Contains('\u0080'.ToString()))
                    {
                        Console.WriteLine("is char '\\0080' empty or null = " + string.IsNullOrEmpty("\u0080").ToString());
                        Console.WriteLine("is char '\\0080' whitespace or null = " + String.IsNullOrWhiteSpace("\u0080").ToString());

                    }

                    if (!String.IsNullOrWhiteSpace(line) && !string.IsNullOrEmpty(line))
                        output.AppendLine(line);
                }

            }
            return output.ToString();
        }

        //Tue 12-May-20 2:08am  - 
        public static String RemoveAllBlankLinesFinal(this string value)
        {
                         
            StringBuilder output = new StringBuilder(value.Length);
            using (StringReader sr = new StringReader(value))
            {
                string line;
                string temp; 
                while ((line = sr.ReadLine()) != null)
                {
                    temp = line.TrimStartUnicode();  
                    
                    if (!string.IsNullOrWhiteSpace(temp) && !string.IsNullOrEmpty(temp))
                        output.AppendLine(line);
                }

            }
            return output.ToString();
        }



        public static String RemoveAllBlankLinesRegex(this string s)
        {
            return Regex.Replace(s, @"^\s+$[\r\n]*", string.Empty, RegexOptions.Multiline);
            //return Regex.Replace(s, @"^(?:[\t ]*(?:\r?\n|\r))+", string.Empty, RegexOptions.Multiline); 
            //return Regex.Replace(s, @"(?<=(?:\r?\n){2}|\A)(?:\r?\n)+", string.Empty, RegexOptions.Multiline); 
            //return Regex.Replace(s, @"(?<=(?:\r?\n){2}$\w)(?:\r?\n)+", string.Empty, RegexOptions.Multiline); 
            //return Regex.Replace(s, @"^\s*(\r\n|\V)", string.Empty, RegexOptions.Multiline); //does not work

        }

        public static void Main()
        {
            string output = string.Empty;

            Stopwatch sw = new Stopwatch();

            string emptytest = "Tell me and I forget.\n \n     \nTeach me and I remember.     \r\n \r\n\r\nInvolve me and I learn.  \r     \r\r\0\r\r   Pad Unicode \\u0080 next line\n\n\u0080\r\rby Benjamin Franklin.\r\n";

            sw.Start();
            output = emptytest.RemoveEmptyLines();
            sw.Stop();
            Console.WriteLine(output + " in " + sw.ElapsedTicks + " ticks");

            Console.WriteLine();
            Console.WriteLine("-------- StringReader Issue ----------------");
            sw.Reset();
            sw.Start();
            output = emptytest.RemoveAllBlankLinesIssue();
            sw.Stop();
            Console.WriteLine(output + " in " + sw.ElapsedTicks + " ticks");


            Console.WriteLine();
            Console.WriteLine("-------- StringReader Final ----------------");
            sw.Reset();
            sw.Start();
            output = emptytest.RemoveAllBlankLinesFinal();
            sw.Stop();
            Console.WriteLine(output + " in " + sw.ElapsedTicks + " ticks");








            Console.WriteLine();
            Console.WriteLine("-------- Regex ----------------");

            sw.Reset();
            sw.Start();
            output = emptytest.RemoveAllBlankLinesRegex();
            sw.Stop();

            Console.WriteLine(output + "\n in " + sw.ElapsedTicks + " ticks");

            

        }
    }

Monday, October 19, 2020

C# .NET Get integer from hexadecimal string, many hex formats supported version 2
















The code below brackets the first likely hex number that matches, from the list of many hex formats that are specified with leading hex prefixes. Then it proceeds to remove prefixes and apply the TryParse functions. 

 
The C# TryParse function with NumberStyles.HexNumber only works once hex prefixes such as "0x" have been removed.

This will fail:
UInt32.TryParse("0x20", 
             NumberStyles.HexNumber, // AllowHexSpecifier - Strings that are parsed using this style cannot be prefixed with "0x" or "&h". 
             CultureInfo.InvariantCulture,  // I've also tried CurrentCulture
             out number);

//See https://docs.microsoft.com/en-us/dotnet/api/system.globalization.numberstyles?view=netcore-3.1#System_Globalization_NumberStyles_AllowHexSpecifier
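The failure and the fix can be shown side by side. This is a minimal sketch that handles only the "0x" prefix; the names HexPrefixDemo and TryParseHex are hypothetical, and the full listing further down covers many more prefixes.

```csharp
using System;
using System.Globalization;

public static class HexPrefixDemo
{
    // Hypothetical helper: strips a leading "0x"/"0X" before parsing as hexadecimal.
    public static bool TryParseHex(string text, out uint value)
    {
        string digits = text.StartsWith("0x", StringComparison.OrdinalIgnoreCase)
            ? text.Substring(2)
            : text;

        // NumberStyles.HexNumber accepts hex digits but not a "0x" or "&h" prefix.
        return uint.TryParse(digits, NumberStyles.HexNumber, CultureInfo.InvariantCulture, out value);
    }
}
```

Calling uint.TryParse directly on "0x20" with NumberStyles.HexNumber returns false, while TryParseHex("0x20", out value) returns true and sets value to 32.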

Note: It is tempting to optimize the first regex from
0x[0-9a-f]{2,}
to
0x([0-9a-f]{2,}) and use a group capture,
but because we are matching multiple alternative expressions, the overlap is troublesome.


This is an update to my last post about this - 

C# .NET How to get integer from hexadecimal string, many hex formats supported


This code removes many hex formats prefixes as listed here - https://en.wikipedia.org/wiki/Hexadecimal


Source Code

using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class Program
{

    const string strRegHexPrefixCandidates = @"0x[0-9a-f]{2,}|%x[0-9a-f]{2,}|\\u[0-9a-f]{2,}|&#x([0-9a-f]){1,6};|&#([0-9a-f]){1,6};|\\x[0-9a-f]{2,}|\\s[0-9a-f]{2,}|U\+[0-9a-f]{2,}|X'[0-9a-f]{2,}|16#([0-9a-f]){2,}|#x([0-9a-f]){2,}|#16r([0-9a-f]){2,6}|&H([0-9a-f]){2,}|0h([0-9a-f]){2,}|#([0-9a-f]){1,6}|%[0-9a-f]{2,}";
    // In a verbatim string, "" stands for one literal double quote; an unpaired " would end the string.
    const string strRegGetHexNumber = @"[0-9a-f]{2,}|[«‹»›„‚“‟‘‛”’""❛❜❝❞〝〞〟'][0-9a-f]{2,}[’'""«‹»›„‚“‟‘‛”❛❜❝❞〝〞〟]";
    private static readonly Regex rgxHexPre = new Regex(strRegHexPrefixCandidates, RegexOptions.IgnoreCase | RegexOptions.Multiline | RegexOptions.CultureInvariant | RegexOptions.Compiled);
    private static readonly Regex rgxGetHexAgressive = new Regex(strRegGetHexNumber, RegexOptions.IgnoreCase | RegexOptions.CultureInvariant | RegexOptions.Compiled);
    public static void Main()
    {
        //string unicodeText = "UTF-16 (hex)	0x0023 (0023)";
        string unicodeText = @"In XML and XHTML, characters can be expressed as hexadecimal numeric character references using the notation &#xcode;, for instance &#x2019; represents the character U+2019 (the right single quotation mark). If there is no x the number is decimal (thus &#8217; is the same character).[3]";
        //string unicodeText = "8E2";
        //string unicodeText = "this is the end";

        string firstCandidateHexVal = string.Empty;
        
        //https://en.wikipedia.org/wiki/Hexadecimal --remove possible prefixes
        foreach (Match p in rgxHexPre.Matches(unicodeText))
        {
            if (p.Success) { 
                firstCandidateHexVal = p.Value;
                break;
            }
        }

        string prefixfree = string.Empty;
        if (!string.IsNullOrEmpty(firstCandidateHexVal)) 
        { 
            //same prefixes as in Regex
            string[] prefixHexs = new string[] { "0x", "%x", "\\u", "&#x", "&#", "\\x", "\\s", "U+", "X'", "16#", "#x", "#16r", "&H", "0h", "#", "%" };
            foreach (var pre in prefixHexs)
            {
                if (firstCandidateHexVal.IndexOf(pre) > -1)
                {
                    prefixfree = firstCandidateHexVal.Substring(firstCandidateHexVal.IndexOf(pre) + pre.Length);
                    break;
                }
            }
        }

        string finalHexCandy = string.Empty;

        if (string.IsNullOrEmpty(prefixfree))
            finalHexCandy = unicodeText;
        else  
            finalHexCandy = prefixfree;

        Match m = rgxGetHexAgressive.Match(finalHexCandy);
        bool success = false;
        ulong number = 0;
        string hex_value = string.Empty;
        
        if (m.Success)
        {
            try
            {
                hex_value = m.Value;
                //long number = Convert.ToInt64(hex_value, 16); //base 16 - hex....
                //https://stackoverflow.com/questions/2801509/uint32-tryparse-hex-number-not-working -> remove prefixes
                success = ulong.TryParse(hex_value, NumberStyles.HexNumber, CultureInfo.InvariantCulture, out number);
            }
            catch (FormatException)
            {
                unicodeText = string.Format("{0} is not in the correct format for a hexadecimal number.", m.Value);
            }
            catch (OverflowException)
            {
                unicodeText = string.Format("{0} is outside the range of a UInt64.", m.Value);
            }
            catch (ArgumentException)
            {
                unicodeText = string.Format("{0} is invalid in base 16.", m.Value);
            }
            catch (Exception ex)
            {
                unicodeText = string.Format("{0} return error\r\n{1}", m.Value, ex.Message);
            }
        }
        // No else branch needed: when nothing matches, success stays false and the final check below reports it.



        if (!string.IsNullOrEmpty(firstCandidateHexVal))
            hex_value = firstCandidateHexVal; 

        
        if (success)
            unicodeText = string.Format("{0:N0} integer found from hex number {1} in string: {2}", number, hex_value, unicodeText);
        else
            unicodeText = "Could not find a hex number in string: \"" + unicodeText + "\". Select the hex number only.";
        
        Console.WriteLine(unicodeText);
 
    }
}

Tuesday, October 13, 2020

C# .NET How to get integer from hexadecimal string, many hex formats supported















There are many hex formats whose prefixes, such as "0x", must be removed in order for the C# TryParse function with NumberStyles.HexNumber to work.


UInt32.TryParse("0x20", 
             NumberStyles.HexNumber, // I've tried also AllowHexSpecifier
             CultureInfo.InvariantCulture,  // I've also tried CurrentCulture
             out number);


The code below removes many hex formats prefixes as listed here - https://en.wikipedia.org/wiki/Hexadecimal

As per feedback, I improved the algo to first quarantine likely regex hex matches with prefixes then run another regex to extract the hex within that match.



Source Code
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class Program
{
	// In a verbatim string, "" stands for one literal double quote; an unpaired " would end the string.
	const string strRegGetHexNumber = @"\\u[0-9a-f]{2,}|0x[0-9a-f]{2,}|%[0-9a-f]{2,}|\u0023([0-9a-f]){1,6}|&\u0023x([0-9a-f]){1,6};|\s[0-9a-f]{2,}|[0-9a-f]{2,}|[«‹»›„‚“‟‘‛”’""❛❜❝❞〝〞〟'][0-9a-f]{2,}[’'""«‹»›„‚“‟‘‛”❛❜❝❞〝〞〟]";
	private static readonly Regex rgxGetHex = new Regex(strRegGetHexNumber, RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);
	public static void Main()
	{
		string unicodeText = "UTF-16 (hex)	0x0023 (0023)";
		//https://en.wikipedia.org/wiki/Hexadecimal --remove possible prefixes
		string prefixfree = string.Empty;
		string[] prefixHexs = new string[]{"0x", "\\u", "#", "&#", "\\x", "\\s", "U+", "X'", "16#", "#x", "#16r", "&H", "0h"};
		foreach (var pre in prefixHexs)
		{
			if (unicodeText.IndexOf(pre) > -1)
			{
				prefixfree = unicodeText.Substring(unicodeText.IndexOf(pre) + pre.Length);
				break;
			}
		}

		if (string.IsNullOrEmpty(prefixfree))
			prefixfree = unicodeText;
		Match m = rgxGetHex.Match(prefixfree);
		bool success = false;
		ulong number = 0;
		string hex_value = string.Empty;
		if (m.Success)
		{
			try
			{
				hex_value = m.Value;
				//long number = Convert.ToInt64(hex_value, 16); //base 16 - hex....
				//https://stackoverflow.com/questions/2801509/uint32-tryparse-hex-number-not-working -> remove prefixes
				success = ulong.TryParse(hex_value, NumberStyles.HexNumber, CultureInfo.InvariantCulture, out number);
			}
			catch (FormatException)
			{
				unicodeText = string.Format("{0} is not in the correct format for a hexadecimal number.", m.Value);
			}
			catch (OverflowException)
			{
				unicodeText = string.Format("{0} is outside the range of a UInt64.", m.Value);
			}
			catch (ArgumentException)
			{
				unicodeText = string.Format("{0} is invalid in base 16.", m.Value);
			}
			catch (Exception ex)
			{
				unicodeText = string.Format("{0} return error\r\n{1}", m.Value, ex.Message);
			}
		}
		// No else branch needed: when nothing matches, success stays false and the final check below reports it.
		if (success)
			unicodeText = string.Format("{0:N0} found from hex number {1} in string {2}", number, hex_value, unicodeText);
		else
			unicodeText = "Could not find a hex number in \"" + unicodeText + "\". Select the hex number only.";
		Console.WriteLine(unicodeText);
	}
}

Wednesday, September 23, 2020

C# .NET How to get Animation, Alpha, EXIF, ICC flags from header of WebP image file - partially loaded file

The order in which to consume the header flags is clearly laid out in the WebP RIFF container specification, but getting the bits was tricky. Even after taking little-endian into account, reversing the bit values in the byte, and aligning the indices to the spec, it still failed. It turns out the spec's diagram lists the flag bits starting from the most significant bit, while the bit-shifting math, (Flags >> n) & 1, counts from the least significant bit. Therefore, the flags are read in reverse order.

Blade Runner.webp animated
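The same flag byte can also be read with bit masks instead of one shift per flag. This is a minimal sketch with hypothetical names (Vp8xFlagsDemo, Vp8xFlags, Parse); the mask values mirror the bit positions used in the listing below, counted from the least significant bit.

```csharp
using System;

public static class Vp8xFlagsDemo
{
    // Hypothetical enum: one mask per flag in the VP8X header byte.
    [Flags]
    public enum Vp8xFlags : byte
    {
        None      = 0,
        Animation = 1 << 1, // 0x02
        Xmp       = 1 << 2, // 0x04
        Exif      = 1 << 3, // 0x08
        Alpha     = 1 << 4, // 0x10
        Icc       = 1 << 5  // 0x20
    }

    // Hypothetical helper: keeps bits 1 to 5 and masks off the reserved bits (0, 6 and 7).
    public static Vp8xFlags Parse(byte flagsByte)
    {
        return (Vp8xFlags)(flagsByte & 0x3E);
    }
}
```

For example, Parse(0x12) yields Alpha and Animation, so HasFlag(Vp8xFlags.Animation) is true and HasFlag(Vp8xFlags.Icc) is false.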



Source Code


using System;
using System.Net; 
using System.Collections.Generic;
using System.IO;
using System.Drawing;
using System.Linq;
using System.Diagnostics; 

public class Program
{
    //https://stackoverflow.com/questions/111345/getting-image-dimensions-without-reading-the-entire-file/60667939#60667939
    //DecodeWebP reads only lossless :( 
    
    //My version improves DecodeWebP to read all webp formats, lossy, lossless and extended! 
    //https://metadataconsulting.blogspot.com/2020/09/CSharp-dotNET-How-to-get-Animation-Alpha-EXIF-ICC-flags-from-header-of-WebP-image-file-partially-loaded-file.html
                  
    internal static class ImageHelper
    {
        const string errorMessage = "Could not recognise image format.";

        private static Dictionary<byte[], Func<BinaryReader, Size>> imageFormatDecoders = new Dictionary<byte[], Func<BinaryReader, Size>>()
        {
            { new byte[] { 0x42, 0x4D }, DecodeBitmap },
            { new byte[] { 0x47, 0x49, 0x46, 0x38, 0x37, 0x61 }, DecodeGif },
            { new byte[] { 0x47, 0x49, 0x46, 0x38, 0x39, 0x61 }, DecodeGif },
            { new byte[] { 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A }, DecodePng },
            { new byte[] { 0xff, 0xd8 }, DecodeJfif },
            { new byte[] { 0x52, 0x49, 0x46, 0x46 }, DecodeWebP },
        };

        /// <summary>        
        /// Gets the dimensions of an image.        
        /// </summary>        
        /// <param name="binaryReader">A reader positioned at the start of the image data.</param>        
        /// <returns>The dimensions of the specified image.</returns>        
        /// <exception cref="ArgumentException">The image was of an unrecognised format.</exception>            
        public static Size GetDimensions(BinaryReader binaryReader)
        {
            int maxMagicBytesLength = imageFormatDecoders.Keys.OrderByDescending(x => x.Length).First().Length;
            byte[] magicBytes = new byte[maxMagicBytesLength];
            for (int i = 0; i < maxMagicBytesLength; i += 1)
            {
                magicBytes[i] = binaryReader.ReadByte();
                foreach (var kvPair in imageFormatDecoders)
                {
                    if (StartsWith(magicBytes, kvPair.Key))
                    {
                        Console.WriteLine(kvPair.Value.Method);
                        return kvPair.Value(binaryReader);
                        
                    }
                }
            }

            throw new ArgumentException(errorMessage, "binaryReader");
        }
        
        /// <summary>
        /// Gets the dimensions of an image.
        /// </summary>
        /// <param name="path">The path of the image to get the dimensions of.</param>
        /// <returns>The dimensions of the specified image.</returns>
        /// <exception cref="ArgumentException">The image was of an unrecognized format.</exception>
        public static Size GetDimensions(string path)
        {
            using (BinaryReader binaryReader = new BinaryReader(File.OpenRead(path)))
            {
                try
                {
                    return GetDimensions(binaryReader);
                }
                catch (ArgumentException e)
                {
                    if (e.Message.StartsWith(errorMessage))
                    {
                        throw new ArgumentException(errorMessage, "path", e);
                    }
                    else
                    {
                        throw; // rethrow without resetting the stack trace
                    }
                }
            }
        }
        
        /// <summary>
        /// Gets the dimensions of an image.
        /// </summary>
        /// <param name="ms">A memory stream containing the image data.</param>
        /// <returns>The dimensions of the specified image.</returns>
        /// <exception cref="ArgumentException">The image was of an unrecognized format.</exception>
        public static Size GetDimensions(MemoryStream ms)
        {
            using (BinaryReader binaryReader = new BinaryReader(ms))
            {
                try
                {
                    return GetDimensions(binaryReader);
                }
                catch (ArgumentException e)
                {
                    if (e.Message.StartsWith(errorMessage))
                    {
                        throw new ArgumentException(errorMessage, "ms", e);
                    }
                    else
                    {
                        throw; // rethrow without resetting the stack trace
                    }
                }
            }
        }


        private static bool StartsWith(byte[] thisBytes, byte[] thatBytes)
        {
            for (int i = 0; i < thatBytes.Length; i += 1)
            {
                if (thisBytes[i] != thatBytes[i])
                {
                    return false;
                }
            }

            return true;
        }

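        // Note: despite the name, this reverses the byte order, i.e. it reads a big-endian
        // value (as used by PNG and JPEG) when running on a little-endian machine.
        private static short ReadLittleEndianInt16(BinaryReader binaryReader)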
        {
            byte[] bytes = new byte[sizeof(short)];

            for (int i = 0; i < sizeof(short); i += 1)
            {
                bytes[sizeof(short) - 1 - i] = binaryReader.ReadByte();
            }
            return BitConverter.ToInt16(bytes, 0);
        }

        private static int ReadLittleEndianInt32(BinaryReader binaryReader)
        {
            byte[] bytes = new byte[sizeof(int)];
            for (int i = 0; i < sizeof(int); i += 1)
            {
                bytes[sizeof(int) - 1 - i] = binaryReader.ReadByte();
            }
            return BitConverter.ToInt32(bytes, 0);
        }

        private static Size DecodeBitmap(BinaryReader binaryReader)
        {
            binaryReader.ReadBytes(16);
            int width = binaryReader.ReadInt32();
            int height = binaryReader.ReadInt32();
            return new Size(width, height);
        }

        private static Size DecodeGif(BinaryReader binaryReader)
        {
            int width = binaryReader.ReadInt16();
            int height = binaryReader.ReadInt16();
            return new Size(width, height);
        }

        private static Size DecodePng(BinaryReader binaryReader)
        {
            binaryReader.ReadBytes(8);
            int width = ReadLittleEndianInt32(binaryReader);
            int height = ReadLittleEndianInt32(binaryReader);
            return new Size(width, height);
        }

        private static Size DecodeJfif(BinaryReader binaryReader)
        {
            while (binaryReader.ReadByte() == 0xff)
            {
                byte marker = binaryReader.ReadByte();
                short chunkLength = ReadLittleEndianInt16(binaryReader);
                if (marker == 0xc0 || marker == 0xc2) // c2: progressive
                {
                    binaryReader.ReadByte();
                    int height = ReadLittleEndianInt16(binaryReader);
                    int width = ReadLittleEndianInt16(binaryReader);
                    return new Size(width, height);
                }

                if (chunkLength < 0)
                {
                    ushort uchunkLength = (ushort)chunkLength;
                    binaryReader.ReadBytes(uchunkLength - 2);
                }
                else
                {
                    binaryReader.ReadBytes(chunkLength - 2);
                }
            }

            throw new ArgumentException(errorMessage);
        }
        
        //Other libs did not do it 
        //https://github.com/JosePineiro/WebP-wrapper/blob/master/WebPTest/WebPWrapper.cs
        //https://github.com/JimBobSquarePants/ImageProcessor/blob/6092da59e9aa4975e564002ef3c782a8f6bf3384/src/Plugins/ImageProcessor/ImageProcessor.Plugins.WebP/Imaging/Formats/WebPFormat.cs
      
        //fast
        private static Size DecodeWebP(BinaryReader binaryReader)
        {
            //https://developers.google.com/speed/webp/docs/riff_container
            //var riffseg = binaryReader.ReadBytes(4); //already offset 4 bytes 
            //var sizebytes = binaryReader.ReadBytes(4); // Size

            var size = binaryReader.ReadUInt32(); // Size - start at offset 4 
            
            var webp = binaryReader.ReadBytes(4); // start 8 offset

            var type = binaryReader.ReadBytes(4); // start 12 offset - VP8[ ] determination

            string VP8Type = System.Text.Encoding.UTF8.GetString(type);
            
            Console.WriteLine("VP8Type=\""+VP8Type+"\""); 

            int x = 0; 
            int y = 0;

            if (VP8Type == "VP8X") //Extra format - https://developers.google.com/speed/webp/docs/riff_container#extended_file_format
            { 
                
                binaryReader.ReadBytes(4); //skip 4 bytes
                
                byte Flags = binaryReader.ReadByte(); //read 1 byte for all header bit bFlags

                ////////////////////////////////////////////
                ///////// LITTLE ENDIAN MATH! REVERSE ORDER - https://developers.google.com/speed/webp/docs/riff_container#extended_file_format
                ////////////////////////////////////////////
                
                int bitPos = 0; //bit position, counted from the least significant bit
                //var bitR1C = (((Flags>> bitPos) & 1) != 0); //Original attempt at order 
                bool bitR3C = (((Flags>> bitPos) & 1) != 0); 
                
                bitPos = 1; 
                //var bitR2C = (((Flags>> bitPos) & 1) != 0); //Original attempt at order 
                var bitAniC = (((Flags>> bitPos) & 1) != 0); 
                Console.WriteLine("has Animation ? {0}", bitAniC); 

                bitPos = 2; 
                //var bitICCC = (((Flags>> bitPos) & 1) != 0); //Original attempt at order 
                var bitXMPC = (((Flags>> bitPos) & 1) != 0);
                Console.WriteLine("has XMP ? {0}", bitXMPC); 

                bitPos = 3;
                //var bitAlphaC = (Flags>> bitPos) & 1; //Original attempt at order 
                bool bitExifC = (((Flags>> bitPos) & 1) != 0); 
                Console.WriteLine("has EXIF ? {0}", bitExifC); 
                
                bitPos = 4;
                //var bitExifC = (((Flags>> bitPos) & 1) != 0); //Original attempt at order 
                var bitAlphaC = (((Flags>> bitPos) & 1) != 0);
                Console.WriteLine("has Alpha ? {0}", bitAlphaC); 
                
                bitPos = 5;
                //var bitXMPC = (((Flags>> bitPos) & 1) != 0); //Original attempt at order 
                var bitICCC = (((Flags>> bitPos) & 1) != 0); 
                Console.WriteLine("has ICC ? {0}", bitICCC); //was printing bitAniC by mistake
                
                bitPos = 6;
                //var bitAniC = (((Flags>> bitPos) & 1) != 0); //Original attempt at order 
                var bitR2C = (((Flags>> bitPos) & 1) != 0); 

                bitPos = 7;
                //var bitR3C = (Flags>> bitPos) & 1; //Original attempt at order 
                var bitR1C = (((Flags>> bitPos) & 1) != 0); 

                binaryReader.ReadBytes(3); 

                byte[] w = binaryReader.ReadBytes(3); //24bits for width

                x = 1 + (w[2] << 16 | w[1] << 8 | w[0]); //little endian

                byte[] h = binaryReader.ReadBytes(3); //24bits for height

                y = 1 + (h[2] << 16 | h[1] << 8 | h[0]); 

                return new Size(x, y);

            }
            else if (VP8Type == "VP8L") //Lossless - https://developers.google.com/speed/webp/docs/webp_lossless_bitstream_specification#2_riff_header 
            {
                
                binaryReader.ReadBytes(4); //size
                byte[] sig = binaryReader.ReadBytes(1); //0x2f->47 1 byte signature
                if (sig[0] != 47) return new Size(0, 0); //invalid signature, not a lossless webp

                byte[] wh = binaryReader.ReadBytes(4); //width and height in 1 read
                x = 1 + (((wh[1] & 0x3F) << 8) | wh[0]); //{1 + ((($b1 & 0x3F) << 8) | $b0)} - https://blog.tcl.tk/38137  
                y = 1 + (((wh[3] & 0xF) << 10) | (wh[2] << 2) | ((wh[1] & 0xC0) >> 6)); //{1 + ((($b3 & 0xF) << 10) | ($b2 << 2) | (($b1 & 0xC0) >> 6))}]

                return new Size(x, y);

            }
            else if (VP8Type == "VP8 ") //Lossy - https://tools.ietf.org/html/rfc6386#section-9.1
            {

                //Lossy - https://tools.ietf.org/html/rfc6386#section-9.1 hard to decipher
                //pc->Width      = swap2(*(unsigned short*)(c+3))&0x3fff;  0x3fff -> 16383 decimal  swap2 - big or little endian depending on machine 
                //pc->Height     = swap2(*(unsigned short*)(c+5))&0x3fff;
              
                //https://blog.tcl.tk/38137 - much better
                  
                binaryReader.ReadBytes(7); //move to offset 23 or 0x17, 23-12+4=7 - open webp lossy file https://developers.google.com/speed/webp/gallery1

                byte[] frameTag = binaryReader.ReadBytes(3); //start code must be $b0 == 0x9d->157, $b1 == 0x01->1, $b2 == 0x2a->42 
                if (frameTag[0] != 157 || frameTag[1] != 1 || frameTag[2] != 42) return new Size(0, 0); //invalid webp file

                //reads 2 bytes, which is 16 bits, but we want only the low 14 bits, so mask with 0x3fff
                x = binaryReader.ReadUInt16() & 0x3fff;    //$width & 0x3fff -> & 0b00_11111111111111 c#7.0 above only
                y = binaryReader.ReadUInt16() & 0x3fff;    //$height & 0x3fff
            
                return new Size(x, y);
            }
            
            return new Size(0, 0);
            
        }

    }
    
    public static string GetFileNameFromURL(string hrefLink) //hack
    {
        string[] parts = hrefLink.Split('/');
        string fileName = string.Empty;

        if (parts.Length > 0)
            fileName = parts[parts.Length - 1];
        else
            fileName = hrefLink;

        return fileName;
    }

    public static void Main()
    {
        Stopwatch sw = new Stopwatch(); 
        //string webpURL = "https://www.gstatic.com/webp/gallery/1.sm.webp";//Lossy
        //string webpURL = "https://www.gstatic.com/webp/gallery3/2_webp_ll.webp";//Lossless
        string webpURL = "https://www.gstatic.com/webp/gallery3/1_webp_a.webp";//Extended with alpha channel
        //string webpURL = "https://mathiasbynens.be/demo/animated-webp-supported.webp"; //animated
        //string webpURL = "http://blog.mindworkshop.com/image/webp003.webp"; //animal
        //string webpURL = "https://1.bp.blogspot.com/-rfUYu37r2No/X2uNWNMz8jI/AAAAAAAAMps/mJow8dGNE80qvWJupR1KEV8RCXo49FKUQCNcBGAsYHQ/w640-h287/BladeRunner.webp";
        
        //string webpURL = "https://res.cloudinary.com/demo/image/upload/fl_awebp/cell_animation.webp";
        string webpfile = GetFileNameFromURL(webpURL); 
        
        Size webpSize = new Size(); 
        
        WebClient wc = new WebClient();
        using (MemoryStream stream = new MemoryStream(wc.DownloadData(webpURL)))
        {
            sw.Start(); 
            webpSize = ImageHelper.GetDimensions(stream); 
            sw.Stop(); 
        }
        Console.WriteLine("File \"{0}\" has dimensions [{1}w X {2}h] in {3} ms.", webpfile, webpSize.Width, webpSize.Height, sw.ElapsedMilliseconds);
        
    }
}
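
As a compact alternative to the bit-by-bit shifts in the VP8X branch above, here is a minimal sketch that decodes the same header with a [Flags] enum and a 24-bit little-endian helper. The names Vp8xFlags, Vp8xSketch, ReadUInt24LE and Decode are hypothetical and not part of the original code; the sketch assumes payload holds the 10 bytes that follow the VP8X chunk size field (1 flags byte, 3 reserved bytes, 3 width bytes, 3 height bytes) and needs C# 7 or later.

```csharp
using System;

// Hypothetical enum: bit values follow the shifts used in the VP8X branch above
// (bit 1 = animation, 2 = XMP, 3 = EXIF, 4 = alpha, 5 = ICC; bits 0, 6, 7 reserved).
[Flags]
public enum Vp8xFlags : byte
{
    None      = 0,
    Animation = 1 << 1,
    Xmp       = 1 << 2,
    Exif      = 1 << 3,
    Alpha     = 1 << 4,
    Icc       = 1 << 5
}

public static class Vp8xSketch
{
    // Reads a 24-bit little-endian unsigned value starting at offset.
    public static int ReadUInt24LE(byte[] buffer, int offset)
    {
        return buffer[offset] | (buffer[offset + 1] << 8) | (buffer[offset + 2] << 16);
    }

    // Assumption: payload[0] = flags, payload[1..3] = reserved,
    // payload[4..6] = width - 1, payload[7..9] = height - 1.
    public static void Decode(byte[] payload, out Vp8xFlags flags, out int width, out int height)
    {
        if (payload == null || payload.Length < 10)
            throw new ArgumentException("VP8X payload needs at least 10 bytes.", nameof(payload));

        flags = (Vp8xFlags)payload[0];
        width = 1 + ReadUInt24LE(payload, 4);
        height = 1 + ReadUInt24LE(payload, 7);
    }
}

public static class Vp8xSketchProgram
{
    public static void Main()
    {
        // Hand-made sample bytes: alpha flag set, width - 1 = 0x00018F, height - 1 = 0x00012C.
        byte[] payload = { 0x10, 0, 0, 0, 0x8F, 0x01, 0x00, 0x2C, 0x01, 0x00 };
        Vp8xSketch.Decode(payload, out Vp8xFlags flags, out int w, out int h);
        Console.WriteLine("alpha={0} size={1}x{2}", flags.HasFlag(Vp8xFlags.Alpha), w, h); // alpha=True size=400x301
    }
}
```

The enum keeps each flag's bit position in one place, which avoids the kind of copy-and-paste slip that is easy to make when every flag gets its own shift and print line.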

Wednesday, September 9, 2020

C# .NET - How to get a string between two strings, minimal spanning, fast

Here's a fast implementation for getting a string between two strings, which meets most expectations of what such a function should return. It finds the substring that comes after the 1st string and before the next occurrence of the 2nd string. The sticky question is: where exactly may the 2nd string start?


Technically, this is minimal spanning, meaning the string returned is the shortest string found between the end of the 1st string and the beginning of the 2nd string. 
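
To make the difference concrete, here is a minimal sketch contrasting the two ideas. MinSpan and MaxSpan are hypothetical helper names, and MaxSpan reflects only my assumption that maximal spanning means "up to the last occurrence of the 2nd string"; neither is the code from this post or the linked one.

```csharp
using System;

public static class SpanDemo
{
    // Minimal span: from the end of the first "from" to the NEXT "to".
    public static string MinSpan(string s, string from, string to)
    {
        int start = s.IndexOf(from, StringComparison.Ordinal);
        if (start < 0) return string.Empty;
        start += from.Length;
        int end = s.IndexOf(to, start, StringComparison.Ordinal);
        return end < 0 ? string.Empty : s.Substring(start, end - start);
    }

    // Maximal span (assumed meaning): from the end of the first "from" to the LAST "to".
    public static string MaxSpan(string s, string from, string to)
    {
        int start = s.IndexOf(from, StringComparison.Ordinal);
        if (start < 0) return string.Empty;
        start += from.Length;
        int end = s.LastIndexOf(to, StringComparison.Ordinal);
        return end < start ? string.Empty : s.Substring(start, end - start);
    }
}

public static class SpanDemoProgram
{
    public static void Main()
    {
        string s = "<b>one</b> and <b>two</b>";
        Console.WriteLine(SpanDemo.MinSpan(s, "<b>", "</b>")); // one
        Console.WriteLine(SpanDemo.MaxSpan(s, "<b>", "</b>")); // one</b> and <b>two
    }
}
```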

See my post on the maximally spanning version:
https://metadataconsulting.blogspot.com/2020/09/CSharp-dotNET-How-to-get-a-string-between-two-strings.html




Below we compare two choices: searching for the 2nd string starting immediately after the 1st (leaving no room between them), versus starting the search +1 character after the 1st. With +1 we get at least one character back, not an empty string.

Immediately after is how most examples are implemented, but it may not be what you want; see the comparison image below. Quick note: r stands for rule, see the code.

With the search index immediately after ("Exact" in the charts), the string between "a" and "a" for input "aaaa" is empty! Technically correct. 

With the search index one character later ("+1" in the charts), the string between "a" and "a" for input "aaaa" is "a". Intuitively, this is what you would want. 

Whilst "+1" seems good, there are repercussions, and in some cases the result might not be what you expect. Best to see it with your own eyes in the comparison chart image below.
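
The trade-off can also be checked in a few lines of code. The sketch below is a simplified stand-in for the function in this post, with a hypothetical offset parameter (0 for "Exact", 1 for "+1") instead of the commented-out line; Between and OffsetDemo are illustrative names only.

```csharp
using System;

public static class OffsetDemo
{
    // offset 0 = "Exact": the 2nd string may start right after the 1st.
    // offset 1 = "+1": the search for the 2nd string starts one character later.
    public static string Between(string s, string from, string to, int offset)
    {
        if (string.IsNullOrEmpty(s) || string.IsNullOrEmpty(from) || string.IsNullOrEmpty(to))
            return string.Empty;

        int idxFrom = s.IndexOf(from, StringComparison.Ordinal);
        if (idxFrom < 0) return string.Empty;

        int idxStart = idxFrom + from.Length;
        if (idxStart + offset > s.Length) return string.Empty; // IndexOf would throw past the end

        int idxEnd = s.IndexOf(to, idxStart + offset, StringComparison.Ordinal);
        if (idxEnd <= idxStart) return string.Empty;            // not found, or nothing in between

        return s.Substring(idxStart, idxEnd - idxStart);
    }
}

public static class OffsetDemoProgram
{
    public static void Main()
    {
        Console.WriteLine("[{0}]", OffsetDemo.Between("aaaa", "a", "a", 0)); // []
        Console.WriteLine("[{0}]", OffsetDemo.Between("aaaa", "a", "a", 1)); // [a]
        Console.WriteLine("[{0}]", OffsetDemo.Between("abcb", "a", "b", 0)); // []
        Console.WriteLine("[{0}]", OffsetDemo.Between("abcb", "a", "b", 1)); // [bc]
    }
}
```

The last case shows one repercussion of "+1": for "abcb" it skips the "b" that sits directly after "a" and returns "bc", so the result contains an occurrence of the 2nd string.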

To see the full effects, run the code below for "Exact", or edit it in .NET Fiddle to try the "+1" version. 



using System;
using System.Diagnostics;

public static class Extensions 
{
	/// <summary>
	/// Get a substring between two anchor strings, minimal span
	/// </summary>
	/// <param name="s">source string</param>
	/// <param name="from">search from end of this string</param>
	/// <param name="to">to beginning of this string, searching forwards from the end of from</param>
	/// <returns>a substring between from and to, minimal span</returns>
	public static string GetFirstStringBetweenStringsMinSpanCleanup(this string s, string from, string to)
	{
		if (string.IsNullOrEmpty(s) || string.IsNullOrEmpty(from) || string.IsNullOrEmpty(to)) return string.Empty;

		int idxFrom = s.IndexOf(from);
		int idxStart = idxFrom + from.Length; //idxFrom == -1 ("not found") is filtered out below, before idxStart is used

		if (idxFrom == -1 || idxStart >= s.Length - 1)
			return string.Empty;

		int idxEnd = s.IndexOf(to, idxStart); //Exact definition, but intuitively next line meets likely expectations -> YOU CHOOSE
        //int idxEnd = s.IndexOf(to, idxStart + 1); //Start next position after, leaving a space for 1 character to be returned

           
		if (idxEnd == -1 || idxEnd <= idxStart)
			return string.Empty;

		return s.Substring(idxStart, idxEnd - idxStart);

	}
	/// <summary>
	/// Get a substring between two anchor strings, MINIMAL SPAN (debug version that prints which rule fired)
	/// </summary>
	/// <param name="s">source string</param>
	/// <param name="from">search from end of this string</param>
	/// <param name="to">to beginning of this string, searching forwards from the end of from</param>
	/// <returns>a substring between from and to, minimal span</returns>
	public static string GetFirstStringBetweenStringsMinSpan(this string s, string from, string to)
	{
		Console.Write("args from {0} to {1} = ", from, to); //debug

		//edge cases
		if (string.IsNullOrEmpty(s) || string.IsNullOrEmpty(from) || string.IsNullOrEmpty(to)) return string.Empty;
		//if (from.Length >= s.Length || to.Length >= s.Length) return string.Empty; //redundant, the rules below take care of this

		int idxFrom = s.IndexOf(from);
		int idxStart = idxFrom + from.Length; //idxFrom == -1 ("not found") is filtered out below, before idxStart is used

		if (idxFrom == -1)       
			return string.Empty;
		else if (idxStart >= s.Length - 1) //Rule 1, kept separate from the check above for testing; idxStart is an index, so compare it with the last index, s.Length - 1
		{
			Console.WriteLine("r1. idxStart={0} >= (s.Length - 1)={1}", idxStart, s.Length - 1);
			return string.Empty; 
		}

		int idxEnd = s.IndexOf(to, idxStart); //Exact definition, but intuitively next line meets likely expectations -> YOU CHOOSE
        //int idxEnd = s.IndexOf(to, idxStart + 1); //Start next position after, leaving a space for 1 character to be returned

		if (idxEnd == -1 )
			return string.Empty;
		else if (idxEnd > s.Length - 1) //Rule 2 never gets used
		{
			Console.WriteLine("r2. idxEnd={0} > (s.Length - 1)={1}", idxEnd, s.Length - 1);
			return string.Empty;
		}
		else if (idxEnd <= idxStart)
		{
			Console.WriteLine("r3. idxEnd={0} <= idxStart={1}", idxEnd, idxStart);
			return string.Empty;
		}

		return s.Substring(idxStart, idxEnd - idxStart);

	}
 }

public class Program
{
	public static void Main()
	{
		  string test = "abcd";
            Console.WriteLine(test);
            Console.WriteLine("=================");

            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan(null, null));
            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("", ""));
            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan(test, test));
            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("", test));
            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan(test,""));
            
            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("a", "a"));
            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("a", "b"));
            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("a", "c"));
            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("a", "d"));
            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("a", "e"));
            
            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("e", "e"));
            
            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("ab", "a"));
            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("ab", "b"));
            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("ab", "c"));
            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("ab", "d"));

            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("ab", "a"));
            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("ab", "b"));
            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("ab", "c"));
            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("ab", "d"));

            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("abc", "a"));
            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("abc", "b"));
            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("abc", "c"));
            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("abc", "d"));
            
            

            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan(test, "a"));
            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan(test, "b"));
            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan(test, "c"));
            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan(test, "d"));

            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("a","abc"));
            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("b", "abc"));
            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("c", "abc"));
            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("d", "abc"));

            Console.WriteLine(); 
            test = "abcdabcd";
            Console.WriteLine(test);
            Console.WriteLine("=================");

            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan(null, null));
            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("", ""));
            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan(test, test));
            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("", test));
            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan(test, ""));

            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("a", "a"));
            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("a", "b"));
            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("a", "c"));
            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("a", "d"));
            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("a", "e"));

            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("e", "e"));

            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("ab", "a"));
            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("ab", "b"));
            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("ab", "c"));
            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("ab", "d"));

            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("ab", "a"));
            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("ab", "b"));
            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("ab", "c"));
            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("ab", "d"));

            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("abc", "a"));
            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("abc", "b"));
            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("abc", "c"));
            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("abc", "d"));

            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan(test, "a"));
            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan(test, "b"));
            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan(test, "c"));
            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan(test, "d"));

            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("a", "abc"));
            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("b", "abc"));
            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("c", "abc"));
            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("d", "abc"));

            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("abc", "d")); //pass


            Console.WriteLine();
            test = "aaaa";
            Console.WriteLine(test);
            Console.WriteLine("=================");

            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("a", "a"));
            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("aa", "a"));
            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("aaa", "a"));
            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("aaaa", "a"));

            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("a", "a"));
            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("a", "aa"));
            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("a", "aaa"));
            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("a", "aaaa"));

            Console.WriteLine();
            test = "aaaaaaaaaaaaaaaaaaaaaaaaaaaabbbb";
            Console.WriteLine(test);
            Console.WriteLine("=================");
            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("bbbb", "aaaa"));

            string result = string.Empty; 
            Stopwatch sw = new Stopwatch();
            sw.Start();
            result = test.GetFirstStringBetweenStringsMinSpan("aaaaaaa", "bb");
            sw.Stop();
            Console.WriteLine(result);
            Console.WriteLine(" in " + sw.ElapsedTicks + " ticks."); 
	}
}