Showing posts with label CSharp. Show all posts

Tuesday, January 24, 2023

C# .NET How to remove ANSI control characters fast and reduce a string to the ANSI set




Here's how to reduce a string to the ANSI character range and remove control characters from it fast in C#. But be careful: an accented character such as é is not replaced with e. To do that, you need to normalize the string first; see the UnicodeToANSI function.

ASCII (American Standard Code for Information Interchange) is a 7-bit character set that contains characters from 0 to 127.

The generic term ANSI (American National Standards Institute) is used for 8-bit character sets. These character sets contain the unchanged ASCII character set. In addition, they contain further characters from 128 to 255.
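To illustrate the caveat about accented characters, here is a minimal sketch. The helper names `KeepPrintableAscii` and `FoldToAscii` are made up for this example and are not part of the post's code. It assumes you want printable 7-bit ASCII only: filtering alone drops the accented letter, while decomposing first with `NormalizationForm.FormKD` keeps the base letter.

```csharp
using System;
using System.Linq;
using System.Text;

public static class AccentDemo
{
    // Hypothetical helper: keep printable 7-bit ASCII (32..126) only.
    public static string KeepPrintableAscii(string s)
    {
        return new string(s.Where(c => c >= 32 && c < 127).ToArray());
    }

    // Hypothetical helper: decompose accented letters first, then filter.
    // FormKD splits "è" into "e" plus a combining accent, and the filter drops the accent.
    public static string FoldToAscii(string s)
    {
        return KeepPrintableAscii(s.Normalize(NormalizationForm.FormKD));
    }

    public static void Main()
    {
        string word = "cort\u00E8ge"; // "cortège" with a precomposed è
        Console.WriteLine(KeepPrintableAscii(word)); // cortge
        Console.WriteLine(FoldToAscii(word));        // cortege
    }
}
```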


Here's a list of characters that includes invisible formatting characters (the Unicode General Punctuation block). https://unicode-table.com/en/blocks/general-punctuation/


using System; using System.Text; using System.Linq; using System.Diagnostics; 
					
public static class Program
{
	// Based on http://www.codeproject.com/Articles/13503/Stripping-Accents-from-Latin-Characters-A-Foray-in
	// Proper Normalization
	public static string UnicodeToANSI(this string inString)
	{
		var newStringBuilder = new StringBuilder();
		newStringBuilder.Append(inString.Normalize(NormalizationForm.FormKD)
								.Where(x => (x >= 32 && x <= 255)) // keep 32..255: drops control characters below 32 and combining marks above 255
								.ToArray());
		return newStringBuilder.ToString();
	}
	
	// Keeps characters 32 to 255 (the 8-bit ANSI range). Characters 32 to 127 are the same as in 7-bit ASCII.
	// Note: despite its name, this method keeps ANSI characters 128 to 255 as well.
	public static string ReducetoASCII(this string s)
    {
        StringBuilder sb = new StringBuilder(s.Length);
        foreach (char c in s)
        {
            if ((int)c > 255) // remove chars outside the 8-bit range
                continue;
            if ((int)c < 32)  // remove control characters
                continue;
            sb.Append(c);
        }
        return sb.ToString();
    }
	
	public static void Main()
	{
		Stopwatch sw = new Stopwatch(); 
		string french = "A Paris, le cortège parisien s’était élancé à 14 heures.\r\n\tFace à l’affluence, un «itinéraire bis» a été mis en place. D’importants rassemblements ont lieu à Bordeaux, Marseille, Rennes ou Lyon. Suivez la journée avec nos journalistes dans toute la France.";
		string ftemp = string.Empty; 
		string german = "ޘ Trump\t\r\nverwechselt Klägerin Carroll auf Foto mit Ex-Frau – das könnte Folgen haben"; 
		string gtemp = string.Empty; 
		Console.WriteLine(french); 
		
		sw.Start();
		ftemp = french.ReducetoASCII(); 
		sw.Stop(); 
		
		Console.WriteLine("Ansi reduced\r\n" + ftemp + " in " + sw.ElapsedTicks); 
		
		sw.Reset(); 
		sw.Start();
		ftemp = french.UnicodeToANSI(); 
		sw.Stop(); 
		
		Console.WriteLine("Proper Normalization\r\n" + ftemp + " in " + sw.ElapsedTicks); 
				
		Console.WriteLine();
		Console.WriteLine();
		Console.WriteLine(german); 
		
		sw.Reset();
		sw.Start();
		gtemp = german.ReducetoASCII(); 
		sw.Stop(); 
		
		Console.WriteLine("Ansi reduced\r\n" + gtemp + " in " + sw.ElapsedTicks); 
		
		sw.Reset(); 
		sw.Start();
		gtemp = german.UnicodeToANSI(); 
		sw.Stop(); 
		
		Console.WriteLine("Proper Normalization\r\n" + gtemp + " in " + sw.ElapsedTicks); 
		
	}
}
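
A single Stopwatch measurement like the ones above also includes one-off costs such as JIT compilation, so the tick counts can be noisy. Here is a rough sketch of averaging over many calls after a warm-up. `AverageTicks` is a hypothetical helper, and `ToUpperInvariant` is only a stand-in so the example is self-contained; in the program above you would pass something like `s => s.ReducetoASCII()` instead.

```csharp
using System;
using System.Diagnostics;

public static class TimingDemo
{
    // Hypothetical helper: average ticks per call over many iterations,
    // after one untimed warm-up call so JIT compilation is not measured.
    public static double AverageTicks(Func<string, string> func, string input, int iterations)
    {
        func(input); // warm-up
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            func(input);
        }
        sw.Stop();
        return (double)sw.ElapsedTicks / iterations;
    }

    public static void Main()
    {
        string sample = "A Paris, le cort\u00E8ge\r\n\tparisien";
        // Stand-in workload; replace with the method you want to time.
        double avg = AverageTicks(s => s.ToUpperInvariant(), sample, 100000);
        Console.WriteLine("average ticks per call: " + avg);
    }
}
```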

Monday, November 16, 2020

C# .NET How to remove blank lines from a string faster, dealing with null '\0' character confusion

Typically, in C# code you would use

str.Split(TrimNewLineChars, StringSplitOptions.RemoveEmptyEntries);

to remove empty and white-space-only lines, and it is effective. But a faster way is to use StringReader and process each line.
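
As a minimal sketch of the StringReader approach (the class and method names here are only illustrative; the full versions are in the source code further down), each line is read once and copied only if it is not blank:

```csharp
using System;
using System.IO;
using System.Text;

public static class LineFilterDemo
{
    // Copy every line that is not empty or whitespace-only.
    public static string DropBlankLines(string text)
    {
        StringBuilder output = new StringBuilder(text.Length);
        using (StringReader reader = new StringReader(text))
        {
            string line;
            // ReadLine treats \n, \r and \r\n as line endings.
            while ((line = reader.ReadLine()) != null)
            {
                if (!string.IsNullOrWhiteSpace(line))
                    output.Append(line).Append('\n');
            }
        }
        return output.ToString();
    }

    public static void Main()
    {
        Console.Write(DropBlankLines("one\n \n\r\ntwo\r\n"));
    }
}
```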

But there are gotchas when processing the null character ('\0'), which you might deduce from the above to be a white-space character. In fact, in C# '\u0000' is just a null character contained in a string and has no special meaning. It is considered a control character, not white-space. Nor is it treated as null or as a string terminator, as it is in C. To view control characters in VS Code, see my blog post and in Notepad++ see post.
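
A quick check of how .NET classifies the null character, using only built-in methods:

```csharp
using System;

public static class NullCharDemo
{
    public static void Main()
    {
        string s = "a\0b";
        Console.WriteLine(s.Length);                         // 3, the string is not cut off at '\0'
        Console.WriteLine(char.IsWhiteSpace('\0'));          // False
        Console.WriteLine(char.IsControl('\0'));             // True
        Console.WriteLine(string.IsNullOrWhiteSpace("\0")); // False
    }
}
```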

In fact, you can look at how .NET stores a string, shown as an escaped C# literal, using the code below; see my next post on this.

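// Requires: using System.CodeDom; using System.CodeDom.Compiler; using System.IO;
// The method name ToLiteral is only illustrative.
public static string ToLiteral(string input)
{
    using (var writer = new StringWriter())
    {
        using (var provider = CodeDomProvider.CreateProvider("CSharp"))
        {
            provider.GenerateCodeFromExpression(new CodePrimitiveExpression(input), writer, null);

            return writer.ToString();
        }
    }
}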

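TL;DR / Lesson Learned:

Counter-intuitively, a line that contains only the null character ('\0') is not treated as blank. StringSplitOptions.RemoveEmptyEntries only drops zero-length entries and String.IsNullOrWhiteSpace("\0") returns false, because '\0' is a control character, not whitespace.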
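
One way to treat such lines as blank is to test every character for whitespace or control status. This is a small sketch, not the code used in this post; `IsBlank` is a hypothetical helper name.

```csharp
using System;
using System.Linq;

public static class BlankLineDemo
{
    // Hypothetical helper: a line is blank when every character is
    // whitespace or a control character (so "\0" counts as blank).
    public static bool IsBlank(string line)
    {
        return line.All(c => char.IsWhiteSpace(c) || char.IsControl(c));
    }

    public static void Main()
    {
        Console.WriteLine(IsBlank(""));       // True
        Console.WriteLine(IsBlank(" \t"));    // True
        Console.WriteLine(IsBlank("\0"));     // True
        Console.WriteLine(IsBlank(" text ")); // False
    }
}
```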




Source Code

using System;
using System.Text; 
using System.IO;
using System.Diagnostics;
using System.Text.RegularExpressions;
					
    public static class Program
    {

        public static string RemoveAllWhitespace(this string str)
        {
            var len = str.Length;
            var src = str.ToCharArray();
            var dstIdx = 0;
            for (var i = 0; i < len; i++)
            {
                char ch = src[i];
                if (!char.IsWhiteSpace(ch) && ch != '\0')
                    src[dstIdx++] = ch;
                     
            }
            return new string(src, 0, dstIdx);
        }


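        // Despite its name, this returns only the first character that is neither whitespace nor a control character
        // (or an empty string if there is none); it is used below to test whether a line is blank.
        public static string TrimStartUnicode(this string str)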
        {
            var len = str.Length;
            var src = str.ToCharArray();
            var dstIdx = 0;
            for (var i = 0; i < len; i++)
            {
                char ch = src[i];
                if (!char.IsWhiteSpace(ch) && !char.IsControl(ch) ) 
                {
                    src[dstIdx++] = ch;
                    break;
                }
            }
            return new string(src, 0, dstIdx);
        }

        private static readonly char[] TrimNewLineChars = Environment.NewLine.ToCharArray();
        public static string RemoveEmptyLines(this string str)
        {
            if (str == null)
            {
                return null;
            }
            var lines = str.Split(TrimNewLineChars, StringSplitOptions.RemoveEmptyEntries);

            var sb = new StringBuilder(str.Length);
            foreach (var line in lines)
            {
                if (!String.IsNullOrWhiteSpace(line))
                    sb.AppendLine(line);
            }

            return sb.ToString();
        }

        //Tue 12-May-20 2:08am  - 
        public static String RemoveAllBlankLinesIssue(this string value)
        {

            StringBuilder output = new StringBuilder(value.Length);
            using (StringReader sr = new StringReader(value))
            {
                string line;
                while ((line = sr.ReadLine()) != null)
                {
					//\0 has no special meaning in c# it's just a null character contained in a string.
                    if (line == '\0'.ToString()) //is line length of 1
                    {

                        Console.WriteLine("is char '\\0' empty or null = " + string.IsNullOrEmpty('\0'.ToString()));
                        Console.WriteLine("is char '\\0' whitespace or null = " + String.IsNullOrWhiteSpace('\0'.ToString()));
						Console.WriteLine("is char '\\0' char.IsWhiteSpace = " + char.IsWhiteSpace('\0')); 
						Console.WriteLine("is char '\\0' char.IsControl = " + char.IsControl('\0')); 
						
                    }
                    if (line.Contains('\u0080'.ToString()))
                    {
                        Console.WriteLine("is char '\\u0080' empty or null = " + string.IsNullOrEmpty("\u0080").ToString());
                        Console.WriteLine("is char '\\u0080' whitespace or null = " + String.IsNullOrWhiteSpace("\u0080").ToString());

                    }

                    if (!String.IsNullOrWhiteSpace(line) && !string.IsNullOrEmpty(line))
                        output.AppendLine(line);
                }

            }
            return output.ToString();
        }

        //Tue 12-May-20 2:08am  - 
        public static String RemoveAllBlankLinesFinal(this string value)
        {
                         
            StringBuilder output = new StringBuilder(value.Length);
            using (StringReader sr = new StringReader(value))
            {
                string line;
                string temp; 
                while ((line = sr.ReadLine()) != null)
                {
                    temp = line.TrimStartUnicode();  
                    
                    if (!string.IsNullOrWhiteSpace(temp) && !string.IsNullOrEmpty(temp))
                        output.AppendLine(line);
                }

            }
            return output.ToString();
        }



        public static String RemoveAllBlankLinesRegex(this string s)
        {
            return Regex.Replace(s, @"^\s+$[\r\n]*", string.Empty, RegexOptions.Multiline);
            //return Regex.Replace(s, @"^(?:[\t ]*(?:\r?\n|\r))+", string.Empty, RegexOptions.Multiline); 
            //return Regex.Replace(s, @"(?<=(?:\r?\n){2}|\A)(?:\r?\n)+", string.Empty, RegexOptions.Multiline); 
            //return Regex.Replace(s, @"(?<=(?:\r?\n){2}$\w)(?:\r?\n)+", string.Empty, RegexOptions.Multiline); 
            //return Regex.Replace(s, @"^\s*(\r\n|\V)", string.Empty, RegexOptions.Multiline); //does not work

        }

        public static void Main()
        {
            string output = string.Empty;

            Stopwatch sw = new Stopwatch();

            string emptytest = "Tell me and I forget.\n \n     \nTeach me and I remember.     \r\n \r\n\r\nInvolve me and I learn.  \r     \r\r\0\r\r   Pad Unicode \\u0080 next line\n\n\u0080\r\rby Benjamin Franklin.\r\n";

            sw.Start();
            output = emptytest.RemoveEmptyLines();
            sw.Stop();
            Console.WriteLine(output + " in " + sw.ElapsedTicks + " ticks");

            Console.WriteLine();
            Console.WriteLine("-------- StringReader Issue ----------------");
            sw.Reset();
            sw.Start();
            output = emptytest.RemoveAllBlankLinesIssue();
            sw.Stop();
            Console.WriteLine(output + " in " + sw.ElapsedTicks + " ticks");


            Console.WriteLine();
            Console.WriteLine("-------- StringReader Final ----------------");
            sw.Reset();
            sw.Start();
            output = emptytest.RemoveAllBlankLinesFinal();
            sw.Stop();
            Console.WriteLine(output + " in " + sw.ElapsedTicks + " ticks");








            Console.WriteLine();
            Console.WriteLine("-------- Regex ----------------");

            sw.Reset();
            sw.Start();
            output = emptytest.RemoveAllBlankLinesRegex();
            sw.Stop();

            Console.WriteLine(output + "\n in " + sw.ElapsedTicks + " ticks");

            

        }
    }

Monday, October 19, 2020

C# .NET Get integer from hexadecimal string, many hex formats supported version 2
















The code below brackets the first likely hex number that matches one of the many hex formats that are written with a leading hex prefix. It then removes the prefix and applies the TryParse function.

The C# TryParse function with NumberStyles.HexNumber only works once hex prefixes such as "0x" have been removed.

This will fail:
UInt32.TryParse("0x20", 
             NumberStyles.HexNumber, // AllowHexSpecifier - Strings that are parsed using this style cannot be prefixed with "0x" or "&h". 
             CultureInfo.InvariantCulture,  // I've also tried CurrentCulture
             out number);

//See https://docs.microsoft.com/en-us/dotnet/api/system.globalization.numberstyles?view=netcore-3.1#System_Globalization_NumberStyles_AllowHexSpecifier
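
A small sketch of the failure and one way around it. `TryParseHex` is a hypothetical helper that handles only the "0x" prefix; the full code below covers many more. Note that `Convert.ToUInt32` with base 16 accepts a "0x" prefix by itself.

```csharp
using System;
using System.Globalization;

public static class HexParseDemo
{
    // Hypothetical helper: strip an optional "0x"/"0X" prefix, then parse as hex.
    public static bool TryParseHex(string text, out uint value)
    {
        if (text.StartsWith("0x", StringComparison.OrdinalIgnoreCase))
            text = text.Substring(2);
        return uint.TryParse(text, NumberStyles.HexNumber, CultureInfo.InvariantCulture, out value);
    }

    public static void Main()
    {
        uint n;
        // The prefix makes TryParse fail.
        Console.WriteLine(uint.TryParse("0x20", NumberStyles.HexNumber, CultureInfo.InvariantCulture, out n)); // False
        // With the prefix removed it succeeds.
        Console.WriteLine(TryParseHex("0x20", out n) + " " + n); // True 32
        // Convert with base 16 tolerates the "0x" prefix.
        Console.WriteLine(Convert.ToUInt32("0x20", 16));         // 32
    }
}
```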

Note: It is tempting to optimize the first regex alternative from
0x[0-9a-f]{2,}
to
0x([0-9a-f]{2,}) and use a capture group,
but because we are matching multiple alternatives, the overlapping groups are troublesome.
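
One possible way around the group overlap, shown here only as a sketch and not used in the code below: .NET allows the same group name to appear in several alternatives, so each alternative can capture its digits into one shared group. Only two of the prefixes are included, and the names `NamedGroupDemo` and `FirstHexDigits` are made up for this example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NamedGroupDemo
{
    // Only two prefixes, for brevity; the group name "hex" is reused in each alternative.
    private static readonly Regex rgx = new Regex(
        @"0x(?<hex>[0-9a-f]{2,})|U\+(?<hex>[0-9a-f]{2,})",
        RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);

    // Returns the hex digits of the first prefixed match, or an empty string.
    public static string FirstHexDigits(string text)
    {
        Match m = rgx.Match(text);
        return m.Success ? m.Groups["hex"].Value : string.Empty;
    }

    public static void Main()
    {
        Console.WriteLine(FirstHexDigits("the character U+2019")); // 2019
        Console.WriteLine(FirstHexDigits("offset 0x1F"));          // 1F
    }
}
```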


This is an update to my last post about this - 

C# .NET How to get integer from hexadecimal string, many hex formats supported


This code removes many hex format prefixes, as listed here - https://en.wikipedia.org/wiki/Hexadecimal


Source Code

using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class Program
{

    const string strRegHexPrefixCandidates = @"0x[0-9a-f]{2,}|%x[0-9a-f]{2,}|\\u[0-9a-f]{2,}|&#x([0-9a-f]){1,6};|&#([0-9a-f]){1,6};|\\x[0-9a-f]{2,}|\\s[0-9a-f]{2,}|U\+[0-9a-f]{2,}|X'[0-9a-f]{2,}|16#([0-9a-f]){2,}|#x([0-9a-f]){2,}|#16r([0-9a-f]){2,6}|&H([0-9a-f]){2,}|0h([0-9a-f]){2,}|#([0-9a-f]){1,6}|%[0-9a-f]{2,}";
    const string strRegGetHexNumber = @"[0-9a-f]{2,}|[«‹»›„‚“‟‘‛”’""""❛❜❝❞〝〞〟"""""'‘][0-9a-f]{2,}[’'""""«‹»›„‚“‟‘‛”’""""❛❜❝❞〝〞〟"]";
    private static readonly Regex rgxHexPre = new Regex(strRegHexPrefixCandidates, RegexOptions.IgnoreCase | RegexOptions.Multiline | RegexOptions.CultureInvariant | RegexOptions.Compiled);
    private static readonly Regex rgxGetHexAgressive = new Regex(strRegGetHexNumber, RegexOptions.IgnoreCase | RegexOptions.CultureInvariant | RegexOptions.Compiled);
    public static void Main()
    {
        //string unicodeText = "UTF-16 (hex)	0x0023 (0023)";
        string unicodeText = @"In XML and XHTML, characters can be expressed as hexadecimal numeric character references using the notation &#xcode;, for instance &#x2019; represents the character U+2019 (the right single quotation mark). If there is no x the number is decimal (thus &#8217; is the same character).[3]";
        //string unicodeText = "8E2";
        //string unicodeText = "this is the end";

        string firstCandidateHexVal = string.Empty;
        
        //https://en.wikipedia.org/wiki/Hexadecimal --remove possible prefixes
        foreach (Match p in rgxHexPre.Matches(unicodeText))
        {
            if (p.Success) { 
                firstCandidateHexVal = p.Value;
                break;
            }
        }

        string prefixfree = string.Empty;
        if (!string.IsNullOrEmpty(firstCandidateHexVal)) 
        { 
            //same prefixes as in Regex
            string[] prefixHexs = new string[] { "0x", "%x", "\\u", "&#x", "&#", "\\x", "\\s", "U+", "X'", "16#", "#x", "#16r", "&H", "0h", "#", "%" };
            foreach (var pre in prefixHexs)
            {
                if (firstCandidateHexVal.IndexOf(pre) > -1)
                {
                    prefixfree = firstCandidateHexVal.Substring(firstCandidateHexVal.IndexOf(pre) + pre.Length);
                    break;
                }
            }
        }

        string finalHexCandy = string.Empty;

        if (string.IsNullOrEmpty(prefixfree))
            finalHexCandy = unicodeText;
        else  
            finalHexCandy = prefixfree;

        Match m = rgxGetHexAgressive.Match(finalHexCandy);
        bool success = false;
        ulong number = 0;
        string hex_value = string.Empty;
        
        if (m.Success)
        {
            try
            {
                hex_value = m.Value;
                //long number = Convert.ToInt64(hex_value, 16); //base 16 - hex....
                //https://stackoverflow.com/questions/2801509/uint32-tryparse-hex-number-not-working -> remove prefixes
                success = ulong.TryParse(hex_value, NumberStyles.HexNumber, CultureInfo.InvariantCulture, out number);
            }
            catch (FormatException)
            {
                unicodeText = string.Format("{0} is not in the correct format for a hexadecimal number.", m.Value);
            }
            catch (OverflowException)
            {
                unicodeText = string.Format("{0} is outside the range of a UInt64.", m.Value);
            }
            catch (ArgumentException)
            {
                unicodeText = string.Format("{0} is invalid in base 16.", m.Value);
            }
            catch (Exception ex)
            {
                unicodeText = string.Format("{0} return error\r\n{1}", m.Value, ex.Message);
            }
        }



        if (!string.IsNullOrEmpty(firstCandidateHexVal))
            hex_value = firstCandidateHexVal; 

        
        if (success)
            unicodeText = string.Format("{0:N0} integer found from hex number {1} in string: {2}", number, hex_value, unicodeText);
        else
            unicodeText = "Could not find a hex number in string: \"" + unicodeText + "\". Select the hex number only.";
        
        Console.WriteLine(unicodeText);
 
    }
}

Tuesday, October 13, 2020

C# .NET How to get integer from hexadecimal string, many hex formats supported















There are many hex formats whose prefixes, such as "0x", must be removed before the C# TryParse function with NumberStyles.HexNumber will work. For example, this fails:


UInt32.TryParse("0x20", 
             NumberStyles.HexNumber, // I've tried also AllowHexSpecifier
             CultureInfo.InvariantCulture,  // I've also tried CurrentCulture
             out number);


The code below removes many hex format prefixes, as listed here - https://en.wikipedia.org/wiki/Hexadecimal

As per feedback, I improved the algorithm to first isolate likely hex matches with prefixes using one regex, then run another regex to extract the hex number within that match.
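
The two-stage idea can be sketched roughly as follows. This is a simplified illustration with only three prefixes, and the class, field and method names are made up for the example; it is not the post's actual code.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class TwoStageHexDemo
{
    // Stage 1: find a prefixed candidate (only three prefixes here, for brevity).
    private static readonly Regex candidate = new Regex(@"(0x|U\+|&#x)[0-9a-f]+", RegexOptions.IgnoreCase);
    // Stage 2: take the run of hex digits at the end of the candidate.
    private static readonly Regex digits = new Regex(@"[0-9a-f]+$", RegexOptions.IgnoreCase);

    public static bool TryExtract(string text, out ulong value)
    {
        value = 0;
        Match c = candidate.Match(text);
        if (!c.Success) return false;
        Match d = digits.Match(c.Value);
        return d.Success &&
               ulong.TryParse(d.Value, NumberStyles.HexNumber, CultureInfo.InvariantCulture, out value);
    }

    public static void Main()
    {
        ulong n;
        if (TryExtract("UTF-16 (hex)\t0x0023 (0023)", out n))
            Console.WriteLine(n); // 35
    }
}
```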



Source Code
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class Program
{
	const string strRegGetHexNumber = @"\\u[0-9a-f]{2,}|0x[0-9a-f]{2,}|%[0-9a-f]{2,}|\u0023([0-9a-f]){1,6}|&\u0023x([0-9a-f]){1,6};|\s[0-9a-f]{2,}|[0-9a-f]{2,}|[«‹»›„‚“‟‘‛”’""""❛❜❝❞〝〞〟"""""'‘][0-9a-f]{2,}[’'""""«‹»›„‚“‟‘‛”’""""❛❜❝❞〝〞〟"]";
	private static readonly Regex rgxGetHex = new Regex(strRegGetHexNumber, RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);
	public static void Main()
	{
		string unicodeText = "UTF-16 (hex)	0x0023 (0023)";
		//https://en.wikipedia.org/wiki/Hexadecimal --remove possible prefixes
		string prefixfree = string.Empty;
		string[] prefixHexs = new string[]{"0x", "\\u", "#", "&#", "\\x", "\\s", "U+", "X'", "16#", "#x", "#16r", "&H", "0h"};
		foreach (var pre in prefixHexs)
		{
			if (unicodeText.IndexOf(pre) > -1)
			{
				prefixfree = unicodeText.Substring(unicodeText.IndexOf(pre) + pre.Length);
				break;
			}
		}

		if (string.IsNullOrEmpty(prefixfree))
			prefixfree = unicodeText;
		Match m = rgxGetHex.Match(prefixfree);
		bool success = false;
		ulong number = 0;
		string hex_value = string.Empty;
		if (m.Success)
		{
			try
			{
				hex_value = m.Value;
				//long number = Convert.ToInt64(hex_value, 16); //base 16 - hex....
				//https://stackoverflow.com/questions/2801509/uint32-tryparse-hex-number-not-working -> remove prefixes
				success = ulong.TryParse(hex_value, NumberStyles.HexNumber, CultureInfo.InvariantCulture, out number);
			}
			catch (FormatException)
			{
				unicodeText = string.Format("{0} is not in the correct format for a hexadecimal number.", m.Value);
			}
			catch (OverflowException)
			{
				unicodeText = string.Format("{0} is outside the range of a UInt64.", m.Value);
			}
			catch (ArgumentException)
			{
				unicodeText = string.Format("{0} is invalid in base 16.", m.Value);
			}
			catch (Exception ex)
			{
				unicodeText = string.Format("{0} return error\r\n{1}", m.Value, ex.Message);
			}
		}
		if (success)
			unicodeText = string.Format("{0:N0} found from hex number {1} in string {2}", number, hex_value, unicodeText);
		else
			unicodeText = "Could not find a hex number in \"" + unicodeText + "\". Select the hex number only.";
		Console.WriteLine(unicodeText);
	}
}

Wednesday, September 23, 2020

C# .NET How to get Animation, Alpha, EXIF, ICC flags from header of WebP image file - partially loaded file

The order in which to consume the header flags is clearly laid out in the WebP RIFF container specification, but getting the bits was tricky. Even after taking little-endian byte order into account, reversing the bit values in the byte and aligning the indices to the spec, it still failed. It turns out the bit-shifting math counts from the least significant bit, while the spec's diagram starts at the most significant bit. Therefore, the flags are read in reverse order.
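
The same bit positions can be written as masks, which may be easier to check against the spec. This is a sketch only: the enum and its member names are made up for illustration, and the mask values follow the bit positions used in the source code below (bit 1 animation, bit 2 XMP, bit 3 EXIF, bit 4 alpha, bit 5 ICC).

```csharp
using System;

public static class Vp8xFlagsDemo
{
    // Hypothetical enum: masks for the VP8X flags byte, where bit 0 is the least significant bit.
    [Flags]
    public enum Vp8xFlags : byte
    {
        None = 0,
        Animation = 0x02, // bit 1
        Xmp = 0x04,       // bit 2
        Exif = 0x08,      // bit 3
        Alpha = 0x10,     // bit 4
        Icc = 0x20        // bit 5
    }

    // True when the given flag bit is set in the raw header byte.
    public static bool Has(byte raw, Vp8xFlags flag)
    {
        return (raw & (byte)flag) != 0;
    }

    public static void Main()
    {
        byte raw = 0x12; // made-up example value: alpha + animation
        Console.WriteLine("has Animation ? " + Has(raw, Vp8xFlags.Animation)); // True
        Console.WriteLine("has Alpha ? " + Has(raw, Vp8xFlags.Alpha));         // True
        Console.WriteLine("has ICC ? " + Has(raw, Vp8xFlags.Icc));             // False
    }
}
```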

Blade Runner.webp animated



Source Code


using System;
using System.Net; 
using System.Collections.Generic;
using System.IO;
using System.Drawing;
using System.Linq;
using System.Diagnostics; 

public class Program
{
    //https://stackoverflow.com/questions/111345/getting-image-dimensions-without-reading-the-entire-file/60667939#60667939
    //DecodeWebP reads only lossless :( 
    
    //My version improves DecodeWebP to read all webp formats, lossy, lossless and extended! 
    //https://metadataconsulting.blogspot.com/2020/09/CSharp-dotNET-How-to-get-Animation-Alpha-EXIF-ICC-flags-from-header-of-WebP-image-file-partially-loaded-file.html
                  
    internal static class ImageHelper
    {
        const string errorMessage = "Could not recognise image format.";

        private static Dictionary<byte[], Func<BinaryReader, Size>> imageFormatDecoders = new Dictionary<byte[], Func<BinaryReader, Size>>()
        {
            { new byte[] { 0x42, 0x4D }, DecodeBitmap },
            { new byte[] { 0x47, 0x49, 0x46, 0x38, 0x37, 0x61 }, DecodeGif },
            { new byte[] { 0x47, 0x49, 0x46, 0x38, 0x39, 0x61 }, DecodeGif },
            { new byte[] { 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A }, DecodePng },
            { new byte[] { 0xff, 0xd8 }, DecodeJfif },
            { new byte[] { 0x52, 0x49, 0x46, 0x46 }, DecodeWebP },
        };

        /// <summary>        
        /// Gets the dimensions of an image.        
        /// </summary>        
        /// <param name="path">The path of the image to get the dimensions of.</param>        
        /// <returns>The dimensions of the specified image.</returns>        
        /// <exception cref="ArgumentException">The image was of an unrecognised format.</exception>            
        public static Size GetDimensions(BinaryReader binaryReader)
        {
            int maxMagicBytesLength = imageFormatDecoders.Keys.OrderByDescending(x => x.Length).First().Length;
            byte[] magicBytes = new byte[maxMagicBytesLength];
            for (int i = 0; i < maxMagicBytesLength; i += 1)
            {
                magicBytes[i] = binaryReader.ReadByte();
                foreach (var kvPair in imageFormatDecoders)
                {
                    if (StartsWith(magicBytes, kvPair.Key))
                    {
                        Console.WriteLine(kvPair.Value.Method);
                        return kvPair.Value(binaryReader);
                        
                    }
                }
            }

            throw new ArgumentException(errorMessage, "binaryReader");
        }
        
        /// <summary>
        /// Gets the dimensions of an image.
        /// </summary>
        /// <param name="path">The path of the image to get the dimensions of.</param>
        /// <returns>The dimensions of the specified image.</returns>
        /// <exception cref="ArgumentException">The image was of an unrecognized format.</exception>
        public static Size GetDimensions(string path)
        {
            using (BinaryReader binaryReader = new BinaryReader(File.OpenRead(path)))
            {
                try
                {
                    return GetDimensions(binaryReader);
                }
                catch (ArgumentException e)
                {
                    if (e.Message.StartsWith(errorMessage))
                    {
                        throw new ArgumentException(errorMessage, "path", e);
                    }
                    else
                    {
                        throw; // rethrow without resetting the stack trace
                    }
                }
            }
        }
        
         /// <summary>
        /// Gets the dimensions of an image.
        /// </summary>
        /// <param name="path">The path of the image to get the dimensions of.</param>
        /// <returns>The dimensions of the specified image.</returns>
        /// <exception cref="ArgumentException">The image was of an unrecognized format.</exception>
        public static Size GetDimensions(MemoryStream ms)
        {
            using (BinaryReader binaryReader = new BinaryReader(ms))
            {
                try
                {
                    return GetDimensions(binaryReader);
                }
                catch (ArgumentException e)
                {
                    if (e.Message.StartsWith(errorMessage))
                    {
                        throw new ArgumentException(errorMessage, "ms", e);
                    }
                    else
                    {
                        throw; // rethrow without resetting the stack trace
                    }
                }
            }
        }


        private static bool StartsWith(byte[] thisBytes, byte[] thatBytes)
        {
            for (int i = 0; i < thatBytes.Length; i += 1)
            {
                if (thisBytes[i] != thatBytes[i])
                {
                    return false;
                }
            }

            return true;
        }

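        // Despite the name, this helper reverses the bytes it reads, so on a little-endian machine it returns a big-endian value (as stored in PNG and JPEG headers).
        private static short ReadLittleEndianInt16(BinaryReader binaryReader)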
        {
            byte[] bytes = new byte[sizeof(short)];

            for (int i = 0; i < sizeof(short); i += 1)
            {
                bytes[sizeof(short) - 1 - i] = binaryReader.ReadByte();
            }
            return BitConverter.ToInt16(bytes, 0);
        }

        private static int ReadLittleEndianInt32(BinaryReader binaryReader)
        {
            byte[] bytes = new byte[sizeof(int)];
            for (int i = 0; i < sizeof(int); i += 1)
            {
                bytes[sizeof(int) - 1 - i] = binaryReader.ReadByte();
            }
            return BitConverter.ToInt32(bytes, 0);
        }

        private static Size DecodeBitmap(BinaryReader binaryReader)
        {
            binaryReader.ReadBytes(16);
            int width = binaryReader.ReadInt32();
            int height = binaryReader.ReadInt32();
            return new Size(width, height);
        }

        private static Size DecodeGif(BinaryReader binaryReader)
        {
            int width = binaryReader.ReadInt16();
            int height = binaryReader.ReadInt16();
            return new Size(width, height);
        }

        private static Size DecodePng(BinaryReader binaryReader)
        {
            binaryReader.ReadBytes(8);
            int width = ReadLittleEndianInt32(binaryReader);
            int height = ReadLittleEndianInt32(binaryReader);
            return new Size(width, height);
        }

        private static Size DecodeJfif(BinaryReader binaryReader)
        {
            while (binaryReader.ReadByte() == 0xff)
            {
                byte marker = binaryReader.ReadByte();
                short chunkLength = ReadLittleEndianInt16(binaryReader);
                if (marker == 0xc0 || marker == 0xc2) // c2: progressive
                {
                    binaryReader.ReadByte();
                    int height = ReadLittleEndianInt16(binaryReader);
                    int width = ReadLittleEndianInt16(binaryReader);
                    return new Size(width, height);
                }

                if (chunkLength < 0)
                {
                    ushort uchunkLength = (ushort)chunkLength;
                    binaryReader.ReadBytes(uchunkLength - 2);
                }
                else
                {
                    binaryReader.ReadBytes(chunkLength - 2);
                }
            }

            throw new ArgumentException(errorMessage);
        }
        
        //Other libs did not do it 
        //https://github.com/JosePineiro/WebP-wrapper/blob/master/WebPTest/WebPWrapper.cs
        //https://github.com/JimBobSquarePants/ImageProcessor/blob/6092da59e9aa4975e564002ef3c782a8f6bf3384/src/Plugins/ImageProcessor/ImageProcessor.Plugins.WebP/Imaging/Formats/WebPFormat.cs
      
        //fast
        private static Size DecodeWebP(BinaryReader binaryReader)
        {
            //https://developers.google.com/speed/webp/docs/riff_container
            //var riffseg = binaryReader.ReadBytes(4); //already offset 4 bytes 
            //var sizebytes = binaryReader.ReadBytes(4); // Size

            var size = binaryReader.ReadUInt32(); // Size - start at offset 4 
            
            var webp = binaryReader.ReadBytes(4); // start 8 offset

            var type = binaryReader.ReadBytes(4); // start 12 offset - VP8[ ] determination

            string VP8Type = System.Text.Encoding.UTF8.GetString(type);
            
            Console.WriteLine("VP8Type=\""+VP8Type+"\""); 

            int x = 0; 
            int y = 0;

            if (VP8Type == "VP8X") //Extra format - https://developers.google.com/speed/webp/docs/riff_container#extended_file_format
            { 
                
                binaryReader.ReadBytes(4); //skip 4 bytes
                
                byte Flags = binaryReader.ReadByte(); //read 1 byte for all header bit bFlags

                ////////////////////////////////////////////
                ///////// LITTLE ENDIAN MATH! REVERSE ORDER - https://developers.google.com/speed/webp/docs/riff_container#extended_file_format
                ///////// bitPos 0 is the least significant bit, so the flags are read in the reverse of the spec's diagram order
                ////////////////////////////////////////////
                
                int bitPos = 0; //bit position
                //var bitR1C = (((Flags>> bitPos) & 1) != 0); //Original attempt at order 
                bool bitR3C = (((Flags>> bitPos) & 1) != 0); 
                
                bitPos = 1; 
                //var bitR2C = (((Flags>> bitPos) & 1) != 0); //Original attempt at order 
                var bitAniC = (((Flags>> bitPos) & 1) != 0); 
                Console.WriteLine("has Animation ? {0}", bitAniC); 

                bitPos = 2; 
                //var bitICCC = (((Flags>> bitPos) & 1) != 0); //Original attempt at order 
                var bitXMPC = (((Flags>> bitPos) & 1) != 0);
                Console.WriteLine("has XMP ? {0}", bitXMPC); 

                bitPos = 3;
                //var bitAlphaC = (Flags>> bitPos) & 1; //Original attempt at order 
                bool bitExifC = (((Flags>> bitPos) & 1) != 0); 
                Console.WriteLine("has EXIF ? {0}", bitExifC); 
                
                bitPos = 4;
                //var bitExifC = (((Flags>> bitPos) & 1) != 0); //Original attempt at order 
                var bitAlphaC = (((Flags>> bitPos) & 1) != 0);
                Console.WriteLine("has Alpha ? {0}", bitAlphaC); 
                
                bitPos = 5;
                //var bitXMPC = (((Flags>> bitPos) & 1) != 0);//Original attempt at order 
                var bitICCC = (((Flags>> bitPos) & 1) != 0); 
                Console.WriteLine("has ICC ? {0}", bitICCC); 
                
                bitPos = 6;
                //var bitAniC = (((Flags>> bitPos) & 1) != 0); //Original attempt at order 
                var bitR2C = (((Flags>> bitPos) & 1) != 0); 

                bitPos = 7;
                //var bitR3C = (Flags>> bitPos) & 1; //Original attempt at order 
                var bitR1C = (((Flags>> bitPos) & 1) != 0); 

                binaryReader.ReadBytes(3); 

                byte[] w = binaryReader.ReadBytes(3); //24bits for width

                x = 1 + (w[2] << 16 | w[1] << 8 | w[0]); //little endian

                byte[] h = binaryReader.ReadBytes(3); //24bits for height

                y = 1 + (h[2] << 16 | h[1] << 8 | h[0]); 

                return new Size(x, y);

            }
            else if (VP8Type == "VP8L") //Lossless - https://developers.google.com/speed/webp/docs/webp_lossless_bitstream_specification#2_riff_header 
            {
                
                binaryReader.ReadBytes(4); //size
                byte[] sig = binaryReader.ReadBytes(1); //0x2f->47 1 byte signature
                if (sig[0] != 47) return new Size(0, 0); //invalid signature, not a VP8L bitstream

                byte[] wh = binaryReader.ReadBytes(4); //width and height in 1 read
                x = 1 + (((wh[1] & 0x3F) << 8) | wh[0]); //{1 + ((($b1 & 0x3F) << 8) | $b0)} - https://blog.tcl.tk/38137  
                y = 1 + (((wh[3] & 0xF) << 10) | (wh[2] << 2) | ((wh[1] & 0xC0) >> 6)); //{1 + ((($b3 & 0xF) << 10) | ($b2 << 2) | (($b1 & 0xC0) >> 6))}]

                return new Size(x, y);

            }
            else if (VP8Type == "VP8 ") //Lossy - https://tools.ietf.org/html/rfc6386#section-9.1
            {

                //Lossy - https://tools.ietf.org/html/rfc6386#section-9.1 hard to decipher
                //pc->Width      = swap2(*(unsigned short*)(c+3))&0x3fff;  //0x3fff -> 16383 decimal; swap2 handles big or little endian depending on machine
                //pc->Height     = swap2(*(unsigned short*)(c+5))&0x3fff;
              
                //https://blog.tcl.tk/38137 - much better
                  
                binaryReader.ReadBytes(7); //move to offset 23 or 0x17, 23-12+4=7 - open webp lossy file https://developers.google.com/speed/webp/gallery1

                byte[] frameTag = binaryReader.ReadBytes(3); //start code must be 0x9d (157), 0x01 (1), 0x2a (42)
                if (frameTag[0] != 157 || frameTag[1] != 1 || frameTag[2] != 42) return new Size(0, 0); //invalid webp file

                //each read is 2 bytes (16 bits), but only the low 14 bits hold the dimension, so AND with 0x3fff
                x = binaryReader.ReadUInt16() & 0x3fff;    //$width & 0x3fff, same as & 0b00_11111111111111 (binary literals need C# 7.0 or above)
                y = binaryReader.ReadUInt16() & 0x3fff;    //$height & 0x3fff
            
                return new Size(x, y);
            }
            
            return new Size(0, 0);
            
        }

    }
    
    public static string GetFileNameFromURL(string hrefLink) //hack
    {
        string[] parts = hrefLink.Split('/');
        string fileName = string.Empty;

        if (parts.Length > 0)
            fileName = parts[parts.Length - 1];
        else
            fileName = hrefLink;

        return fileName;
    }

    public static void Main()
    {
        Stopwatch sw = new Stopwatch(); 
        //string webpURL = "https://www.gstatic.com/webp/gallery/1.sm.webp";//Lossy
        //string webpURL = "https://www.gstatic.com/webp/gallery3/2_webp_ll.webp";//Lossless
        string webpURL = "https://www.gstatic.com/webp/gallery3/1_webp_a.webp";//Extended with alpha channel
        //string webpURL = "https://mathiasbynens.be/demo/animated-webp-supported.webp"; //animated
        //string webpURL = "http://blog.mindworkshop.com/image/webp003.webp"; //animal
        //string webpURL = "https://1.bp.blogspot.com/-rfUYu37r2No/X2uNWNMz8jI/AAAAAAAAMps/mJow8dGNE80qvWJupR1KEV8RCXo49FKUQCNcBGAsYHQ/w640-h287/BladeRunner.webp";
        
        //string webpURL = "https://res.cloudinary.com/demo/image/upload/fl_awebp/cell_animation.webp";
        string webpfile = GetFileNameFromURL(webpURL); 
        
        Size webpSize = new Size(); 
        
        WebClient wc = new WebClient();
        using (MemoryStream stream = new MemoryStream(wc.DownloadData(webpURL)))
        {
            sw.Start();
            webpSize = ImageHelper.GetDimensions(stream);
            sw.Stop();

        }
        Console.WriteLine("File \"{0}\" has dimensions [{1}w X {2}h] in {3} ms.", webpfile, webpSize.Width, webpSize.Height, sw.ElapsedMilliseconds);
        
    }
}
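
The VP8L branch above picks the 14-bit width and height out of individual bytes. As a minimal sketch of the same arithmetic (the class and method names here are my own and not part of the code above), you can assemble the 4 bytes into one little-endian 32-bit word first and then slice the two fields out of it, which may be easier to check against the spec:

```csharp
using System;

public static class Vp8lSketch
{
    // Hypothetical helper: unpack width/height from the 4 bytes that follow the 0x2F signature.
    public static (int Width, int Height) Unpack(byte[] wh)
    {
        // Assemble the little-endian 32-bit word, then slice the bit fields out of it.
        uint bits = (uint)wh[0] | ((uint)wh[1] << 8) | ((uint)wh[2] << 16) | ((uint)wh[3] << 24);
        int width = (int)(bits & 0x3FFF) + 1;          // bits 0..13 store width - 1
        int height = (int)((bits >> 14) & 0x3FFF) + 1; // bits 14..27 store height - 1
        return (width, height);
    }

    public static void Main()
    {
        // Hand-made bytes encoding 400 x 301 (illustrative values, not from a real file).
        var size = Unpack(new byte[] { 0x8F, 0x01, 0x4B, 0x00 });
        Console.WriteLine("{0}w X {1}h", size.Width, size.Height); // 400w X 301h
    }
}
```

Both forms give the same result; the per-byte version above avoids building the intermediate word.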

Saturday, September 5, 2020

C# .NET - How to get a string between two strings fast

Here's an indulgent look at getting a string between two strings. I could not find any satisfactory* solutions out there, so I wrote my own.

*Satisfactory, because the solutions out there generally did not handle edge cases and did not state whether they were maximal spanning (greedy, in regex terms) or minimal spanning.
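
To make the regex comparison concrete, here is a small sketch of greedy versus lazy matching. The class and method names are made up for illustration, and this is not the method this post builds (which uses IndexOf and LastIndexOf, no regex).

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexSpanSketch
{
    // Greedy quantifier (.*): runs on to the LAST occurrence of "to" (maximal span).
    public static string Greedy(string s, string from, string to)
    {
        string pattern = Regex.Escape(from) + "(.*)" + Regex.Escape(to);
        return Regex.Match(s, pattern, RegexOptions.Singleline).Groups[1].Value;
    }

    // Lazy quantifier (.*?): stops at the FIRST "to" found after "from" (minimal span).
    public static string Lazy(string s, string from, string to)
    {
        string pattern = Regex.Escape(from) + "(.*?)" + Regex.Escape(to);
        return Regex.Match(s, pattern, RegexOptions.Singleline).Groups[1].Value;
    }

    public static void Main()
    {
        Console.WriteLine(Greedy("abcdabcd", "a", "d")); // bcdabc
        Console.WriteLine(Lazy("abcdabcd", "a", "d"));   // bc
    }
}
```

When nothing matches, Groups[1].Value is an empty string, so both helpers return string.Empty in that case.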

So, having a little extra time, I decided to really look at this. I learnt that some of the edge-case checks are not needed, which let me optimize the method and make it fast but still bulletproof.


Note: This solution gets the maximal span between two strings: the 1st string is searched for from the beginning, and the 2nd string is searched for from the end.
For a minimal span, the 2nd string would instead be searched for from a position after the 1st.

See my post on minimal spanning 
https://metadataconsulting.blogspot.com/2020/09/C-NET-How-to-get-a-string-between-two-strings-minimal-spanning-fast.html
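
As a rough sketch of the difference (the helper names are hypothetical, and both helpers assume non-null, non-empty arguments, so the edge-case checks from the full source below are left out), the only change between the two is how the 2nd string is located:

```csharp
using System;

public static class SpanSketch
{
    // Maximal span: 1st string searched from the start, 2nd string searched from the end.
    public static string Maximal(string s, string from, string to)
    {
        int idxFrom = s.IndexOf(from);
        if (idxFrom < 0) return string.Empty;
        int start = idxFrom + from.Length;
        int end = s.LastIndexOf(to);
        return end > start ? s.Substring(start, end - start) : string.Empty;
    }

    // Minimal span: 2nd string searched forward, starting right after the 1st string.
    public static string Minimal(string s, string from, string to)
    {
        int idxFrom = s.IndexOf(from);
        if (idxFrom < 0) return string.Empty;
        int start = idxFrom + from.Length;
        int end = s.IndexOf(to, start);
        return end > start ? s.Substring(start, end - start) : string.Empty;
    }

    public static void Main()
    {
        Console.WriteLine(Maximal("[a][b]", "[", "]")); // a][b
        Console.WriteLine(Minimal("[a][b]", "[", "]")); // a
    }
}
```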




The code below implements a debug version of the method that shows which rules get triggered, across an extensive set of test cases.

  
Source Code
 
using System;using System.Diagnostics;

namespace GetBetweenStrings_Blog
{
    public static class Extensions 
    {
        /// <summary>
        /// Get a substring between two anchor strings, maximal span
        /// </summary>
        /// <param name="s">source string</param>
        /// <param name="from">start anchor; the result begins right after its first occurrence in s</param>
        /// <param name="to">end anchor; the result ends right before its last occurrence, found by searching backwards from the end of s</param>
        /// <returns>a substring between from and to, maximal span</returns>
        public static string GetFirstStringBetweenStringsCleanup(this string s, string from, string to)
        {
            if (string.IsNullOrEmpty(s) || string.IsNullOrEmpty(from) || string.IsNullOrEmpty(to)) return string.Empty;
        
            int idxFrom = s.IndexOf(from);
            int idxStart = idxFrom + from.Length; //the "not found" (-1) case is rejected below, so idxStart is never used when invalid

            if (idxFrom == -1 || idxStart >= s.Length - 1)
                return string.Empty;
            
            int idxEnd = s.LastIndexOf(to);
            
            if (idxEnd == -1 || idxEnd <= idxStart) 
                return string.Empty;
            
            return s.Substring(idxStart, idxEnd - idxStart);

        }

        //4:00AM coding journey, testing conditions you may or may not need, building from scratch
        /// <summary>
        /// Get a substring between two anchor strings, maximal span
        /// </summary>
        /// <param name="s">source string</param>
        /// <param name="from">start anchor; the result begins right after its first occurrence in s</param>
        /// <param name="to">end anchor; the result ends right before its last occurrence, found by searching backwards from the end of s</param>
        /// <returns>a substring between from and to, maximal span</returns>
        public static string GetFirstStringBetweenStrings(this string s, string from, string to)
        {
            Console.Write("args from {0} to {1} = ", from, to); //debug

            //edge case
            if (string.IsNullOrEmpty(s) || string.IsNullOrEmpty(from) || string.IsNullOrEmpty(to)) return string.Empty;
            //if (from.Length >= s.Length || to.Length >= s.Length) return string.Empty; //redundant rule take care of below
            
            int idxFrom = s.IndexOf(from);
            int idxStart = idxFrom + from.Length; //the "not found" (-1) case is filtered below, so a bad idxStart is never used

            if (idxFrom == -1)       
                return string.Empty;
            else if (idxStart >= s.Length - 1) //kept separate for testing (can be combined into 1 line); we compare an index with a length, and with lengths we normally subtract 1
            //else if (idxStart >= s.Length) //alternative without the -1
            {
                Console.WriteLine("r1. idxStart={0} >= (s.Length - 1)={1}", idxStart, s.Length - 1);
                //Console.WriteLine("r1. idxStart={0} >= s.Length={1}",  idxStart, s.Length);
                return string.Empty; 
            }

         

            int idxEnd = s.LastIndexOf(to); 
            //int idxEnd = s.LastIndexOf(to, idxStart); //unexpected result: for s="aaaa", from="a", to="a" (idxStart=1) this returns 1, because the overload searches backwards from idxStart; we expect the last index, 4-1=3

            //if (idxEnd == -1 || idxEnd > s.Length - 1 || idxStart >= idxEnd) //initial, incorrect version
            if (idxEnd == -1)
                return string.Empty;
            //else if (idxEnd > s.Length - 1) //ruled out: LastIndexOf returns at most s.Length - 1, or -1 if not found
            ////else if (idxEnd > s.Length)
            //{
            //    Console.WriteLine("r2. idxEnd={0} > (s.Length - 1)={1}", idxEnd, s.Length - 1);
            //    //Console.WriteLine("r2. idxEnd={0} > s.Length={1}", idxEnd, s.Length);
            //    return string.Empty;
            //}
            else if (idxEnd <= idxStart)
            {
                Console.WriteLine("r3. idxEnd={0} <= idxStart={1}", idxEnd, idxStart);
                return string.Empty;
            }
            
            return s.Substring(idxStart, idxEnd - idxStart);
            
        }


    }
    public class Program
    {
        public static void Main(string[] args)
        {
            string test = "abcd";
            Console.WriteLine(test);
            Console.WriteLine("=================");

            Console.WriteLine(test.GetFirstStringBetweenStrings(null, null));
            Console.WriteLine(test.GetFirstStringBetweenStrings("", ""));
            Console.WriteLine(test.GetFirstStringBetweenStrings(test, test));
            Console.WriteLine(test.GetFirstStringBetweenStrings("", test));
            Console.WriteLine(test.GetFirstStringBetweenStrings(test,""));
            
            Console.WriteLine(test.GetFirstStringBetweenStrings("a", "a"));
            Console.WriteLine(test.GetFirstStringBetweenStrings("a", "b"));
            Console.WriteLine(test.GetFirstStringBetweenStrings("a", "c"));
            Console.WriteLine(test.GetFirstStringBetweenStrings("a", "d"));
            Console.WriteLine(test.GetFirstStringBetweenStrings("a", "e"));
            
            Console.WriteLine(test.GetFirstStringBetweenStrings("e", "e"));
            
            Console.WriteLine(test.GetFirstStringBetweenStrings("ab", "a"));
            Console.WriteLine(test.GetFirstStringBetweenStrings("ab", "b"));
            Console.WriteLine(test.GetFirstStringBetweenStrings("ab", "c"));
            Console.WriteLine(test.GetFirstStringBetweenStrings("ab", "d"));

            Console.WriteLine(test.GetFirstStringBetweenStrings("ab", "a"));
            Console.WriteLine(test.GetFirstStringBetweenStrings("ab", "b"));
            Console.WriteLine(test.GetFirstStringBetweenStrings("ab", "c"));
            Console.WriteLine(test.GetFirstStringBetweenStrings("ab", "d"));

            Console.WriteLine(test.GetFirstStringBetweenStrings("abc", "a"));
            Console.WriteLine(test.GetFirstStringBetweenStrings("abc", "b"));
            Console.WriteLine(test.GetFirstStringBetweenStrings("abc", "c"));
            Console.WriteLine(test.GetFirstStringBetweenStrings("abc", "d"));
            
            

            Console.WriteLine(test.GetFirstStringBetweenStrings(test, "a"));
            Console.WriteLine(test.GetFirstStringBetweenStrings(test, "b"));
            Console.WriteLine(test.GetFirstStringBetweenStrings(test, "c"));
            Console.WriteLine(test.GetFirstStringBetweenStrings(test, "d"));

            Console.WriteLine(test.GetFirstStringBetweenStrings("a","abc"));
            Console.WriteLine(test.GetFirstStringBetweenStrings("b", "abc"));
            Console.WriteLine(test.GetFirstStringBetweenStrings("c", "abc"));
            Console.WriteLine(test.GetFirstStringBetweenStrings("d", "abc"));

            Console.WriteLine(); 
            test = "abcdabcd";
            Console.WriteLine(test);
            Console.WriteLine("=================");

            Console.WriteLine(test.GetFirstStringBetweenStrings(null, null));
            Console.WriteLine(test.GetFirstStringBetweenStrings("", ""));
            Console.WriteLine(test.GetFirstStringBetweenStrings(test, test));
            Console.WriteLine(test.GetFirstStringBetweenStrings("", test));
            Console.WriteLine(test.GetFirstStringBetweenStrings(test, ""));

            Console.WriteLine(test.GetFirstStringBetweenStrings("a", "a"));
            Console.WriteLine(test.GetFirstStringBetweenStrings("a", "b"));
            Console.WriteLine(test.GetFirstStringBetweenStrings("a", "c"));
            Console.WriteLine(test.GetFirstStringBetweenStrings("a", "d"));
            Console.WriteLine(test.GetFirstStringBetweenStrings("a", "e"));

            Console.WriteLine(test.GetFirstStringBetweenStrings("e", "e"));

            Console.WriteLine(test.GetFirstStringBetweenStrings("ab", "a"));
            Console.WriteLine(test.GetFirstStringBetweenStrings("ab", "b"));
            Console.WriteLine(test.GetFirstStringBetweenStrings("ab", "c"));
            Console.WriteLine(test.GetFirstStringBetweenStrings("ab", "d"));

            Console.WriteLine(test.GetFirstStringBetweenStrings("ab", "a"));
            Console.WriteLine(test.GetFirstStringBetweenStrings("ab", "b"));
            Console.WriteLine(test.GetFirstStringBetweenStrings("ab", "c"));
            Console.WriteLine(test.GetFirstStringBetweenStrings("ab", "d"));

            Console.WriteLine(test.GetFirstStringBetweenStrings("abc", "a"));
            Console.WriteLine(test.GetFirstStringBetweenStrings("abc", "b"));
            Console.WriteLine(test.GetFirstStringBetweenStrings("abc", "c"));
            Console.WriteLine(test.GetFirstStringBetweenStrings("abc", "d"));

            Console.WriteLine(test.GetFirstStringBetweenStrings(test, "a"));
            Console.WriteLine(test.GetFirstStringBetweenStrings(test, "b"));
            Console.WriteLine(test.GetFirstStringBetweenStrings(test, "c"));
            Console.WriteLine(test.GetFirstStringBetweenStrings(test, "d"));

            Console.WriteLine(test.GetFirstStringBetweenStrings("a", "abc"));
            Console.WriteLine(test.GetFirstStringBetweenStrings("b", "abc"));
            Console.WriteLine(test.GetFirstStringBetweenStrings("c", "abc"));
            Console.WriteLine(test.GetFirstStringBetweenStrings("d", "abc"));

            Console.WriteLine(test.GetFirstStringBetweenStrings("abc", "d")); //pass


            Console.WriteLine();
            test = "aaaa";
            Console.WriteLine(test);
            Console.WriteLine("=================");

            Console.WriteLine(test.GetFirstStringBetweenStrings("a", "a"));
            Console.WriteLine(test.GetFirstStringBetweenStrings("aa", "a"));
            Console.WriteLine(test.GetFirstStringBetweenStrings("aaa", "a"));
            Console.WriteLine(test.GetFirstStringBetweenStrings("aaaa", "a"));

            Console.WriteLine(test.GetFirstStringBetweenStrings("a", "a"));
            Console.WriteLine(test.GetFirstStringBetweenStrings("a", "aa"));
            Console.WriteLine(test.GetFirstStringBetweenStrings("a", "aaa"));
            Console.WriteLine(test.GetFirstStringBetweenStrings("a", "aaaa"));

            Console.WriteLine();
            test = "aaaaaaaaaaaaaaaaaaaaaaaaaaaabbbb";
            Console.WriteLine(test);
            Console.WriteLine("=================");
            Console.WriteLine(test.GetFirstStringBetweenStrings("bbbb", "aaaa"));
			
			string result = string.Empty; 
            Stopwatch sw = new Stopwatch();
            sw.Start();
            result = test.GetFirstStringBetweenStringsCleanup("aaaaaaa", "bb");
            sw.Stop();
            Console.WriteLine(result);
            Console.WriteLine(" in " + sw.ElapsedTicks + " ticks."); 
           
        }
    }
}
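
One more possible speed-up, offered as a sketch rather than as part of the code above: IndexOf(string) and LastIndexOf(string) do a culture-sensitive search by default. If plain character-by-character matching is all you need, passing StringComparison.Ordinal skips the culture rules. The class and method names below are my own, and I have not benchmarked this variant here.

```csharp
using System;

public static class OrdinalSketch
{
    // Hypothetical ordinal variant of GetFirstStringBetweenStringsCleanup (maximal span).
    public static string BetweenOrdinal(this string s, string from, string to)
    {
        if (string.IsNullOrEmpty(s) || string.IsNullOrEmpty(from) || string.IsNullOrEmpty(to))
            return string.Empty;

        int idxFrom = s.IndexOf(from, StringComparison.Ordinal);
        if (idxFrom == -1) return string.Empty;

        int idxStart = idxFrom + from.Length;
        int idxEnd = s.LastIndexOf(to, StringComparison.Ordinal);

        // Covers both "to not found" (-1) and "to does not start after the from anchor".
        if (idxEnd <= idxStart) return string.Empty;

        return s.Substring(idxStart, idxEnd - idxStart);
    }

    public static void Main()
    {
        Console.WriteLine("abcdabcd".BetweenOrdinal("a", "d")); // bcdabc
    }
}
```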