Showing posts with label dotNET.

Tuesday, January 24, 2023

C# .NET - How to remove control characters fast and reduce a string to the ANSI set




Here's how to reduce a string to the ANSI range and quickly remove its control characters in C#. But be careful: this does not replace é with e. Because é (code point 233) is inside the ANSI range, it is simply kept. To turn é into e you need to normalize the string first; see the UnicodeToANSI function.
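To see why normalization turns é into e, here's a minimal sketch that prints the UTF-16 code units of a string after FormKD decomposition. The class and method names are my own, for illustration only. The precomposed é (U+00E9) becomes a plain e (U+0065) followed by a combining acute accent (U+0301). A filter that drops everything above 255 then leaves only the e.

```csharp
using System;
using System.Linq;
using System.Text;

public static class NormalizationDemo
{
    // Hypothetical helper: returns each UTF-16 code unit of s after
    // compatibility decomposition (FormKD), formatted as "U+XXXX".
    public static string[] DescribeFormKD(string s)
    {
        return s.Normalize(NormalizationForm.FormKD)
                .Select(c => "U+" + ((int)c).ToString("X4"))
                .ToArray();
    }

    public static void Main()
    {
        // "\u00E9" is the precomposed é
        Console.WriteLine(string.Join(" ", DescribeFormKD("\u00E9")));
        // prints: U+0065 U+0301
    }
}
```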

ASCII (American Standard Code for Information Interchange) is a 7-bit character set that contains characters from 0 to 127.

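The generic term ANSI (American National Standards Institute) is used for 8-bit character sets. These character sets contain the unchanged ASCII character set and add further characters from 128 to 255. Note that the code below filters on .NET char (UTF-16) values, so "ANSI" here effectively means U+0000 to U+00FF, i.e. Latin-1 (ISO-8859-1). This differs from the Windows-1252 "ANSI" code page in the 128 to 159 range. There, Unicode has C1 control characters instead of symbols such as € and curly quotes.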
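Say you really need 7-bit ASCII as defined above, rather than the 0 to 255 range. A minimal sketch along the same lines could look like this. ToPrintableAscii is a hypothetical helper, not part of the original code. It decomposes with FormKD so that accented Latin letters keep their base letter. It then keeps only the printable ASCII range 32 to 126, which drops:

- control characters
- DEL (127)
- combining marks
- anything from other scripts

```csharp
using System;
using System.Text;

public static class AsciiSketch
{
    // Hypothetical helper: reduce a string to printable 7-bit ASCII (32..126).
    // FormKD first splits "ä" into "a" + U+0308, so the base letter survives the filter.
    public static string ToPrintableAscii(string s)
    {
        var sb = new StringBuilder(s.Length);
        foreach (char c in s.Normalize(NormalizationForm.FormKD))
        {
            if (c >= 32 && c <= 126) // printable ASCII only
                sb.Append(c);
        }
        return sb.ToString();
    }

    public static void Main()
    {
        // "Kl\u00E4gerin caf\u00E9\t\u00ABbis\u00BB" is "Klägerin café", a tab, then "«bis»"
        Console.WriteLine(ToPrintableAscii("Kl\u00E4gerin caf\u00E9\t\u00ABbis\u00BB"));
        // prints: Klagerin cafebis
    }
}
```

Unlike ReducetoASCII below, this removes « and » too, because they sit in the 128 to 255 range.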


Here's a list of Unicode punctuation and invisible formatting characters (the General Punctuation block): https://unicode-table.com/en/blocks/general-punctuation/
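Most invisible characters in that block, such as U+200B ZERO WIDTH SPACE, are format characters (Unicode category Cf), not control characters. char.IsControl only covers category Cc, i.e. U+0000 to U+001F and U+007F to U+009F. The ANSI filters in this post drop Cf characters anyway, because they are above 255. Suppose instead you want to strip invisible characters while keeping the rest of Unicode. A sketch that filters by category could look like this; RemoveControlAndFormat is a hypothetical name.

```csharp
using System;
using System.Globalization;
using System.Text;

public static class InvisibleCharSketch
{
    // Hypothetical helper: removes control characters (category Cc, e.g. \t, \r, \n, U+0085)
    // and format characters (category Cf, e.g. U+200B ZERO WIDTH SPACE, U+FEFF BOM).
    public static string RemoveControlAndFormat(string s)
    {
        var sb = new StringBuilder(s.Length);
        foreach (char c in s)
        {
            UnicodeCategory cat = char.GetUnicodeCategory(c);
            if (cat == UnicodeCategory.Control || cat == UnicodeCategory.Format)
                continue; // skip invisible characters
            sb.Append(c);
        }
        return sb.ToString();
    }

    public static void Main()
    {
        // a zero width space hides between "Microsoft" and " Edge"
        Console.WriteLine(RemoveControlAndFormat("Microsoft\u200B Edge\r\n"));
        // prints: Microsoft Edge
    }
}
```

Category Cf also includes the soft hyphen (U+00AD) and the zero width joiner (U+200D). Some scripts and emoji sequences rely on these, so removing all of Cf is a blunt choice.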


using System;
using System.Diagnostics;
using System.Linq;
using System.Text;

public static class Program
{
    // Based on http://www.codeproject.com/Articles/13503/Stripping-Accents-from-Latin-Characters-A-Foray-in
    // Proper normalization: FormKD splits "é" into "e" + U+0301 (combining acute accent).
    // The combining mark is above 255 and gets dropped, leaving a plain "e".
    public static string UnicodeToANSI(this string inString)
    {
        return new string(inString.Normalize(NormalizationForm.FormKD)
                                  .Where(c => c <= 255 && !char.IsControl(c))
                                  .ToArray());
    }

    // Keeps printable characters up to 255: 32 to 126 are the printable 7-bit ASCII set,
    // 160 to 255 are the Latin-1 letters and symbols. Accented letters such as "é" are kept as-is.
    public static string ReducetoASCII(this string s)
    {
        StringBuilder sb = new StringBuilder(s.Length);
        foreach (char c in s)
        {
            if (c > 255)           // remove chars above the 8-bit (ANSI) range
                continue;
            if (char.IsControl(c)) // remove control characters (0 to 31 and 127 to 159)
                continue;
            sb.Append(c);
        }
        return sb.ToString();
    }

    public static void Main()
    {
        // Timings are Stopwatch ticks (see Stopwatch.Frequency); each first call also includes JIT time.
        Stopwatch sw = new Stopwatch();
        string french = "A Paris, le cortège parisien s’était élancé à 14 heures.\r\n\tFace à l’affluence, un «itinéraire bis» a été mis en place. D’importants rassemblements ont lieu à Bordeaux, Marseille, Rennes ou Lyon. Suivez la journée avec nos journalistes dans toute la France.";
        string german = "ޘ Trump\t\r\nverwechselt Klägerin Carroll auf Foto mit Ex-Frau – das könnte Folgen haben";

        Console.WriteLine(french);

        sw.Start();
        string ftemp = french.ReducetoASCII();
        sw.Stop();
        Console.WriteLine("ANSI reduced\r\n" + ftemp + " in " + sw.ElapsedTicks + " ticks");

        sw.Restart();
        ftemp = french.UnicodeToANSI();
        sw.Stop();
        Console.WriteLine("Proper normalization\r\n" + ftemp + " in " + sw.ElapsedTicks + " ticks");

        Console.WriteLine();
        Console.WriteLine();
        Console.WriteLine(german);

        sw.Restart();
        string gtemp = german.ReducetoASCII();
        sw.Stop();
        Console.WriteLine("ANSI reduced\r\n" + gtemp + " in " + sw.ElapsedTicks + " ticks");

        sw.Restart();
        gtemp = german.UnicodeToANSI();
        sw.Stop();
        Console.WriteLine("Proper normalization\r\n" + gtemp + " in " + sw.ElapsedTicks + " ticks");
    }
}
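Another option is to let an Encoding do the range filtering. This sketch uses an ISO-8859-1 (Latin-1) encoding, which is built into .NET Core and .NET 5+ and is available on .NET Framework. Its encoder and decoder fallbacks are empty replacement strings, so every character above 255 is dropped. Control characters are valid Latin-1 and survive the round trip, so they are filtered out afterwards. DropOutsideLatin1 is a hypothetical name. I have not benchmarked it against ReducetoASCII, so measure before assuming it is faster.

```csharp
using System;
using System.Text;

public static class Latin1FallbackSketch
{
    // Latin-1 encoding whose fallbacks replace any unencodable character with "" (i.e. drop it).
    private static readonly Encoding Latin1Dropping =
        Encoding.GetEncoding("iso-8859-1",
                             new EncoderReplacementFallback(string.Empty),
                             new DecoderReplacementFallback(string.Empty));

    // Hypothetical helper: round-trip through Latin-1 bytes drops every char above 255,
    // then control characters (0 to 31, 127 to 159), which Latin-1 keeps, are removed.
    public static string DropOutsideLatin1(string s)
    {
        string latin1 = Latin1Dropping.GetString(Latin1Dropping.GetBytes(s));
        var sb = new StringBuilder(latin1.Length);
        foreach (char c in latin1)
        {
            if (!char.IsControl(c))
                sb.Append(c);
        }
        return sb.ToString();
    }

    public static void Main()
    {
        // "caf\u00E9\t\u20AC5" is "café", a tab, "€" and "5"; the result is "café5"
        Console.WriteLine(DropOutsideLatin1("caf\u00E9\t\u20AC5"));
    }
}
```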

Saturday, September 5, 2020

C# .NET - How to get a string between two strings fast

Here's an indulgent look into getting a string between two strings. I could not find any satisfactory* solutions out there, so I wrote my own.

*Not satisfactory because the solutions out there generally did not handle edge cases. They also did not state whether they were maximal spanning (greedy, in regex terms) or minimal spanning (lazy).
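In regex terms, the difference is greedy .* versus lazy .*?. The minimal sketch below shows both using System.Text.RegularExpressions; the Between helper and its greedy flag are my own, for illustration. Regex.Escape makes the anchors match literally, and RegexOptions.Singleline lets . match across line breaks.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexSpanSketch
{
    // Hypothetical helper: greedy "(.*)" runs to the LAST "to" (maximal span),
    // lazy "(.*?)" stops at the FIRST "to" after "from" (minimal span).
    public static string Between(string s, string from, string to, bool greedy)
    {
        string pattern = Regex.Escape(from) + (greedy ? "(.*)" : "(.*?)") + Regex.Escape(to);
        Match m = Regex.Match(s, pattern, RegexOptions.Singleline);
        return m.Success ? m.Groups[1].Value : string.Empty;
    }

    public static void Main()
    {
        Console.WriteLine(Between("[a][b]", "[", "]", greedy: true));  // prints: a][b
        Console.WriteLine(Between("[a][b]", "[", "]", greedy: false)); // prints: a
    }
}
```

Regex is convenient, but building and running a pattern is typically slower than plain IndexOf/LastIndexOf calls, which is one reason to hand-roll this.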

So, having a little extra time, I decided to really look at this. I learnt that some edge-case checks were redundant, and that the goal was not just to make it fast, but to make it bulletproof.


Note: This solution gets the maximal span between two strings. The first string is searched from the beginning, and the second string is searched from the end.
For a minimal span, the second string would instead be searched forward from the position just after the first.
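The two search strategies in the note can be sketched side by side. These helpers, BetweenMaximal and BetweenMinimal, are hypothetical names and not the code from the minimal-spanning post. They use StringComparison.Ordinal, so anchors are matched character for character rather than with culture-sensitive rules.

```csharp
using System;

public static class SpanSketch
{
    // Maximal span: first "from", then the LAST "to" that starts at or after the end of "from".
    public static string BetweenMaximal(string s, string from, string to)
    {
        if (string.IsNullOrEmpty(s) || string.IsNullOrEmpty(from) || string.IsNullOrEmpty(to)) return string.Empty;
        int idxFrom = s.IndexOf(from, StringComparison.Ordinal);
        if (idxFrom < 0) return string.Empty;
        int start = idxFrom + from.Length;
        int end = s.LastIndexOf(to, StringComparison.Ordinal);
        return end >= start ? s.Substring(start, end - start) : string.Empty;
    }

    // Minimal span: first "from", then the FIRST "to" searched forward from the end of "from".
    public static string BetweenMinimal(string s, string from, string to)
    {
        if (string.IsNullOrEmpty(s) || string.IsNullOrEmpty(from) || string.IsNullOrEmpty(to)) return string.Empty;
        int idxFrom = s.IndexOf(from, StringComparison.Ordinal);
        if (idxFrom < 0) return string.Empty;
        int start = idxFrom + from.Length;
        int end = s.IndexOf(to, start, StringComparison.Ordinal); // start may equal s.Length
        return end >= 0 ? s.Substring(start, end - start) : string.Empty;
    }

    public static void Main()
    {
        Console.WriteLine(BetweenMaximal("abcdabcd", "a", "d")); // prints: bcdabc
        Console.WriteLine(BetweenMinimal("abcdabcd", "a", "d")); // prints: bc
    }
}
```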

See my post on minimal spanning 
https://metadataconsulting.blogspot.com/2020/09/C-NET-How-to-get-a-string-between-two-strings-minimal-spanning-fast.html




The code below also implements a debug version of the method, GetFirstStringBetweenStrings. It prints which rule gets triggered for each case in an extensive set of test cases.

  
Source Code
 
using System;
using System.Diagnostics;

namespace GetBetweenStrings_Blog
{
    public static class Extensions 
    {
        /// <summary>
        /// Get a substring between two anchor strings, maximal span
        /// </summary>
        /// <param name="s">source string</param>
        /// <param name="from">start anchor; its first occurrence is searched from the start of s, and the result begins right after it</param>
        /// <param name="to">end anchor; its last occurrence is searched backwards from the end of s, and the result ends right before it</param>
        /// <returns>the substring between from and to (maximal span), or string.Empty if there is none</returns>
        public static string GetFirstStringBetweenStringsCleanup(this string s, string from, string to)
        {
            if (string.IsNullOrEmpty(s) || string.IsNullOrEmpty(from) || string.IsNullOrEmpty(to)) return string.Empty;

            // Ordinal: exact character match, faster than the default culture-sensitive search
            int idxFrom = s.IndexOf(from, StringComparison.Ordinal);
            int idxStart = idxFrom + from.Length; // "not found" (-1) is filtered just below

            if (idxFrom == -1 || idxStart >= s.Length - 1)
                return string.Empty;

            int idxEnd = s.LastIndexOf(to, StringComparison.Ordinal);

            if (idxEnd == -1 || idxEnd <= idxStart)
                return string.Empty;

            return s.Substring(idxStart, idxEnd - idxStart);
        }

        // 4:00 AM coding journey: testing conditions you may or may not need, building from scratch
        /// <summary>
        /// Debug version: get a substring between two anchor strings, maximal span, printing which rule fires
        /// </summary>
        /// <param name="s">source string</param>
        /// <param name="from">start anchor; its first occurrence is searched from the start of s, and the result begins right after it</param>
        /// <param name="to">end anchor; its last occurrence is searched backwards from the end of s, and the result ends right before it</param>
        /// <returns>the substring between from and to (maximal span), or string.Empty if there is none</returns>
        public static string GetFirstStringBetweenStrings(this string s, string from, string to)
        {
            Console.Write("args from {0} to {1} = ", from, to); //debug

            //edge case
            if (string.IsNullOrEmpty(s) || string.IsNullOrEmpty(from) || string.IsNullOrEmpty(to)) return string.Empty;
            //if (from.Length >= s.Length || to.Length >= s.Length) return string.Empty; //redundant, the rules below take care of it

            int idxFrom = s.IndexOf(from, StringComparison.Ordinal);
            int idxStart = idxFrom + from.Length; // "not found" (-1) is filtered just below, before idxStart is used

            if (idxFrom == -1)       
                return string.Empty;
            else if (idxStart >= s.Length - 1) // index vs. length, hence the -1: at most one char is left, so the result would be empty anyway
            //else if (idxStart >= s.Length) // looser variant that was also tested
            {
                Console.WriteLine("r1. idxStart={0} >= (s.Length - 1)={1}", idxStart, s.Length - 1);
                //Console.WriteLine("r1. idxStart={0} >= s.Length={1}",  idxStart, s.Length);
                return string.Empty; 
            }

         

            int idxEnd = s.LastIndexOf(to, StringComparison.Ordinal);
            //int idxEnd = s.LastIndexOf(to, idxStart); //wrong: searches backwards *from* idxStart, so for s="aaaa", from="a", to="a" (idxStart=1) it returns 1 instead of the expected last index 4-1=3

            //if (idxEnd == -1 || idxEnd > s.Length - 1 || idxStart >= idxEnd) //initial, incorrect version
            if (idxEnd == -1 )
                return string.Empty;
            //else if (idxEnd > s.Length - 1) //redundant: LastIndexOf never returns an index past the end, "not found" is -1
            ////else if (idxEnd > s.Length)
            //{
            //    Console.WriteLine("r2. idxEnd={0} > (s.Length - 1)={1}", idxEnd, s.Length - 1);
            //    //Console.WriteLine("r2. idxEnd={0} > s.Length={1}", idxEnd, s.Length);
            //    return string.Empty;
            //}
            else if (idxEnd <= idxStart)
            {
                Console.WriteLine("r3. idxEnd={0} <= idxStart={1}", idxEnd, idxStart);
                return string.Empty;
            }
            
            return s.Substring(idxStart, idxEnd - idxStart);
            
        }


    }
    public class Program
    {
        public static void Main(string[] args)
        {
            string test = "abcd";
            Console.WriteLine(test);
            Console.WriteLine("=================");

            Console.WriteLine(test.GetFirstStringBetweenStrings(null, null));
            Console.WriteLine(test.GetFirstStringBetweenStrings("", ""));
            Console.WriteLine(test.GetFirstStringBetweenStrings(test, test));
            Console.WriteLine(test.GetFirstStringBetweenStrings("", test));
            Console.WriteLine(test.GetFirstStringBetweenStrings(test,""));
            
            Console.WriteLine(test.GetFirstStringBetweenStrings("a", "a"));
            Console.WriteLine(test.GetFirstStringBetweenStrings("a", "b"));
            Console.WriteLine(test.GetFirstStringBetweenStrings("a", "c"));
            Console.WriteLine(test.GetFirstStringBetweenStrings("a", "d"));
            Console.WriteLine(test.GetFirstStringBetweenStrings("a", "e"));
            
            Console.WriteLine(test.GetFirstStringBetweenStrings("e", "e"));
            
            Console.WriteLine(test.GetFirstStringBetweenStrings("ab", "a"));
            Console.WriteLine(test.GetFirstStringBetweenStrings("ab", "b"));
            Console.WriteLine(test.GetFirstStringBetweenStrings("ab", "c"));
            Console.WriteLine(test.GetFirstStringBetweenStrings("ab", "d"));

            Console.WriteLine(test.GetFirstStringBetweenStrings("abc", "a"));
            Console.WriteLine(test.GetFirstStringBetweenStrings("abc", "b"));
            Console.WriteLine(test.GetFirstStringBetweenStrings("abc", "c"));
            Console.WriteLine(test.GetFirstStringBetweenStrings("abc", "d"));
            
            

            Console.WriteLine(test.GetFirstStringBetweenStrings(test, "a"));
            Console.WriteLine(test.GetFirstStringBetweenStrings(test, "b"));
            Console.WriteLine(test.GetFirstStringBetweenStrings(test, "c"));
            Console.WriteLine(test.GetFirstStringBetweenStrings(test, "d"));

            Console.WriteLine(test.GetFirstStringBetweenStrings("a","abc"));
            Console.WriteLine(test.GetFirstStringBetweenStrings("b", "abc"));
            Console.WriteLine(test.GetFirstStringBetweenStrings("c", "abc"));
            Console.WriteLine(test.GetFirstStringBetweenStrings("d", "abc"));

            Console.WriteLine(); 
            test = "abcdabcd";
            Console.WriteLine(test);
            Console.WriteLine("=================");

            Console.WriteLine(test.GetFirstStringBetweenStrings(null, null));
            Console.WriteLine(test.GetFirstStringBetweenStrings("", ""));
            Console.WriteLine(test.GetFirstStringBetweenStrings(test, test));
            Console.WriteLine(test.GetFirstStringBetweenStrings("", test));
            Console.WriteLine(test.GetFirstStringBetweenStrings(test, ""));

            Console.WriteLine(test.GetFirstStringBetweenStrings("a", "a"));
            Console.WriteLine(test.GetFirstStringBetweenStrings("a", "b"));
            Console.WriteLine(test.GetFirstStringBetweenStrings("a", "c"));
            Console.WriteLine(test.GetFirstStringBetweenStrings("a", "d"));
            Console.WriteLine(test.GetFirstStringBetweenStrings("a", "e"));

            Console.WriteLine(test.GetFirstStringBetweenStrings("e", "e"));

            Console.WriteLine(test.GetFirstStringBetweenStrings("ab", "a"));
            Console.WriteLine(test.GetFirstStringBetweenStrings("ab", "b"));
            Console.WriteLine(test.GetFirstStringBetweenStrings("ab", "c"));
            Console.WriteLine(test.GetFirstStringBetweenStrings("ab", "d"));

            Console.WriteLine(test.GetFirstStringBetweenStrings("abc", "a"));
            Console.WriteLine(test.GetFirstStringBetweenStrings("abc", "b"));
            Console.WriteLine(test.GetFirstStringBetweenStrings("abc", "c"));
            Console.WriteLine(test.GetFirstStringBetweenStrings("abc", "d"));

            Console.WriteLine(test.GetFirstStringBetweenStrings(test, "a"));
            Console.WriteLine(test.GetFirstStringBetweenStrings(test, "b"));
            Console.WriteLine(test.GetFirstStringBetweenStrings(test, "c"));
            Console.WriteLine(test.GetFirstStringBetweenStrings(test, "d"));

            Console.WriteLine(test.GetFirstStringBetweenStrings("a", "abc"));
            Console.WriteLine(test.GetFirstStringBetweenStrings("b", "abc"));
            Console.WriteLine(test.GetFirstStringBetweenStrings("c", "abc"));
            Console.WriteLine(test.GetFirstStringBetweenStrings("d", "abc"));

            Console.WriteLine(test.GetFirstStringBetweenStrings("abc", "d")); //pass


            Console.WriteLine();
            test = "aaaa";
            Console.WriteLine(test);
            Console.WriteLine("=================");

            Console.WriteLine(test.GetFirstStringBetweenStrings("a", "a"));
            Console.WriteLine(test.GetFirstStringBetweenStrings("aa", "a"));
            Console.WriteLine(test.GetFirstStringBetweenStrings("aaa", "a"));
            Console.WriteLine(test.GetFirstStringBetweenStrings("aaaa", "a"));

            Console.WriteLine(test.GetFirstStringBetweenStrings("a", "a"));
            Console.WriteLine(test.GetFirstStringBetweenStrings("a", "aa"));
            Console.WriteLine(test.GetFirstStringBetweenStrings("a", "aaa"));
            Console.WriteLine(test.GetFirstStringBetweenStrings("a", "aaaa"));

            Console.WriteLine();
            test = "aaaaaaaaaaaaaaaaaaaaaaaaaaaabbbb";
            Console.WriteLine(test);
            Console.WriteLine("=================");
            Console.WriteLine(test.GetFirstStringBetweenStrings("bbbb", "aaaa"));

            string result = string.Empty;
            Stopwatch sw = new Stopwatch();
            sw.Start();
            result = test.GetFirstStringBetweenStringsCleanup("aaaaaaa", "bb");
            sw.Stop();
            Console.WriteLine(result);
            Console.WriteLine(" in " + sw.ElapsedTicks + " ticks."); 
           
        }
    }
}

Wednesday, August 12, 2020

C# .NET How to overload clipboard to handle multiple formats simultaneously

Tip! Get ClipSpy (CodeProject) to examine the formats an overload places on the clipboard.
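You can also check what an overload put on the clipboard without an external tool. This minimal ClipSpy-like sketch lists the format names stored in the current clipboard data object. It is Windows only and assumes a project that references Windows Forms (e.g. UseWindowsForms in a .NET 5+ project file). ListFormats is a hypothetical helper.

```csharp
using System;
using System.Windows.Forms;

public static class ClipboardFormatsSketch
{
    // Hypothetical helper: returns the format names stored in a data object.
    // GetFormats(false) skips formats that Windows Forms could only convert to on request.
    public static string[] ListFormats(IDataObject data)
    {
        return data == null ? Array.Empty<string>() : data.GetFormats(false);
    }

    [STAThread] // clipboard access requires a single-threaded apartment
    public static void Main()
    {
        foreach (string format in ListFormats(Clipboard.GetDataObject()))
            Console.WriteLine(format);
    }
}
```

Run it after the example below has overloaded the clipboard, and you should see both a text format and a bitmap format listed.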


The following console application is fully functional, working code that demonstrates how to overload the Windows system clipboard with multiple formats.

Specifically, in the example source code below, we overload an image copy by adding a text caption.

Result: 

When you copy an image from a browser, and paste into MSPaint, you'll get the expected image.
When you paste into Notepad, you'll get a text message.

This is the overload in action: an additional text format has been added alongside the image.

Aside: See it in action in my tool, https://clipboardplaintextpowertool.blogspot.com/, which overloads copied images with the following text metadata:

Clip iss063e070805.jpg [1041w✕585h] "NASA Image of the Day | NASA - Microsoft​ Edge", img src="https://www.nasa.gov/sites/default/files/styles/full_width_feature/public/thumbnails/image/iss063e070805.jpg"

Update - Simplified further Aug 13, 2020.

using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Text;
using System.Windows.Forms;
using System.Drawing.Imaging;
using System.Drawing;

namespace ClipboardOverloadExample
{
    class Program
    {
        [STAThread] //TIP! Set this to get clipboard handle from a console app
        public static void Main(string[] args)
        {
            Console.WriteLine("Right-click COPY an image from your open browser. Press any key to start...");
            Console.ReadKey();

            //Let's grab clipboard, do we have access and does it contain data?
            System.Windows.Forms.IDataObject iData = Clipboard.GetDataObject();

            if (iData == null)
                return;
            
            //check if we have a standard Bitmap format on the clipboard
            //images are saved on the clipboard as a raw bitmap ("Memory Bitmap"), but transparency may be lost.
            if (!iData.GetDataPresent(DataFormats.Bitmap))
                return;

            //we have Bitmap format and thus we can perform cast safely
            Bitmap clipBMP = iData.GetData(DataFormats.Bitmap) as Bitmap;

            if (clipBMP == null)
                return;

            string imgInfo = "This image is a Bitmap " + clipBMP.PixelFormat.ToString();

            //Create an object that will contain multiple formats to paste - overload
            System.Windows.Forms.IDataObject objFormatstoPaste = new DataObject();

            //add formats to put on clipboard
            objFormatstoPaste.SetData(DataFormats.Text, imgInfo);
            objFormatstoPaste.SetData(DataFormats.Bitmap, clipBMP);

            //Copy to the clipboard, and the 2nd parameter indicates that the clipboard is not cleared when the program exits
            Clipboard.SetDataObject(objFormatstoPaste, true);
            
            Console.WriteLine("Success. You can exit the program, but to test the clipboard overload, do the following:");
            Console.WriteLine("Open Notepad and paste (CTRL-V), you'll get text. Open MSPaint and you'll paste the image!");
            Console.ReadKey();

        }
    }
}