
Wednesday, November 18, 2020

C# .NET How to get the literal .NET string, the internal storage representation of a C# string

In my last post, we dealt with the gotchas introduced when processing the null character ('\0'), which you might incorrectly assume is a white-space character.

In fact, in C# '\0' (the same character as '\u0000') is the null character, and it has no special meaning in a .NET string. It is just another character. It is classified as a control character, not as white space, and it is neither a null reference nor a string terminator as it is in C. To view control characters in VS Code, see my blog post, and for Notepad++, see this post.
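A quick way to check these claims is to ask the framework directly. The sketch below (the class name `NullCharDemo` is just for illustration) shows that '\0' counts toward `Length`, is reported by `char.IsControl` but not by `char.IsWhiteSpace`, and therefore survives `Trim()`.

```csharp
using System;

public static class NullCharDemo
{
    public static void Main()
    {
        string s = "a\0b";

        // '\0' is stored like any other character, so it counts toward Length
        // and does not end the string the way it would in C.
        Console.WriteLine(s.Length);                        // 3

        // It is classified as a control character, not as white space.
        Console.WriteLine(char.IsControl('\0'));            // True
        Console.WriteLine(char.IsWhiteSpace('\0'));         // False

        // So white-space helpers leave it alone.
        Console.WriteLine(string.IsNullOrWhiteSpace("\0")); // False
        Console.WriteLine(" \0 ".Trim().Length);            // 1

        // '\0' and '\u0000' are two spellings of the same character.
        Console.WriteLine('\0' == '\u0000');                // True
    }
}
```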

You can see exactly what a .NET string contains, control characters included, by converting it to an escaped C# string literal with the code below. It is rather expensive and slow, but it is handy when you need it, especially when dealing with registry values that are riddled with '\0'.
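If you only need to see which characters are stored, rather than a ready-to-paste literal, a cheaper sketch is to print each UTF-16 code unit (a .NET string is a sequence of them) as a four-digit hex value. The `ToCodeUnits` helper and the sample value with trailing nulls are hypothetical; `TrimEnd('\0')` is the standard `string.TrimEnd(char)` overload, shown here as one way to drop trailing null characters such as those that may appear at the end of a registry value.

```csharp
using System;
using System.Linq;

public static class CodeUnitDump
{
    // Hypothetical helper: lists every UTF-16 code unit in the string as U+XXXX,
    // making '\0' and other characters a console would not display visible.
    public static string ToCodeUnits(this string input) =>
        string.Join(" ", input.Select(c => "U+" + ((int)c).ToString("X4")));

    public static void Main()
    {
        Console.WriteLine("A\0\r\n".ToCodeUnits());
        // U+0041 U+0000 U+000D U+000A

        // Made-up value with trailing nulls; TrimEnd('\0') removes only those.
        Console.WriteLine("abc\0\0".TrimEnd('\0').ToCodeUnits());
        // U+0061 U+0062 U+0063
    }
}
```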

  
Source Code

using System;
using System.CodeDom;
using System.CodeDom.Compiler;
using System.Diagnostics;
using System.IO;

public static class Program
{
	// Returns the string as an escaped C# source literal, e.g. a\0b becomes "a\0b" with quotes.
	public static string ToLiteral(this string input)
	{
		using (var writer = new StringWriter())
		using (var provider = CodeDomProvider.CreateProvider("CSharp"))
		{
			// Let the C# code generator write the string out as a quoted, escaped literal.
			provider.GenerateCodeFromExpression(new CodePrimitiveExpression(input), writer, null);
			return writer.ToString();
		}
	}

	public static void Main()
	{
		var sw = new Stopwatch();

		string emptytest = "1 \0 \0 \u0001 \0 \0\r\n\tSOS\n1 0 0 1 0 0 1\r\nIn distress\r\n1 0 0 1 0 0 1\r\nUnicode \\u0081 on the next line\n\n\u0081";

		// This single timing also includes one-time costs such as JIT compilation.
		sw.Start();
		string output = emptytest.ToLiteral();
		sw.Stop();

		Console.WriteLine(emptytest);
		Console.WriteLine("---- Literal -----");
		// ElapsedTicks counts Stopwatch ticks (see Stopwatch.Frequency), not TimeSpan ticks.
		Console.WriteLine(output + " in " + sw.ElapsedTicks + " Stopwatch ticks (" + sw.Elapsed.TotalMilliseconds + " ms)");
	}
}
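The CodeDom route creates a code provider on every call, and as far as I know its C# generator escapes only a handful of characters (such as '\0', '\r', '\n', '\t', quotes and backslash), passes other control characters like '\u0001' through unchanged, and may split long strings into several literals joined with +. A hand-written escaper avoids that. The sketch below is a hypothetical alternative (`ToLiteralManual` is not a framework API) that escapes every character for which `char.IsControl` returns true as \uXXXX; it has not been benchmarked here.

```csharp
using System;
using System.Text;

public static class LiteralEscaper
{
    // Hypothetical alternative to ToLiteral: escapes the string by hand instead of
    // going through CodeDom, and escapes every control character, not just a few.
    public static string ToLiteralManual(this string input)
    {
        var sb = new StringBuilder(input.Length + 2);
        sb.Append('"');
        foreach (char c in input)
        {
            switch (c)
            {
                case '"':  sb.Append("\\\""); break;
                case '\\': sb.Append("\\\\"); break;
                case '\0': sb.Append("\\0"); break;
                case '\a': sb.Append("\\a"); break;
                case '\b': sb.Append("\\b"); break;
                case '\f': sb.Append("\\f"); break;
                case '\n': sb.Append("\\n"); break;
                case '\r': sb.Append("\\r"); break;
                case '\t': sb.Append("\\t"); break;
                case '\v': sb.Append("\\v"); break;
                default:
                    // Any other control character (for example U+0001 or U+0081)
                    // becomes a \uXXXX escape so it is visible in the output.
                    if (char.IsControl(c))
                        sb.Append("\\u").Append(((int)c).ToString("x4"));
                    else
                        sb.Append(c);
                    break;
            }
        }
        sb.Append('"');
        return sb.ToString();
    }

    public static void Main()
    {
        string sample = "1 \0 \u0001\r\n\tSOS \\u0081 \u0081";
        Console.WriteLine(sample.ToLiteralManual());
        // prints: "1 \0 \u0001\r\n\tSOS \\u0081 \u0081"
    }
}
```

Because the output uses only C# escape sequences, you can paste it back into source code and get the same string.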

When do you use it? Leave a comment below.
