Tuesday, October 13, 2020

C# .NET How to get integer from hexadecimal string, many hex formats supported















The aim of this post is to parse a hexadecimal number from a string using C# code. 

Using C# recommended TryParse function using the NumberStyles.HexNumber flag simply fails on many hex formats.

For example, "0x20" will fail?  

UInt32.TryParse("0x20", 
             NumberStyles.HexNumber, // I've tried also AllowHexSpecifier
             CultureInfo.InvariantCulture,  // I've also tried CurrentCulture
             out number));


The problem with NumberStyles.HexNumber is that it doesn't support parsing values with a prefix (ie. 0x, &H, or #), so you have to strip it off before trying to parse the value.

So the code below removes many hex formats prefixes as listed here  - https://en.wikipedia.org/wiki/Hexadecimal to enable the conversion to work, 


As per feedback, I improved the algo to first quarantine likely regex hex matches with prefixes then run another regex to extract the hex within that match.



 
Source Code

using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class Program
{
	const string strRegGetHexNumber = @"\\u[0-9a-f]{2,}|0x[0-9a-f]{2,}|%[0-9a-f]{2,}|\u0023([0-9a-f]){1,6}|&\u0023x([0-9a-f]){1,6};|\s[0-9a-f]{2,}|[0-9a-f]{2,}|[«‹»›„‚“‟‘‛”’""""❛❜❝❞〝〞〟"""""'‘][0-9a-f]{2,}[’'""""«‹»›„‚“‟‘‛”’""""❛❜❝❞〝〞〟"]";
	private static readonly Regex rgxGetHex = new Regex(strRegGetHexNumber, RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);
	public static void Main()
	{
		string unicodeText = "UTF-16 (hex)	0x0023 (0023)";
		//https://en.wikipedia.org/wiki/Hexadecimal --remove possible prefixes
		string prefixfree = string.Empty;
		string[] prefixHexs = new string[]{"0x", "\\u", "#", "&#", "\\x", "\\s", "U+", "X'", "16#", "#x", "#16r", "&H", "0h"};
		foreach (var pre in prefixHexs)
		{
			if (unicodeText.IndexOf(pre) > -1)
			{
				prefixfree = unicodeText.Substring(unicodeText.IndexOf(pre) + pre.Length);
				break;
			}
		}

		if (string.IsNullOrEmpty(prefixfree))
			prefixfree = unicodeText;
		Match m = rgxGetHex.Match(prefixfree);
		bool success = false;
		ulong number = 0;
		string hex_value = string.Empty;
		if (m.Success)
		{
			try
			{
				hex_value = m.Value;
				//long number = Convert.ToInt64(hex_value, 16); //base 16 - hex....
				//https://stackoverflow.com/questions/2801509/uint32-tryparse-hex-number-not-working -> remove prefixes
				success = ulong.TryParse(hex_value, NumberStyles.HexNumber, CultureInfo.InvariantCulture, out number);
			}
			catch (FormatException)
			{
				unicodeText = string.Format("{0} is not in the correct format for a hexadecimal number.", m.Value);
			}
			catch (OverflowException)
			{
				unicodeText = string.Format("{0} is outside the range of an Int64.", m.Value);
			}
			catch (ArgumentException)
			{
				unicodeText = string.Format("{0} is invalid in base 16.", m.Value);
			}
			catch (Exception ex)
			{
				unicodeText = string.Format("{0} return error\r\n{2}", m.Value, ex.Message);
			}
		}
		else
			unicodeText = "Could not find a hex number in \"" + unicodeText + "\". Select the hex number only.";
		if (success)
			unicodeText = string.Format("{0} from found {1:N0} hex number in string {2}", number, hex_value, unicodeText);
		else
			unicodeText = "Could not find a hex number in \"" + unicodeText + "\". Select the hex number only.";
		Console.WriteLine(unicodeText);
	}
}

No comments:

Post a Comment