
Wednesday, September 9, 2020

C# .NET - How to get a string between two strings, minimal spanning, fast

Here's a fast implementation to get the string between two strings, which meets most expectations of what such a function should return. It finds the substring that comes after the 1st string and before the next occurrence of the 2nd string. The sticky question is: where exactly may the search for the 2nd string start?


Technically, this is minimal spanning, meaning the string returned is the shortest string found between the end of the 1st string and the beginning of the 2nd string.
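To make the distinction concrete, here is a minimal sketch contrasting the two kinds of span. The names SpanKindSketch, MinSpan and MaxSpan are hypothetical, and MaxSpan is only one plausible reading of "maximal" (first 1st string, last 2nd string); the linked post may implement it differently. Ordinal comparison is my assumption too, the full listing further down uses the default IndexOf overloads.

```csharp
using System;

public static class SpanKindSketch
{
    // Minimal span: first "from", then the nearest "to" after it (forward search).
    public static string MinSpan(string s, string from, string to)
    {
        int idxFrom = s.IndexOf(from, StringComparison.Ordinal);
        if (idxFrom < 0) return string.Empty;
        int start = idxFrom + from.Length;
        int end = s.IndexOf(to, start, StringComparison.Ordinal);
        return end < 0 ? string.Empty : s.Substring(start, end - start);
    }

    // Maximal span (assumed definition): first "from", then the last "to" in the string.
    public static string MaxSpan(string s, string from, string to)
    {
        int idxFrom = s.IndexOf(from, StringComparison.Ordinal);
        if (idxFrom < 0) return string.Empty;
        int start = idxFrom + from.Length;
        int end = s.LastIndexOf(to, StringComparison.Ordinal);
        return end < start ? string.Empty : s.Substring(start, end - start);
    }
}

public static class SpanKindDemo
{
    public static void Main()
    {
        const string s = "[one][two]";
        Console.WriteLine(SpanKindSketch.MinSpan(s, "[", "]")); // one
        Console.WriteLine(SpanKindSketch.MaxSpan(s, "[", "]")); // one][two
    }
}
```

The sketch skips the null and empty-string checks that the full implementation performs, so it is for illustration only.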

See my post on maximal spanning:
https://metadataconsulting.blogspot.com/2020/09/CSharp-dotNET-How-to-get-a-string-between-two-strings.html




Below we compare two options. In the first ("Exact"), the search for the 2nd string starts immediately after the 1st string, leaving no gap. In the second ("+1"), the search starts one character later. The idea behind "+1" is that we want at least one character returned, not an empty string.

Starting immediately after is how most examples are implemented, but it may not be what you want; see the comparison image below. Quick note: in the output, r stands for rule; see the code.








With the search starting immediately after ("Exact" in the charts), getting the string between "a" and "a" for input "aaaa" returns an empty string! Technically correct.

With the search starting one character later ("+1" in the charts), getting the string between "a" and "a" for input "aaaa" returns "a". Intuitively, this is what you would want.

While "+1" seems good, there are repercussions; in some cases the result might not be what you expect. Best to see it with your own eyes in the comparison chart image below.
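As a quick way to try both behaviours side by side, here is a small sketch where the choice is a parameter. The class OffsetSketch, the method Between and the offset parameter are hypothetical names for illustration, and the ordinal comparison is my assumption; offset 0 corresponds to "Exact" and offset 1 to "+1".

```csharp
using System;

public static class OffsetSketch
{
    // offset = 0 reproduces the "Exact" search; offset = 1 reproduces "+1".
    public static string Between(string s, string from, string to, int offset)
    {
        if (string.IsNullOrEmpty(s) || string.IsNullOrEmpty(from) || string.IsNullOrEmpty(to))
            return string.Empty;

        int idxFrom = s.IndexOf(from, StringComparison.Ordinal);
        if (idxFrom == -1) return string.Empty;

        int idxStart = idxFrom + from.Length;
        if (idxStart + offset > s.Length) return string.Empty; // keep the search start in range

        int idxEnd = s.IndexOf(to, idxStart + offset, StringComparison.Ordinal);
        // idxEnd is -1 when "to" is not found, which is also <= idxStart.
        return idxEnd <= idxStart ? string.Empty : s.Substring(idxStart, idxEnd - idxStart);
    }
}

public static class OffsetDemo
{
    public static void Main()
    {
        Console.WriteLine("[" + OffsetSketch.Between("aaaa", "a", "a", 0) + "]"); // []
        Console.WriteLine("[" + OffsetSketch.Between("aaaa", "a", "a", 1) + "]"); // [a]
        Console.WriteLine("[" + OffsetSketch.Between("abab", "a", "b", 0) + "]"); // []
        Console.WriteLine("[" + OffsetSketch.Between("abab", "a", "b", 1) + "]"); // [ba]
    }
}
```

The last line shows one of the repercussions: with "+1" the "b" directly after "a" is skipped, so the result "ba" contains the very string that was supposed to end the span.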





















To see the full effects, run the code below for "Exact", or edit it in .NET Fiddle to try the "+1" version.



using System;
using System.Diagnostics;

public static class Extensions 
{
	/// <summary>
	/// Get a substring between two anchor strings, minimal span
	/// </summary>
	/// <param name="s">source string</param>
	/// <param name="from">search from end of the first occurrence of this string</param>
	/// <param name="to">to beginning of this string, searching forwards from the end of from</param>
	/// <returns>a substring between from and to, minimal span; empty string if none is found</returns>
	public static string GetFirstStringBetweenStringsMinSpanCleanup(this string s, string from, string to)
	{
		if (string.IsNullOrEmpty(s) || string.IsNullOrEmpty(from) || string.IsNullOrEmpty(to)) return string.Empty;

		int idxFrom = s.IndexOf(from);
		int idxStart = idxFrom + from.Length; //safe: "not found" (-1) is filtered out below before idxStart is used

		if (idxFrom == -1 || idxStart >= s.Length - 1)
			return string.Empty;

		int idxEnd = s.IndexOf(to, idxStart); //"Exact" definition, but intuitively the next line likely meets expectations -> YOU CHOOSE
		//int idxEnd = s.IndexOf(to, idxStart + 1); //"+1": start at the next position, leaving room for 1 character to be returned

           
		if (idxEnd == -1 || idxEnd <= idxStart)
			return string.Empty;

		return s.Substring(idxStart, idxEnd - idxStart);

	}
	/// <summary>
	/// Get a substring between two anchor strings, MINIMAL SPAN (debug version, prints which rule fired)
	/// </summary>
	/// <param name="s">source string</param>
	/// <param name="from">search from end of the first occurrence of this string</param>
	/// <param name="to">to beginning of this string, searching forwards from the end of from</param>
	/// <returns>a substring between from and to, minimal span; empty string if none is found</returns>
	public static string GetFirstStringBetweenStringsMinSpan(this string s, string from, string to)
	{
		Console.Write("args from {0} to {1} = ", from, to); //debug

		//edge cases
		if (string.IsNullOrEmpty(s) || string.IsNullOrEmpty(from) || string.IsNullOrEmpty(to)) return string.Empty;
		//if (from.Length >= s.Length || to.Length >= s.Length) return string.Empty; //redundant, the rules below take care of this

		int idxFrom = s.IndexOf(from);
		int idxStart = idxFrom + from.Length; //safe: "not found" (-1) is filtered out below before idxStart is used

		if (idxFrom == -1)       
			return string.Empty;
		else if (idxStart >= s.Length - 1) //kept separate for testing, can be combined into 1 line with the check above; we compare an index with a length, hence the - 1
		{
			Console.WriteLine("r1. idxStart={0} >= (s.Length - 1)={1}", idxStart, s.Length - 1);
			return string.Empty; 
		}

		int idxEnd = s.IndexOf(to, idxStart); //"Exact" definition, but intuitively the next line likely meets expectations -> YOU CHOOSE
		//int idxEnd = s.IndexOf(to, idxStart + 1); //"+1": start at the next position, leaving room for 1 character to be returned

		if (idxEnd == -1 )
			return string.Empty;
		else if (idxEnd > s.Length - 1) //Rule 2 never gets used
		{
			Console.WriteLine("r2. idxEnd={0} > (s.Length - 1)={1}", idxEnd, s.Length - 1);
			return string.Empty;
		}
		else if (idxEnd <= idxStart)
		{
			Console.WriteLine("r3. idxEnd={0} <= idxStart={1}", idxEnd, idxStart);
			return string.Empty;
		}

		return s.Substring(idxStart, idxEnd - idxStart);

	}
}

public class Program
{
	public static void Main()
	{
		  string test = "abcd";
            Console.WriteLine(test);
            Console.WriteLine("=================");

            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan(null, null));
            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("", ""));
            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan(test, test));
            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("", test));
            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan(test,""));
            
            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("a", "a"));
            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("a", "b"));
            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("a", "c"));
            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("a", "d"));
            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("a", "e"));
            
            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("e", "e"));
            
            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("ab", "a"));
            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("ab", "b"));
            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("ab", "c"));
            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("ab", "d"));

            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("ab", "a"));
            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("ab", "b"));
            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("ab", "c"));
            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("ab", "d"));

            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("abc", "a"));
            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("abc", "b"));
            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("abc", "c"));
            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("abc", "d"));
            
            

            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan(test, "a"));
            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan(test, "b"));
            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan(test, "c"));
            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan(test, "d"));

            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("a","abc"));
            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("b", "abc"));
            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("c", "abc"));
            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("d", "abc"));

            Console.WriteLine(); 
            test = "abcdabcd";
            Console.WriteLine(test);
            Console.WriteLine("=================");

            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan(null, null));
            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("", ""));
            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan(test, test));
            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("", test));
            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan(test, ""));

            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("a", "a"));
            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("a", "b"));
            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("a", "c"));
            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("a", "d"));
            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("a", "e"));

            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("e", "e"));

            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("ab", "a"));
            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("ab", "b"));
            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("ab", "c"));
            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("ab", "d"));

            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("ab", "a"));
            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("ab", "b"));
            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("ab", "c"));
            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("ab", "d"));

            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("abc", "a"));
            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("abc", "b"));
            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("abc", "c"));
            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("abc", "d"));

            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan(test, "a"));
            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan(test, "b"));
            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan(test, "c"));
            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan(test, "d"));

            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("a", "abc"));
            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("b", "abc"));
            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("c", "abc"));
            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("d", "abc"));

            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("abc", "d")); //pass


            Console.WriteLine();
            test = "aaaa";
            Console.WriteLine(test);
            Console.WriteLine("=================");

            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("a", "a"));
            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("aa", "a"));
            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("aaa", "a"));
            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("aaaa", "a"));

            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("a", "a"));
            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("a", "aa"));
            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("a", "aaa"));
            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("a", "aaaa"));

            Console.WriteLine();
            test = "aaaaaaaaaaaaaaaaaaaaaaaaaaaabbbb";
            Console.WriteLine(test);
            Console.WriteLine("=================");
            Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("bbbb", "aaaa"));

            string result = string.Empty; 
            Stopwatch sw = new Stopwatch();
            sw.Start();
            result = test.GetFirstStringBetweenStringsMinSpan("aaaaaaa", "bb");
            sw.Stop();
            Console.WriteLine(result);
            Console.WriteLine(" in " + sw.ElapsedTicks + " ticks."); 
	}
}
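
A note on the timing at the end of Main: the stopwatch wraps a single call to the debug version, so the tick count includes the Console.Write inside that function. Below is a rough sketch of a fairer measurement, under a few assumptions of mine: it times a copy of the clean "Exact" version, does one untimed warm-up call, and averages over an arbitrary 100,000 iterations. TimingSketch and MinSpan are hypothetical names, and the input is built with new string('a', 28) + "bbbb" rather than copied from the listing.

```csharp
using System;
using System.Diagnostics;

public static class TimingSketch
{
    // Same logic as GetFirstStringBetweenStringsMinSpanCleanup ("Exact" variant),
    // copied here so the sketch is self-contained.
    public static string MinSpan(string s, string from, string to)
    {
        if (string.IsNullOrEmpty(s) || string.IsNullOrEmpty(from) || string.IsNullOrEmpty(to)) return string.Empty;
        int idxFrom = s.IndexOf(from);
        int idxStart = idxFrom + from.Length;
        if (idxFrom == -1 || idxStart >= s.Length - 1) return string.Empty;
        int idxEnd = s.IndexOf(to, idxStart);
        if (idxEnd == -1 || idxEnd <= idxStart) return string.Empty;
        return s.Substring(idxStart, idxEnd - idxStart);
    }

    public static void Main()
    {
        string test = new string('a', 28) + "bbbb"; // assumed input: 28 a's followed by 4 b's
        const int iterations = 100000;              // arbitrary choice

        string result = MinSpan(test, "aaaaaaa", "bb"); // warm-up call, not timed

        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            result = MinSpan(test, "aaaaaaa", "bb");
        sw.Stop();

        Console.WriteLine(result.Length); // 21
        Console.WriteLine("{0:F1} ticks per call (average)", (double)sw.ElapsedTicks / iterations);
    }
}
```

This is still a crude measurement; a benchmarking library would give more trustworthy numbers, but the loop at least keeps console output out of the timed region.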

4 comments:

  1. Any call to string.Substring() is a possible code smell. It would be better to return a ReadOnlySpan<char> (just replace Substring with AsSpan) and let the caller decide whether to allocate a new string on the heap or not.

    1. https://docs.microsoft.com/en-us/dotnet/api/system.memoryextensions.asspan

      Available in .NET 5.0 Preview 8 and .NET Core 3.1, 3.0, 2.2, 2.1

    2. https://adamsitnik.com/Span/ (in depth)
      https://www.codeproject.com/Articles/5269747/Using-Span-T-to-improve-performance-of-Csharp-code
      https://stackoverflow.com/questions/59596364/obtain-a-spanbyte-over-a-struct-without-making-a-copy-of-the-struct
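
For reference, the suggestion in the first comment could look roughly like the sketch below. It assumes a runtime where MemoryExtensions.AsSpan is available (.NET Core 2.1 or later, or the System.Memory package); SpanReturnSketch and MinSpanAsSpan are hypothetical names, and the logic is the "Exact" variant from the post with Substring swapped for AsSpan.

```csharp
using System;

public static class SpanReturnSketch
{
    // Returns a view over s instead of allocating a new string.
    public static ReadOnlySpan<char> MinSpanAsSpan(this string s, string from, string to)
    {
        if (string.IsNullOrEmpty(s) || string.IsNullOrEmpty(from) || string.IsNullOrEmpty(to))
            return ReadOnlySpan<char>.Empty;

        int idxFrom = s.IndexOf(from);
        int idxStart = idxFrom + from.Length;
        if (idxFrom == -1 || idxStart >= s.Length - 1)
            return ReadOnlySpan<char>.Empty;

        int idxEnd = s.IndexOf(to, idxStart);
        if (idxEnd == -1 || idxEnd <= idxStart)
            return ReadOnlySpan<char>.Empty;

        return s.AsSpan(idxStart, idxEnd - idxStart);
    }
}

public static class SpanReturnDemo
{
    public static void Main()
    {
        ReadOnlySpan<char> slice = "abcdabcd".MinSpanAsSpan("a", "d");
        Console.WriteLine(slice.Length);     // 2
        Console.WriteLine(slice.ToString()); // bc (a string is allocated only here)
    }
}
```

The caller can compare or parse the span directly and call ToString() only when a real string is needed.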
