Here's a fast implementation for getting the string between two strings, one that meets most expectations of what such a function should return. It finds the substring that comes "after" the 1st string and "before" the next occurrence of the 2nd string. The sticky question is: where exactly does the search for the 2nd string start?
Technically, this is minimal spanning, meaning the returned string is the shortest string found between the end of the 1st string and the beginning of the next occurrence of the 2nd string.
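To make "minimal" versus "maximal" spanning concrete, here is a small sketch. It assumes maximal spanning means searching for the 2nd string from the end of s (LastIndexOf). The class and method names (SpanDemo, BetweenMin, BetweenMax) are hypothetical, and the linked post has the author's own maximal version.

```csharp
using System;

// Hypothetical sketch contrasting minimal and maximal span.
// Assumption: "maximal" = end of the first "from" up to the LAST "to" in s.
public static class SpanDemo
{
    // Minimal span: end of the FIRST "from" up to the NEXT "to" after it.
    public static string BetweenMin(string s, string from, string to)
    {
        if (string.IsNullOrEmpty(s) || string.IsNullOrEmpty(from) || string.IsNullOrEmpty(to)) return string.Empty;
        int i = s.IndexOf(from, StringComparison.Ordinal);
        if (i == -1) return string.Empty;
        int start = i + from.Length;
        // startIndex may equal s.Length here; IndexOf then simply returns -1
        int end = s.IndexOf(to, start, StringComparison.Ordinal);
        return end == -1 ? string.Empty : s.Substring(start, end - start);
    }

    // Maximal span: end of the FIRST "from" up to the LAST "to" in s.
    public static string BetweenMax(string s, string from, string to)
    {
        if (string.IsNullOrEmpty(s) || string.IsNullOrEmpty(from) || string.IsNullOrEmpty(to)) return string.Empty;
        int i = s.IndexOf(from, StringComparison.Ordinal);
        if (i == -1) return string.Empty;
        int start = i + from.Length;
        int end = s.LastIndexOf(to, StringComparison.Ordinal);
        // Not found (-1), or the last "to" starts before "from" ends
        return end < start ? string.Empty : s.Substring(start, end - start);
    }
}
```

For "abcdabcd" with "a" and "c", BetweenMin returns "b" while BetweenMax returns "bcdab".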
See my post on maximal spanning:
https://metadataconsulting.blogspot.com/2020/09/CSharp-dotNET-How-to-get-a-string-between-two-strings.html
Below we compare two choices for where the search for the 2nd string starts: immediately after the 1st string ends, leaving no gap, versus "+1", i.e. starting one character after the 1st string ends. We want at least one character returned, not an empty string. Right?
Starting immediately after is how most examples are implemented, but it may not be what you want; see the comparison image below. Quick note: r stands for rule, see the code.
With the search starting immediately after the 1st string ("Exact" in the charts), the string between "a" and "a" for the input "aaaa" is empty! Technically correct.
With the search starting one character later ("+1" in the charts), the string between "a" and "a" for the input "aaaa" is "a". Intuitively, this is what you would want.
Whilst "+1" seems good, it has repercussions: in some cases the result might not be what you expect. Best to see it with your own eyes in the comparison chart image below.
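The "Exact" versus "+1" choice comes down to the start index passed to IndexOf when searching for the 2nd string. The sketch below uses a hypothetical plusOne flag in place of commenting out a line. It reproduces both "aaaa" results and shows one possible repercussion of "+1": for "abab" between "a" and "b", "+1" skips the "b" right after the first "a" and returns "ba", a result that contains the 2nd string itself.

```csharp
using System;

// Hypothetical helper: the plusOne flag replaces the commented-out
// "idxStart + 1" line, so both variants can be compared side by side.
public static class ExactVsPlusOne
{
    public static string Between(string s, string from, string to, bool plusOne)
    {
        if (string.IsNullOrEmpty(s) || string.IsNullOrEmpty(from) || string.IsNullOrEmpty(to)) return string.Empty;

        int idxFrom = s.IndexOf(from, StringComparison.Ordinal);
        int idxStart = idxFrom + from.Length;

        // Rule r1: "from" not found, or fewer than 2 characters left after it
        if (idxFrom == -1 || idxStart >= s.Length - 1) return string.Empty;

        // "Exact" searches for "to" at idxStart; "+1" skips one character first
        int idxEnd = s.IndexOf(to, plusOne ? idxStart + 1 : idxStart, StringComparison.Ordinal);

        // "to" not found, or rule r3: "to" begins exactly where "from" ends
        if (idxEnd == -1 || idxEnd <= idxStart) return string.Empty;

        return s.Substring(idxStart, idxEnd - idxStart);
    }
}
```

With "aaaa", Between(s, "a", "a", false) returns "" and Between(s, "a", "a", true) returns "a". With "abab", the "Exact" call returns "" and the "+1" call returns "ba".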
To see the full effects, run the code below for the "Exact" version, or edit it in .NET Fiddle to try the "+1" version.
using System;
using System.Diagnostics;

public static class Extensions
{
    /// <summary>
    /// Get a substring between two anchor strings, minimal span
    /// </summary>
    /// <param name="s">source string</param>
    /// <param name="from">search from the end of the first occurrence of this string</param>
    /// <param name="to">to the beginning of the next occurrence of this string, searching forwards from the end of from</param>
    /// <returns>a substring between from and to, minimal span</returns>
    public static string GetFirstStringBetweenStringsMinSpanCleanup(this string s, string from, string to)
    {
        if (string.IsNullOrEmpty(s) || string.IsNullOrEmpty(from) || string.IsNullOrEmpty(to)) return string.Empty;

        // Ordinal comparison: exact character match, faster and culture-independent
        int idxFrom = s.IndexOf(from, StringComparison.Ordinal);
        int idxStart = idxFrom + from.Length;

        // Filter "not found" (-1) first, then rule r1: fewer than 2 characters left after from
        if (idxFrom == -1 || idxStart >= s.Length - 1) return string.Empty;

        int idxEnd = s.IndexOf(to, idxStart, StringComparison.Ordinal); // Exact definition, but intuitively the next line meets likely expectations -> YOU CHOOSE
        //int idxEnd = s.IndexOf(to, idxStart + 1, StringComparison.Ordinal); // Start at the next position, leaving room for 1 character to be returned

        if (idxEnd == -1 || idxEnd <= idxStart) return string.Empty;

        return s.Substring(idxStart, idxEnd - idxStart);
    }

    /// <summary>
    /// Get a substring between two anchor strings, MINIMAL SPAN (debug version that prints which rule fired)
    /// </summary>
    /// <param name="s">source string</param>
    /// <param name="from">search from the end of the first occurrence of this string</param>
    /// <param name="to">to the beginning of the next occurrence of this string, searching forwards from the end of from</param>
    /// <returns>a substring between from and to, minimal span</returns>
    public static string GetFirstStringBetweenStringsMinSpan(this string s, string from, string to)
    {
        Console.Write("args from {0} to {1} = ", from, to); // debug

        // Edge cases
        if (string.IsNullOrEmpty(s) || string.IsNullOrEmpty(from) || string.IsNullOrEmpty(to)) return string.Empty;
        //if (from.Length >= s.Length || to.Length >= s.Length) return string.Empty; // redundant, rule r1 below takes care of this

        int idxFrom = s.IndexOf(from, StringComparison.Ordinal);
        int idxStart = idxFrom + from.Length;

        // Filter "not found" (-1) first
        if (idxFrom == -1)
            return string.Empty;
        else if (idxStart >= s.Length - 1) // combined into 1 line in the Cleanup version; we compare an index with a length, and with lengths we normally subtract 1
        {
            Console.WriteLine("r1. idxStart={0} >= (s.Length - 1)={1}", idxStart, s.Length - 1);
            return string.Empty;
        }

        int idxEnd = s.IndexOf(to, idxStart, StringComparison.Ordinal); // Exact definition, but intuitively the next line meets likely expectations -> YOU CHOOSE
        //int idxEnd = s.IndexOf(to, idxStart + 1, StringComparison.Ordinal); // Start at the next position, leaving room for 1 character to be returned

        if (idxEnd == -1)
            return string.Empty;
        else if (idxEnd > s.Length - 1) // Rule 2 never gets used: IndexOf never returns an index past the end of s
        {
            Console.WriteLine("r2. idxEnd={0} > (s.Length - 1)={1}", idxEnd, s.Length - 1);
            return string.Empty;
        }
        else if (idxEnd <= idxStart) // Rule 3: to begins exactly where from ends, so nothing lies in between
        {
            Console.WriteLine("r3. idxEnd={0} <= idxStart={1}", idxEnd, idxStart);
            return string.Empty;
        }

        return s.Substring(idxStart, idxEnd - idxStart);
    }
}

public class Program
{
    public static void Main()
    {
        string test = "abcd";
        Console.WriteLine(test);
        Console.WriteLine("=================");
        Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan(null, null));
        Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("", ""));
        Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan(test, test));
        Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("", test));
        Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan(test, ""));
        Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("a", "a"));
        Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("a", "b"));
        Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("a", "c"));
        Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("a", "d"));
        Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("a", "e"));
        Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("e", "e"));
        Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("ab", "a"));
        Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("ab", "b"));
        Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("ab", "c"));
        Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("ab", "d"));
        Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("ab", "a"));
        Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("ab", "b"));
        Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("ab", "c"));
        Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("ab", "d"));
        Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("abc", "a"));
        Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("abc", "b"));
        Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("abc", "c"));
        Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("abc", "d"));
        Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan(test, "a"));
        Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan(test, "b"));
        Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan(test, "c"));
        Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan(test, "d"));
        Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("a", "abc"));
        Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("b", "abc"));
        Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("c", "abc"));
        Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("d", "abc"));
        Console.WriteLine();

        test = "abcdabcd";
        Console.WriteLine(test);
        Console.WriteLine("=================");
        Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan(null, null));
        Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("", ""));
        Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan(test, test));
        Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("", test));
        Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan(test, ""));
        Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("a", "a"));
        Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("a", "b"));
        Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("a", "c"));
        Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("a", "d"));
        Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("a", "e"));
        Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("e", "e"));
        Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("ab", "a"));
        Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("ab", "b"));
        Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("ab", "c"));
        Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("ab", "d"));
        Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("ab", "a"));
        Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("ab", "b"));
        Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("ab", "c"));
        Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("ab", "d"));
        Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("abc", "a"));
        Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("abc", "b"));
        Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("abc", "c"));
        Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("abc", "d"));
        Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan(test, "a"));
        Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan(test, "b"));
        Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan(test, "c"));
        Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan(test, "d"));
        Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("a", "abc"));
        Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("b", "abc"));
        Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("c", "abc"));
        Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("d", "abc"));
        Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("abc", "d")); // pass
        Console.WriteLine();

        test = "aaaa";
        Console.WriteLine(test);
        Console.WriteLine("=================");
        Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("a", "a"));
        Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("aa", "a"));
        Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("aaa", "a"));
        Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("aaaa", "a"));
        Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("a", "a"));
        Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("a", "aa"));
        Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("a", "aaa"));
        Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("a", "aaaa"));
        Console.WriteLine();

        test = "aaaaaaaaaaaaaaaaaaaaaaaaaaaabbbb";
        Console.WriteLine(test);
        Console.WriteLine("=================");
        Console.WriteLine(test.GetFirstStringBetweenStringsMinSpan("bbbb", "aaaa"));

        string result = string.Empty;
        Stopwatch sw = new Stopwatch();
        sw.Start();
        // Note: the timed call includes the debug Console.Write inside the method
        result = test.GetFirstStringBetweenStringsMinSpan("aaaaaaa", "bb");
        sw.Stop();
        Console.WriteLine(result);
        Console.WriteLine(" in " + sw.ElapsedTicks + " ticks.");
    }
}
Any call to string.Substring() is a possible code smell. It would be better to return a ReadOnlySpan<char> (just replace Substring with AsSpan) and let the caller decide whether to allocate a new string on the heap or not.
https://docs.microsoft.com/en-us/dotnet/api/system.memoryextensions.asspan
Available in .NET 5.0 Preview 8 and .NET Core 3.1, 3.0, 2.2, 2.1.
https://adamsitnik.com/Span/ - in depth
https://www.codeproject.com/Articles/5269747/Using-Span-T-to-improve-performance-of-Csharp-code
https://stackoverflow.com/questions/59596364/obtain-a-spanbyte-over-a-struct-without-making-a-copy-of-the-struct
Thank you, Casual Cat!
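Following Casual Cat's suggestion, here is a sketch of the Cleanup version returning ReadOnlySpan<char> instead of a new string. The method name GetFirstSpanBetweenStringsMinSpan is hypothetical; the sketch assumes .NET Core 2.1 or later, where MemoryExtensions.AsSpan(string, int, int) is available, and keeps the same "Exact" rules with ordinal comparison.

```csharp
using System;

public static class SpanExtensions
{
    // Hypothetical name; same minimal-span rules as the Cleanup version,
    // but returns a slice over s instead of allocating a new string.
    public static ReadOnlySpan<char> GetFirstSpanBetweenStringsMinSpan(this string s, string from, string to)
    {
        if (string.IsNullOrEmpty(s) || string.IsNullOrEmpty(from) || string.IsNullOrEmpty(to))
            return ReadOnlySpan<char>.Empty;

        int idxFrom = s.IndexOf(from, StringComparison.Ordinal);
        int idxStart = idxFrom + from.Length;

        // Rule r1: "from" not found, or fewer than 2 characters left after it
        if (idxFrom == -1 || idxStart >= s.Length - 1) return ReadOnlySpan<char>.Empty;

        int idxEnd = s.IndexOf(to, idxStart, StringComparison.Ordinal); // "Exact" start index

        // "to" not found, or rule r3: "to" begins exactly where "from" ends
        if (idxEnd == -1 || idxEnd <= idxStart) return ReadOnlySpan<char>.Empty;

        // AsSpan(start, length) instead of Substring(start, length): no copy is made
        return s.AsSpan(idxStart, idxEnd - idxStart);
    }
}
```

The caller only pays for a string allocation when it calls ToString(); for example, "abcdabcd".GetFirstSpanBetweenStringsMinSpan("a", "d").ToString() gives "bc". Keep in mind that ReadOnlySpan<char> is a ref struct, so it cannot be stored in a class field or held across an await.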