Thursday, December 12, 2024

Get the number of columns and rows for an HTML Table using Regex



Here's a quick, ad hoc way to get the dimensions of an HTML table. In this post, I'll focus on getting the number of rows and columns for an HTML table.

You can use this algorithm in any language: use the regular expressions below to enumerate the rows, then count the columns (with another regex) within each row.
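As a rough illustration of that two-step idea, here is a minimal sketch. The class name TableSize and the method Measure are made up for this example, and the patterns assume a single, non-nested table with closed <tr> tags. It counts cells per row and reports the largest count, so it ignores colspan.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
public static class TableSize
{
    // One <tr ...>...</tr> block; group 1 is the row's inner HTML.
    static readonly Regex Row = new Regex(@"<tr\b[^>]*>(.*?)</tr>",
        RegexOptions.Singleline | RegexOptions.IgnoreCase);

    // One <td> or <th> opening tag; \b keeps <thead> from matching.
    static readonly Regex Cell = new Regex(@"<t[hd]\b[^>]*>",
        RegexOptions.IgnoreCase);

    // Returns (row count, largest cell count found in any row).
    public static (int Rows, int Cols) Measure(string html)
    {
        int rows = 0, cols = 0;
        foreach (Match row in Row.Matches(html))
        {
            rows++;
            cols = Math.Max(cols, Cell.Matches(row.Groups[1].Value).Count);
        }
        return (rows, cols);
    }
}
```

Calling TableSize.Measure(html) on a table with a two-cell header row and two body rows would give Rows = 3 and Cols = 2. Nested tables or unclosed tags will confuse it, which is the usual price of parsing HTML with regex.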

I wanted to do this to avoid using bulky and unreliable third-party libraries. So, although not advised, I used regular expressions. Getting the number of rows in an HTML table is simple enough, but columns can be tricky because of jagged tables (tables using colspan=). The other motivation is that this topic was sparsely covered.

Normally in C# you would load the document with the HTML Agility Pack library or the more modern AngleSharp library, but both are pretty heavy downloads and require a detailed security review. Who has time for that!

So, here's the C# regex solution for getting the size of an HTML table.

using System;
using System.Linq;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public class Program
{
	public static void Main()
	{
		string html = @"
<table>
    <caption>Employee Information for https://metadataconsulting.blogspot.com/</caption>
    <thead>
        <tr>
            <th>Name</th>
            <th>Position</th>
            <th>Office</th>
            <th>Age</th>
            <th>Start Date</th>
            <th>Salary</th>
        </tr>
    </thead>
    <tbody>
        <tr class='buttler'>
            <td>John Doe</td>
            <td>Software Engineer</td>
            <td>New York</td>
            <td>30</td>
            <td>2015-01-15</td>
            <td>$120,000</td>
        </tr>
        <tr>
            <td>Jane Smith</td>
            <td>Project Manager</td>
            <td>London</td>
            <td>40</td>
            <td>2010-03-25</td>
            <td>$150,000</td>
        </tr>
        <tr>
            <td>Emily Johnson</td>
            <td>Designer</td>
            <td>San Francisco</td>
            <td>28</td>
            <td>2018-07-12</td>
            <td>$100,000</td>
        </tr>
    </tbody>
    <tfoot>
        <tr>
            <td colspan=6>End of Employee Information</td>
        </tr>
    </tfoot>
</table>"; 

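		// Group 1 captures the inner HTML of each <tr>...</tr> block.
		// Singleline lets '.' cross line breaks; \b keeps <tr from matching tags like <track>.
		string patternRow = @"<tr\b[^>]*>(.*?)</tr>";
		RegexOptions optionsRow = RegexOptions.Singleline | RegexOptions.IgnoreCase;
		Regex regxHTMLRow = new Regex(patternRow, optionsRow);
		
		int colcnt = 0;
		int cellcnt = 0; 
		double avgcolcnt = 0;
		int finalcolcnt = 0;  

		List<int> avgColCnt = new List<int>(); 
		// [^>]* replaces the greedy .* so that cells sharing one line are counted separately.
		Regex regxColCount = new Regex(@"<t[hd]\b[^>]*>(.*?)</t[hd]>", RegexOptions.Singleline | RegexOptions.IgnoreCase);
		
		MatchCollection mcoll = regxHTMLRow.Matches(html);
		int m = 0;
		int r = 0; 
		if (mcoll.Count > 0)
		{
			foreach (Match match in mcoll)
			{
				m++;
				
				Console.WriteLine("Match #{0}, Match Row ='{1}'", m, match.Groups[1].Value);
				
				// Count the <td>/<th> cells inside this row.
				MatchCollection mcollColCnt = regxColCount.Matches(match.Groups[1].Value);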
				
				if (mcoll.Count > 0)
				{
					foreach (Match mc in mcollColCnt)
					{
						if (mc.Success) 
						{
							r++; 
							Console.WriteLine("\r\n\t\t\tMatch #{0}, Match Value ='{1}' ", r, mc.Value);
							cellcnt++;
							
						}
					}
					avgColCnt.Add(cellcnt); 
					colcnt = Math.Max(colcnt, cellcnt);
					cellcnt = 0; 
					r = 0; 
					Console.WriteLine();
				}
				
            }
        }
		Console.WriteLine("Max number of columns {0}", colcnt);	
		
		Console.WriteLine("Number of rows {0}", avgColCnt.Count());
 		
        Console.WriteLine("List: " + string.Join(", ", avgColCnt));
		
		//List<int> avgListReduced = avgColCnt.Where(x => x != avgColCnt.Min() && x != avgColCnt.Max()).ToList(); //removes all values that match max
		
		avgColCnt.Sort(); 
		
		List<int> avgListReduced = avgColCnt; 
	
		if (avgListReduced.Count > 2) {
			
			// Remove the first element
			avgListReduced.RemoveAt(0);

			// Remove the last element
			avgListReduced.RemoveAt(avgListReduced.Count - 1);
				
			Console.WriteLine("Number of avgListReduced {0}", avgListReduced.Count());
			// Print the updated list
			Console.WriteLine("Updated list: " + string.Join(", ", avgListReduced));
			
			avgcolcnt = avgListReduced.Average(); 

			try {		
			
				int intavgcolcnt = (int)Math.Round(avgcolcnt, MidpointRounding.AwayFromZero);

				Console.WriteLine("Average number of columns " + intavgcolcnt);
				
				//use the average only when at least 5 rows remain after trimming; otherwise keep the max
				if (avgListReduced.Count >= 5 && intavgcolcnt != colcnt) 
					finalcolcnt =  intavgcolcnt;
				else
					finalcolcnt =  colcnt;	
			
			} 
			catch 
			{
				finalcolcnt =  colcnt;
			}
			

		}
		else {
				
				finalcolcnt =  colcnt;
		}
			
		Console.WriteLine("Final number of columns {0}",finalcolcnt );
				
		
	}
}
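
The listing above counts every cell as one column, so the footer row with colspan=6 is reported as a single cell. One possible way to account for jagged rows is to add up the colspan values instead. This is only a sketch under assumptions: the names ColspanWidth, RowWidth and MaxWidth are invented for this example, rowspan is ignored, and a missing or zero colspan is treated as 1.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
public static class ColspanWidth
{
    static readonly Regex Row = new Regex(@"<tr\b[^>]*>(.*?)</tr>",
        RegexOptions.Singleline | RegexOptions.IgnoreCase);

    // Group 1 holds the attribute text of each <td>/<th> opening tag.
    static readonly Regex Cell = new Regex(@"<t[hd]\b([^>]*)>",
        RegexOptions.IgnoreCase);

    // Finds colspan=6, colspan="6" or colspan='6' in the attribute text.
    static readonly Regex Colspan = new Regex(@"\bcolspan\s*=\s*[""']?(\d+)",
        RegexOptions.IgnoreCase);

    // Width of one row: each cell contributes its colspan, or 1 if none is given.
    public static int RowWidth(string rowHtml)
    {
        int width = 0;
        foreach (Match cell in Cell.Matches(rowHtml))
        {
            int n = 1;
            Match span = Colspan.Match(cell.Groups[1].Value);
            if (span.Success && int.TryParse(span.Groups[1].Value, out int parsed) && parsed > 0)
                n = parsed;
            width += n;
        }
        return width;
    }

    // Widest row in the table, measured in columns rather than cells.
    public static int MaxWidth(string html)
    {
        int max = 0;
        foreach (Match row in Row.Matches(html))
            max = Math.Max(max, RowWidth(row.Groups[1].Value));
        return max;
    }
}
```

With this variant, every row of the sample table, including the footer, comes out 6 columns wide, so the averaging step would have no outlier to smooth over.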

Thursday, December 5, 2024

Get the number of columns in an HTML Table using Regex



Here's a provisional, ad hoc way to quickly get the dimensions of an HTML table. In this post I focus on the harder of the two dimensions: the number of columns in an HTML table. I wanted to do this to avoid using bulky and unreliable third-party libraries. So, although not advised, I used regular expressions. Getting the number of rows in an HTML table is simple enough, but counting columns was sparsely covered.

Normally in C# you would load the document with the HTML Agility Pack library or the more modern AngleSharp library, but both are pretty heavy downloads and require a detailed security review. Who has time for that!

So here's the regex solution to counting HTML columns. 




using System;
using System.Linq;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public class Program
{
	public static void Main()
	{
		string html = @"
<table>
    <caption>Employee Information for https://metadataconsulting.blogspot.com/</caption>
    <thead>
        <tr>
            <th>Name</th>
            <th>Position</th>
            <th>Office</th>
            <th>Age</th>
            <th>Start Date</th>
            <th>Salary</th>
        </tr>
    </thead>
    <tbody>
        <tr class='buttler'>
            <td>John Doe</td>
            <td>Software Engineer</td>
            <td>New York</td>
            <td>30</td>
            <td>2015-01-15</td>
            <td>$120,000</td>
        </tr>
        <tr>
            <td>Jane Smith</td>
            <td>Project Manager</td>
            <td>London</td>
            <td>40</td>
            <td>2010-03-25</td>
            <td>$150,000</td>
        </tr>
        <tr>
            <td>Emily Johnson</td>
            <td>Designer</td>
            <td>San Francisco</td>
            <td>28</td>
            <td>2018-07-12</td>
            <td>$100,000</td>
        </tr>
    </tbody>
    <tfoot>
        <tr>
            <td colspan=6>End of Employee Information</td>
        </tr>
    </tfoot>
</table>"; 

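		// Group 1 captures the inner HTML of each <tr>...</tr> block.
		// Singleline lets '.' cross line breaks; \b keeps <tr from matching tags like <track>.
		string patternRow = @"<tr\b[^>]*>(.*?)</tr>";
		RegexOptions optionsRow = RegexOptions.Singleline | RegexOptions.IgnoreCase;
		Regex regxHTMLRow = new Regex(patternRow, optionsRow);
		
		int colcnt = 0;
		int cellcnt = 0; 
		double avgcolcnt = 0;
		int finalcolcnt = 0;  

		List<int> avgColCnt = new List<int>(); 
		// [^>]* replaces the greedy .* so that cells sharing one line are counted separately.
		Regex regxColCount = new Regex(@"<t[hd]\b[^>]*>(.*?)</t[hd]>", RegexOptions.Singleline | RegexOptions.IgnoreCase);
		
		MatchCollection mcoll = regxHTMLRow.Matches(html);
		int m = 0;
		int r = 0; 
		if (mcoll.Count > 0)
		{
			foreach (Match match in mcoll)
			{
				m++;
				
				Console.WriteLine("Match #{0}, Match Row ='{1}'", m, match.Groups[1].Value);
				
				// Count the <td>/<th> cells inside this row.
				MatchCollection mcollColCnt = regxColCount.Matches(match.Groups[1].Value);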
				
				if (mcoll.Count > 0)
				{
					foreach (Match mc in mcollColCnt)
					{
						if (mc.Success) 
						{
							r++; 
							Console.WriteLine("\r\n\t\t\tMatch #{0}, Match Value ='{1}' ", r, mc.Value);
							cellcnt++;
							
						}
					}
					avgColCnt.Add(cellcnt); 
					colcnt = Math.Max(colcnt, cellcnt);
					cellcnt = 0; 
					r = 0; 
					Console.WriteLine();
				}
				
            }
        }
		Console.WriteLine("Max number of columns {0}", colcnt);	
		
		Console.WriteLine("Number of rows {0}", avgColCnt.Count());
 		
        Console.WriteLine("List: " + string.Join(", ", avgColCnt));
		
		//List<int> avgListReduced = avgColCnt.Where(x => x != avgColCnt.Min() && x != avgColCnt.Max()).ToList(); //removes all values that match max
		
		avgColCnt.Sort(); 
		
		List<int> avgListReduced = avgColCnt; 
	
		if (avgListReduced.Count > 2) {
			
			// Remove the first element
			avgListReduced.RemoveAt(0);

			// Remove the last element
			avgListReduced.RemoveAt(avgListReduced.Count - 1);
				
			Console.WriteLine("Number of avgListReduced {0}", avgListReduced.Count());
			// Print the updated list
			Console.WriteLine("Updated list: " + string.Join(", ", avgListReduced));
			
			avgcolcnt = avgListReduced.Average(); 

			try {		
			
				int intavgcolcnt = (int)Math.Round(avgcolcnt, MidpointRounding.AwayFromZero);

				Console.WriteLine("Average number of columns " + intavgcolcnt);
				
				//use the average only when at least 5 rows remain after trimming; otherwise keep the max
				if (avgListReduced.Count >= 5 && intavgcolcnt != colcnt) 
					finalcolcnt =  intavgcolcnt;
				else
					finalcolcnt =  colcnt;	
			
			} 
			catch 
			{
				finalcolcnt =  colcnt;
			}
			

		}
		else {
				
				finalcolcnt =  colcnt;
		}
			
		Console.WriteLine("Final number of columns {0}",finalcolcnt );
				
		
	}
}
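
The last part of the listing, which decides between the maximum and a trimmed average, can be easier to follow when pulled out on its own. Here is a minimal sketch that mirrors that logic. The names ColumnHeuristic and FinalColumnCount are invented for this example, and unlike the listing it works on a sorted copy instead of modifying the list it is given.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of any library.
public static class ColumnHeuristic
{
    // cellsPerRow: the number of cells counted in each table row.
    public static int FinalColumnCount(IEnumerable<int> cellsPerRow)
    {
        List<int> counts = cellsPerRow.OrderBy(c => c).ToList(); // sorted copy
        if (counts.Count == 0) return 0;

        int max = counts[counts.Count - 1];
        if (counts.Count <= 2) return max; // too few rows to trim

        // Drop the smallest and the largest row, then average what is left.
        List<int> trimmed = counts.Skip(1).Take(counts.Count - 2).ToList();
        int avg = (int)Math.Round(trimmed.Average(), MidpointRounding.AwayFromZero);

        // Same threshold as the listing: trust the average only when
        // at least 5 rows remain after trimming, otherwise keep the max.
        return trimmed.Count >= 5 && avg != max ? avg : max;
    }
}
```

For the sample table the per-row counts are 6, 6, 6, 6, 1. After sorting and trimming, three rows of 6 remain, which is below the threshold of 5, so the maximum of 6 is returned.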

Saturday, October 26, 2024

OneDrive Direct Download Link Generator for new 2024 API cracked - Working






Microsoft OneDrive Direct File Download URL Maker for old links




<iframe src="https://onedrive.live.com/embed?cid=8F99649728BEB2F3&resid=8F99649728BEB2F3%211010&authkey=AFo8ZQ_-qj84DEQ" width="98" height="120" frameborder="0" scrolling="no"></iframe>  See the instructions below if you don't know how. NOTE: For images there is an extra step below.



https://onedrive.live.com/embed?cid=8F99649728BEB2F3&resid=8F99649728BEB2F3%211010&authkey=AFo8ZQ_-qj84DEQ and populate the text-box below. To proceed, click the "Get Download Link" button.


4. Now, click on the Get Download Link button.


The text-box will be highlighted so you can copy the full link. To copy, right-click and choose "Copy", or simply press CTRL-C. Note: The entire URL is automatically selected for your convenience.





 


Microsoft OneDrive Direct File Download URL Maker for new 2024 API

A new OneDrive 2024 API is being rolled out across geographic regions around the world and this is how to deal with it.


1. Mouse over the item in OneDrive, click ..., and choose Embed to get the embed code.



2. Copy the embed code and extract the link. You can use Step 3 above, or copy and paste into Notepad and extract manually. Keyboard shortcut to copy (CTRL-C). Keyboard shortcut to paste (CTRL-V).

This is what the link will look like:
https://1drv.ms/u/s!AsRPggdGwZFcjVEiJ1LvYyO1qZm9?embed=1
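
If you would rather not pick the link out by hand, a small regex can do it. This is a minimal sketch, assuming the embed code is an <iframe> tag whose src attribute is quoted, like the old-style embed code shown earlier on this page. The names EmbedLink and Extract are made up for this example.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any OneDrive API.
public static class EmbedLink
{
    // Captures the quoted src value of the first <iframe> tag in group 1.
    static readonly Regex Src = new Regex(
        @"<iframe\b[^>]*?\bsrc\s*=\s*[""']([^""']+)[""']",
        RegexOptions.IgnoreCase);

    // Returns the embed URL, or null when no iframe src is found.
    public static string Extract(string embedCode)
    {
        Match m = Src.Match(embedCode);
        return m.Success ? m.Groups[1].Value : null;
    }
}
```

Passing the whole embed code to EmbedLink.Extract returns only the URL inside src, which is the part to paste into the browser in the next step.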





3. Copy and paste this into a new tab in your preferred browser.

This area is known as the address bar.



4. This may open an intermediate page that looks like the image below.
If you see this page, you must click on the link below the zip folder. Specifically, click the yellow-highlighted text under the link touchx64v.1...








5. Once you see the OneDrive preview page for downloading the file, copy the URL in the browser address bar. Make sure you have selected the entire link. Keyboard shortcut to Select All (CTRL-A). Keyboard shortcut to copy (CTRL-C).

If you paste it into Notepad, this is what a link generated by the new OneDrive 2024 API looks like:
https://onedrive.live.com/?authkey=%21ACInUu9jI7Wpmb0&cid=5C91C14607824FC4&id=5C91C14607824FC4%211745&parId=5C91C14607824FC4%21609&o=OneUp



6. Scroll to the top of the page and paste (CTRL-V) the link from the browser address bar into the new step 3.1 above, under the title Microsoft OneDrive Direct File Download URL Maker for old links.

Then continue with step 4 above: click on the Get Download Link button.

7. Done. Share your download link. 


Another 1st by metadataconsulting.ca 😺 💫


Alternatives: this did not work

“You can also create a public download link for a file in OneDrive by:

  1. Selecting the file or image and clicking "Share" on the top
  2. In the link settings, setting the link as "Anyone with the link" and "Allow editing"
  3. Clicking "Apply"
  4. Clicking "Copy"
  5. Adding the text "&download=1" to the end of the URL”

For images, you get a direct URL
  1. Right-click on the image and select Embed. The URL it gives you for embedding the image is now a direct download URL, but it is super long.

    So you do not need to do the above!