Thursday, December 12, 2024

Get the number of columns and rows for an HTML Table using Regex



Here's a quick ad hoc way to get just the dimensions of an HTML table. In this post, I'll focus on getting the number of rows and columns for an HTML table.

You can use this algorithm in any language: use the regular expressions below to enumerate the rows, then count the columns within each row with a second regular expression.
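To show that the idea carries over to other languages, here is a minimal Python sketch of the same two-pass approach: one pattern enumerates the rows, a second counts the cells inside each row. The names ROW_RE, CELL_RE and table_dimensions are made up for this illustration, and the patterns are simplified versions, not the ones from the C# listing. It assumes a single, non-nested table.

```python
import re

# One <tr ...>...</tr> block per match; DOTALL lets '.' cross line breaks.
ROW_RE = re.compile(r"<tr\b[^>]*>(.*?)</tr>", re.DOTALL | re.IGNORECASE)
# One <th> or <td> cell per match; \b keeps <thead> from being counted.
CELL_RE = re.compile(r"<t[hd]\b[^>]*>.*?</t[hd]>", re.DOTALL | re.IGNORECASE)


def table_dimensions(html):
    """Return (row_count, max_cells_in_any_row) for the rows found in html."""
    cells_per_row = [len(CELL_RE.findall(row)) for row in ROW_RE.findall(html)]
    return len(cells_per_row), max(cells_per_row, default=0)
```

Like any regex approach, this will miscount nested tables or rows with omitted closing tags, so treat it as a quick estimate.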

I wanted to do this to avoid using bulky and unreliable third-party libraries. So, although it is not advised, I used regular expressions to do this. Getting the number of rows in an HTML table is simple enough, but columns can be tricky because of jagged tables (tables using colspan=). The other motivation is that this topic was sparsely covered.
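One possible way to cope with jagged tables, sketched here in Python, is to add up the colspan values of the cells in each row instead of counting cells. This is an alternative to the approach in the C# listing, not a restatement of it. The names row_widths, CELL_OPEN_RE and COLSPAN_RE are hypothetical, rowspan is ignored, and the table is assumed not to be nested.

```python
import re

ROW_RE = re.compile(r"<tr\b[^>]*>(.*?)</tr>", re.DOTALL | re.IGNORECASE)
# Capture only the attribute text of each opening <th>/<td> tag.
CELL_OPEN_RE = re.compile(r"<t[hd]\b([^>]*)>", re.IGNORECASE)
# Accept colspan=6, colspan="6" and colspan='6'.
COLSPAN_RE = re.compile(r"""\bcolspan\s*=\s*["']?(\d+)""", re.IGNORECASE)


def row_widths(html):
    """Return the effective width of each row, honoring colspan."""
    widths = []
    for row in ROW_RE.findall(html):
        width = 0
        for attrs in CELL_OPEN_RE.findall(row):
            match = COLSPAN_RE.search(attrs)
            # A cell without colspan (or with colspan=0) counts as one column.
            width += max(1, int(match.group(1))) if match else 1
        widths.append(width)
    return widths
```

With this, a footer row holding a single `<td colspan=6>` cell reports a width of 6, the same as a header row with six cells.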

Normally in C# the way to solve this would be to load the document using the HTML Agility Pack library or the more modern AngleSharp library, but both are pretty heavy downloads and require a detailed security review. Who has time for that!

So, here's the C# regex solution for getting the HTML table size.

using System;
using System.Linq;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public class Program
{
	public static void Main()
	{
		string html = @"
<table>
    <caption>Employee Information for https://metadataconsulting.blogspot.com/</caption>
    <thead>
        <tr>
            <th>Name</th>
            <th>Position</th>
            <th>Office</th>
            <th>Age</th>
            <th>Start Date</th>
            <th>Salary</th>
        </tr>
    </thead>
    <tbody>
        <tr class='buttler'>
            <td>John Doe</td>
            <td>Software Engineer</td>
            <td>New York</td>
            <td>30</td>
            <td>2015-01-15</td>
            <td>$120,000</td>
        </tr>
        <tr>
            <td>Jane Smith</td>
            <td>Project Manager</td>
            <td>London</td>
            <td>40</td>
            <td>2010-03-25</td>
            <td>$150,000</td>
        </tr>
        <tr>
            <td>Emily Johnson</td>
            <td>Designer</td>
            <td>San Francisco</td>
            <td>28</td>
            <td>2018-07-12</td>
            <td>$100,000</td>
        </tr>
    </tbody>
    <tfoot>
        <tr>
            <td colspan=6>End of Employee Information</td>
        </tr>
    </tfoot>
</table>"; 

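		// Match one <tr ...>...</tr> block; group 3 holds the row's inner HTML.
		// \b[^>]* keeps the opening tag from matching other tags or running past its own '>'.
		string patternRow = @"(<tr\b[^>]*>)((?!</tr>).)*?(.*?)(</tr>)";
		RegexOptions optionsRow = RegexOptions.Singleline;
		Regex regxHTMLRow = new Regex(patternRow, optionsRow);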
		
		int colcnt = 0;
		int cellcnt = 0; 
		double avgcolcnt = 0;
        int finalcolcnt = 0;  

		List<int> avgColCnt = new List<int>(); 
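        // Count <th>/<td> cells. [^>]* stops at the end of the opening tag, so several
        // cells on one line are counted separately; Singleline lets a cell span lines.
        Regex regxColCount = new Regex(@"<t[hd]\b[^>]*>(.*?)</t[hd]>", RegexOptions.Singleline);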
        
		MatchCollection mcoll = regxHTMLRow.Matches(html);
		int m = 0;
        int r = 0; 
        if (mcoll.Count > 0)
        {
            foreach (Match match in mcoll)
            {
                m++;
                
                Console.WriteLine("Match #{0}, Match Row ='{1}'", m, match.Groups[3].Value);
                
       			MatchCollection mcollColCnt = regxColCount.Matches(match.Groups[3].Value);
				
				if (mcoll.Count > 0)
				{
					foreach (Match mc in mcollColCnt)
					{
						if (mc.Success) 
						{
							r++; 
							Console.WriteLine("\r\n\t\t\tMatch #{0}, Match Value ='{1}' ", r, mc.Value);
							cellcnt++;
							
						}
					}
					avgColCnt.Add(cellcnt); 
					colcnt = Math.Max(colcnt, cellcnt);
					cellcnt = 0; 
					r = 0; 
					Console.WriteLine();
				}
				
            }
        }
		Console.WriteLine("Max number of columns {0}", colcnt);	
		
		Console.WriteLine("Number of rows {0}", avgColCnt.Count());
 		
        Console.WriteLine("List: " + string.Join(", ", avgColCnt));
		
		//List<int> avgListReduced = avgColCnt.Where(x => x != avgColCnt.Min() && x != avgColCnt.Max()).ToList(); //removes all values that match max
		
		avgColCnt.Sort(); 
		
		List<int> avgListReduced = new List<int>(avgColCnt); // copy, so trimming does not alter the per-row counts
	
		if (avgListReduced.Count > 2) {
			
			// Remove the first element
			avgListReduced.RemoveAt(0);

			// Remove the last element
			avgListReduced.RemoveAt(avgListReduced.Count - 1);
				
			Console.WriteLine("Number of avgListReduced {0}", avgListReduced.Count());
			// Print the updated list
			Console.WriteLine("Updated list: " + string.Join(", ", avgListReduced));
			
			avgcolcnt = avgListReduced.Average(); 

			try {		
			
				int intavgcolcnt = (int)Math.Round(avgcolcnt, MidpointRounding.AwayFromZero);

				Console.WriteLine("Average number of columns " + intavgcolcnt);
				
				// Trust the trimmed average only when at least 5 rows remain after trimming
				if (avgListReduced.Count >= 5 && intavgcolcnt != colcnt) 
					finalcolcnt =  intavgcolcnt;
				else
					finalcolcnt =  colcnt;	
			
			} 
			catch 
			{
				finalcolcnt =  colcnt;
			}
			

		}
		else {
				
				finalcolcnt =  colcnt;
		}
			
		Console.WriteLine("Final number of columns {0}",finalcolcnt );
				
		
	}
}

Thursday, December 5, 2024

Get the number of columns in an HTML Table using Regex



Here's a provisional ad hoc way to quickly get the dimensions of an HTML table. In this post I focus on the harder of the two dimensions: the number of columns in an HTML table. I wanted to do this to avoid using bulky and unreliable third-party libraries. So, although it is not advised, I used regular expressions to do this. Getting the number of rows in an HTML table is simple enough, but for columns, this topic was sparsely covered.

Normally in C# the way to solve this would be to load the document using the HTML Agility Pack library or the more modern AngleSharp library, but both are pretty heavy downloads and require a detailed security review. Who has time for that!

So here's the regex solution to counting HTML columns. 




using System;
using System.Linq;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public class Program
{
	public static void Main()
	{
		string html = @"
<table>
    <caption>Employee Information for https://metadataconsulting.blogspot.com/</caption>
    <thead>
        <tr>
            <th>Name</th>
            <th>Position</th>
            <th>Office</th>
            <th>Age</th>
            <th>Start Date</th>
            <th>Salary</th>
        </tr>
    </thead>
    <tbody>
        <tr class='buttler'>
            <td>John Doe</td>
            <td>Software Engineer</td>
            <td>New York</td>
            <td>30</td>
            <td>2015-01-15</td>
            <td>$120,000</td>
        </tr>
        <tr>
            <td>Jane Smith</td>
            <td>Project Manager</td>
            <td>London</td>
            <td>40</td>
            <td>2010-03-25</td>
            <td>$150,000</td>
        </tr>
        <tr>
            <td>Emily Johnson</td>
            <td>Designer</td>
            <td>San Francisco</td>
            <td>28</td>
            <td>2018-07-12</td>
            <td>$100,000</td>
        </tr>
    </tbody>
    <tfoot>
        <tr>
            <td colspan=6>End of Employee Information</td>
        </tr>
    </tfoot>
</table>"; 

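		// Match one <tr ...>...</tr> block; group 3 holds the row's inner HTML.
		// \b[^>]* keeps the opening tag from matching other tags or running past its own '>'.
		string patternRow = @"(<tr\b[^>]*>)((?!</tr>).)*?(.*?)(</tr>)";
		RegexOptions optionsRow = RegexOptions.Singleline;
		Regex regxHTMLRow = new Regex(patternRow, optionsRow);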
		
		int colcnt = 0;
		int cellcnt = 0; 
		double avgcolcnt = 0;
        int finalcolcnt = 0;  

		List<int> avgColCnt = new List<int>(); 
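        // Count <th>/<td> cells. [^>]* stops at the end of the opening tag, so several
        // cells on one line are counted separately; Singleline lets a cell span lines.
        Regex regxColCount = new Regex(@"<t[hd]\b[^>]*>(.*?)</t[hd]>", RegexOptions.Singleline);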
        
		MatchCollection mcoll = regxHTMLRow.Matches(html);
		int m = 0;
        int r = 0; 
        if (mcoll.Count > 0)
        {
            foreach (Match match in mcoll)
            {
                m++;
                
                Console.WriteLine("Match #{0}, Match Row ='{1}'", m, match.Groups[3].Value);
                
       			MatchCollection mcollColCnt = regxColCount.Matches(match.Groups[3].Value);
				
				if (mcoll.Count > 0)
				{
					foreach (Match mc in mcollColCnt)
					{
						if (mc.Success) 
						{
							r++; 
							Console.WriteLine("\r\n\t\t\tMatch #{0}, Match Value ='{1}' ", r, mc.Value);
							cellcnt++;
							
						}
					}
					avgColCnt.Add(cellcnt); 
					colcnt = Math.Max(colcnt, cellcnt);
					cellcnt = 0; 
					r = 0; 
					Console.WriteLine();
				}
				
            }
        }
		Console.WriteLine("Max number of columns {0}", colcnt);	
		
		Console.WriteLine("Number of rows {0}", avgColCnt.Count());
 		
        Console.WriteLine("List: " + string.Join(", ", avgColCnt));
		
		//List<int> avgListReduced = avgColCnt.Where(x => x != avgColCnt.Min() && x != avgColCnt.Max()).ToList(); //removes all values that match max
		
		avgColCnt.Sort(); 
		
		List<int> avgListReduced = new List<int>(avgColCnt); // copy, so trimming does not alter the per-row counts
	
		if (avgListReduced.Count > 2) {
			
			// Remove the first element
			avgListReduced.RemoveAt(0);

			// Remove the last element
			avgListReduced.RemoveAt(avgListReduced.Count - 1);
				
			Console.WriteLine("Number of avgListReduced {0}", avgListReduced.Count());
			// Print the updated list
			Console.WriteLine("Updated list: " + string.Join(", ", avgListReduced));
			
			avgcolcnt = avgListReduced.Average(); 

			try {		
			
				int intavgcolcnt = (int)Math.Round(avgcolcnt, MidpointRounding.AwayFromZero);

				Console.WriteLine("Average number of columns " + intavgcolcnt);
				
				// Trust the trimmed average only when at least 5 rows remain after trimming
				if (avgListReduced.Count >= 5 && intavgcolcnt != colcnt) 
					finalcolcnt =  intavgcolcnt;
				else
					finalcolcnt =  colcnt;	
			
			} 
			catch 
			{
				finalcolcnt =  colcnt;
			}
			

		}
		else {
				
				finalcolcnt =  colcnt;
		}
			
		Console.WriteLine("Final number of columns {0}",finalcolcnt );
				
		
	}
}
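
The final step of the listing above (drop the smallest and largest per-row count, average the rest, and fall back to the maximum unless enough rows remain) can be restated compactly. This is a Python sketch of that heuristic as I read it from the C# code; the function name final_column_count is made up for the illustration.

```python
def final_column_count(cells_per_row):
    """Pick a column count from per-row cell counts using a trimmed average."""
    if not cells_per_row:
        return 0
    max_cols = max(cells_per_row)
    if len(cells_per_row) <= 2:
        return max_cols
    # Drop one smallest and one largest value, as the C# code does after sorting.
    trimmed = sorted(cells_per_row)[1:-1]
    average = sum(trimmed) / len(trimmed)
    # Round half away from zero; counts are never negative here.
    rounded = int(average + 0.5)
    # The average only overrides the maximum when at least 5 rows remain.
    if len(trimmed) >= 5 and rounded != max_cols:
        return rounded
    return max_cols
```

For the sample table the per-row counts are 6, 6, 6, 6, 1, so only three values remain after trimming and the maximum (6) is returned.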

Saturday, October 26, 2024

OneDrive Direct Download Link Generator for new 2024 API cracked - Working






Microsoft OneDrive Direct File Download URL Maker for old links




<iframe src="https://onedrive.live.com/embed?cid=8F99649728BEB2F3&resid=8F99649728BEB2F3%211010&authkey=AFo8ZQ_-qj84DEQ" width="98" height="120" frameborder="0" scrolling="no"></iframe>  See the instructions below if you don't know how. NOTE: For images there is an extra step below.



https://onedrive.live.com/embed?cid=8F99649728BEB2F3&resid=8F99649728BEB2F3%211010&authkey=AFo8ZQ_-qj84DEQ and populate the text-box below. To proceed click "Get Download Link" button.


4. Now, click the Get Download Link button.


The text-box will be highlighted so you can copy the full link. To copy, right-click and choose "Copy" or simply press CTRL-C. Note: The entire URL is automatically selected for your convenience.





 


Microsoft OneDrive Direct File Download URL Maker for new 2024 API

A new OneDrive 2024 API is being rolled out across geographic regions around the world and this is how to deal with it.


1. Mouse over item in OneDrive, and click ... and choose Embed, to get embed code.



2. Copy the embed code and extract the link. You can use Step 3 above, or copy and paste into Notepad and extract manually. Keyboard shortcut to copy (CTRL-C). Keyboard shortcut to paste (CTRL-V).

This is what the link will look like
https://1drv.ms/u/s!AsRPggdGwZFcjVEiJ1LvYyO1qZm9?embed=1





3. Copy and paste this into a new tab in your preferred browser.

This area is known as the address bar.



4. This may open an intermediate page that looks like this image below.
If you see this page, you must click on the link below the zip folder. Specifically, the yellow highlighted under the link touchx64v.1...








5. Once you see the OneDrive preview page to download the file, copy the URL in the browser address bar. Keyboard shortcut to copy (CTRL-C). Make sure you selected the entire link. Keyboard shortcut to Select All (CTRL-A).

If you copy it into Notepad, this is what the link generated by the new OneDrive 2024 API looks like.
https://onedrive.live.com/?authkey=%21ACInUu9jI7Wpmb0&cid=5C91C14607824FC4&id=5C91C14607824FC4%211745&parId=5C91C14607824FC4%21609&o=OneUp



6. Scroll to the top of the page and paste (CTRL-V) the browser address bar link into the "New 3.1" step, under the title Microsoft OneDrive Direct File Download URL Maker for old links.

Then continue with step 4 above (click the Get Download Link button).

7. Done. Share your download link. 
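
If you would rather not extract the link from the embed code by hand in step 2, here is a minimal Python sketch that pulls the src URL out of the iframe code and lists the query parameters of a link. The function names extract_embed_src and link_parameters are made up for this illustration, and this only covers extracting and inspecting the link, not generating the download link.

```python
import re
from urllib.parse import parse_qs, urlsplit


def extract_embed_src(embed_code):
    """Return the src URL of the first <iframe> in embed_code, or None."""
    match = re.search(
        r"""<iframe\b[^>]*?\bsrc\s*=\s*["']([^"']+)["']""",
        embed_code,
        re.IGNORECASE,
    )
    if not match:
        return None
    # Embed code copied from a web page may escape '&' as '&amp;'.
    return match.group(1).replace("&amp;", "&")


def link_parameters(url):
    """Return the query parameters of url as a dict (percent-decoded)."""
    return {key: values[0] for key, values in parse_qs(urlsplit(url).query).items()}
```

Calling link_parameters on the new-style link shown in step 5 returns its authkey, cid, id, parId and o values, which is handy for checking that you copied the entire link.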


Another 1st by metadataconsulting.ca 😺 💫


Alternatives: this did not work

“You can also create a public download link for a file in OneDrive by:

  1. Selecting the file or image and clicking "Share" on the top
  2. In the link settings, setting the link as "Anyone with the link" and "Allow editing"
  3. Clicking "Apply"
  4. Clicking "Copy"
  5. Adding the text "&download=1" to the end of the URL “

For Images, you get a direct url
  1. Right-click on the image and select Embed. The URL you copy to embed the image is now a direct download URL, but it is super long.

    So you do not need to do the above! 


Monday, October 7, 2024

Setting up Notepad++ to use with Python language indentations code-style

From Python's official style guide (PEP 8 – Style Guide for Python Code) one should, 

        use 4 spaces per indentation level. 

To set up Python language indents in Notepad++,

  • Choose Settings->Preferences->Indent Settings for python
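
To check that a saved script really follows the 4-space rule, here is a small Python sketch. The function name indentation_problems is made up for this illustration, and it is deliberately simple: PEP 8 also allows continuation lines aligned with an opening bracket, which this check would flag.

```python
def indentation_problems(source, width=4):
    """Return (line_number, message) pairs for suspicious indentation."""
    problems = []
    for number, line in enumerate(source.splitlines(), start=1):
        stripped = line.lstrip(" \t")
        if not stripped:
            continue  # blank lines carry no indentation information
        indent = line[: len(line) - len(stripped)]
        if "\t" in indent:
            problems.append((number, "tab in indentation"))
        elif len(indent) % width:
            problems.append((number, "indent is not a multiple of %d spaces" % width))
    return problems
```

An empty result means every indented line uses spaces only, in multiples of four.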


To execute a python script, see 


Thursday, August 15, 2024

How does 'Convert to ANSI' and ANSI Encoding function work in Notepad++

What is the difference between function 'Convert to ANSI' and selecting ANSI character set for a file? 






















Let's start with an example: ᒋ, the Inuktitut symbol for 'gi'. The hexadecimal representation of ᒋ is e1 92 8b, as shown when opened in FRHED, a free hex editor.













The 'Convert to ANSI' function actually reduces the character set to fit within the ANSI range of 256 characters (0-255). At the upper end, ANSI encompasses the Latin-1 Supplement block to handle some established European languages.

So, characters above decimal 255 (0xFF in hexadecimal) are converted to a question mark (?).

When you 'Convert to ANSI' on gi, it becomes a question mark, 0x3f in hex. In Notepad++ you see a question mark as well.
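
The lossy conversion can be reproduced outside Notepad++. This is a minimal Python sketch, assuming the ANSI code page is Windows-1252 (the actual code page depends on the system locale); the function name convert_to_ansi is made up for the illustration and is not how Notepad++ implements the command.

```python
GI = "\u148b"  # the Inuktitut syllabic 'gi'

# UTF-8 stores the character as three bytes: e1 92 8b.
utf8_bytes = GI.encode("utf-8")


def convert_to_ansi(text, codepage="cp1252"):
    """Encode text to a single-byte code page, replacing unmappable characters with '?'."""
    return text.encode(codepage, errors="replace")


ansi_bytes = convert_to_ansi(GI)
print(utf8_bytes.hex(), ansi_bytes.hex())
```

The original three bytes are gone after the conversion; only the single byte 0x3f remains, so the change cannot be undone by switching the encoding back.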







Selecting Encoding -> ANSI does not change the byte values, just how they are displayed in Notepad++.



















Sunday, August 11, 2024

How to open a URL with emacs from the Windows command line

Here's how to start emacs with a URL from the Windows command line so you can surf a website using emacs' built-in, very fast and stable text browser, EWW. EWW stands for Emacs Web Wowser.

cd C:\Program Files\Emacs\emacs-28.2\bin
emacs --eval "(eww \"https://google.com\")"

emacs -f eww-browse https://google.com

Result





























This will open emacs, but the command window will still wait until the emacs client is closed.

To get around this and this long command line, use my Clipboard Plaintext PowerTool to copy URL and launch Emacs text browser in 1-click! :) 
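
Another way to avoid the waiting command window is to start emacs from a small script that does not wait for it to exit. This is a minimal Python sketch, assuming emacs is on your PATH (otherwise pass the full path to emacs.exe); the function names eww_command and launch_eww are made up for the illustration.

```python
import subprocess


def eww_command(url, emacs="emacs"):
    """Build the argument list for: emacs --eval "(eww \"URL\")"."""
    # Escape backslashes and double quotes so the URL stays one Lisp string.
    escaped = url.replace("\\", "\\\\").replace('"', '\\"')
    return [emacs, "--eval", '(eww "%s")' % escaped]


def launch_eww(url, emacs="emacs"):
    """Start emacs on url and return immediately without waiting for it to close."""
    return subprocess.Popen(eww_command(url, emacs))
```

Calling launch_eww("https://google.com") starts emacs and hands control straight back to the caller.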





Tuesday, August 6, 2024

What is the difference between Convert to ANSI function versus ANSI Encoding in Notepad++

What is the difference between function 'Convert to ANSI' and selecting ANSI character set for a file? 






















Let's start with an example: ᒋ, the Inuktitut symbol for 'gi'. The hexadecimal representation of ᒋ is e1 92 8b, as shown when opened in FRHED, a free hex editor.













Convert to ANSI Function


The 'Convert to ANSI' function actually reduces the character set to fit within the ANSI range of 256 characters (0-255). At the upper end, ANSI encompasses the Latin-1 Supplement block to handle some established European languages.

So, characters above decimal 255 (0xFF in hexadecimal) are converted to a question mark (?).

When you 'Convert to ANSI' on gi, it becomes a question mark, 0x3f in hex. In Notepad++ you see a question mark as well.





Selecting ANSI Encoding


Selecting Encoding -> ANSI does not change the byte values, just how they are displayed in Notepad++.
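
Here is a minimal Python sketch of what "same bytes, different display" means, assuming the ANSI code page is Windows-1252 (it depends on the system locale). The function name view_as_ansi is made up for the illustration.

```python
RAW = bytes.fromhex("e1928b")  # the three bytes stored in the file


def view_as_ansi(raw, codepage="cp1252"):
    """Interpret raw bytes as single-byte ANSI text without changing them."""
    return raw.decode(codepage, errors="replace")


as_utf8 = RAW.decode("utf-8")  # one character: the syllabic 'gi'
as_ansi = view_as_ansi(RAW)    # three characters, one per byte
print(len(as_utf8), len(as_ansi), RAW.hex())
```

The bytes e1 92 8b are identical in both views; only the interpretation differs, so switching the encoding back to UTF-8 restores the original character.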
















That's the difference! 


Tuesday, July 23, 2024

How to create an inline Base64 image tag using Notepad++

This is how to create an inline Base64 image tag using Notepad++, so you can paste the following code into an email or use it in your HTML SPA code.

  <img src="data:image/jpeg;base64,xxxxxxx" alt="inline of image" />

Here are the steps using Notepad++

1. Open your image in Notepad++, in the example image below, it's called clippy.jpg. 

2. Press CTRL-A to select all the text. It is very important that all of the text is selected.

3. Choose Base64 Encode from Plugins menu to encode selected text to Base64. 



















4. Add data:image/jpeg;base64, to the beginning of the text. The mime type must match the image extension. For example, for a .PNG image file the prefix is data:image/png;base64, (;base64 is appended to indicate the encoding), and for a .gif it is image/gif. See mime type list here, search for 'image/'.

5. Press CTRL-A, copy (CTRL-C), and paste into your image src="PASTE HERE" attribute. Make sure you keep the double quotes.










5. OR Save this file as "clippy.jpg.b64" for future reference. 
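
For comparison, the same result can be produced with a short script instead of the Notepad++ plugin. This is a minimal Python sketch using only the standard library; the function names image_data_uri and img_tag are made up for the illustration, and the mime type is guessed from the file extension.

```python
import base64
import mimetypes


def image_data_uri(path):
    """Return a data: URI holding the Base64-encoded contents of an image file."""
    mime, _ = mimetypes.guess_type(path)
    if mime is None or not mime.startswith("image/"):
        raise ValueError("not a recognized image extension: %s" % path)
    with open(path, "rb") as handle:
        encoded = base64.b64encode(handle.read()).decode("ascii")
    return "data:%s;base64,%s" % (mime, encoded)


def img_tag(path, alt="inline image"):
    """Wrap the data URI in an <img> tag ready to paste into HTML."""
    return '<img src="%s" alt="%s" />' % (image_data_uri(path), alt)
```

For example, img_tag("clippy.jpg") returns a tag whose src starts with data:image/jpeg;base64, followed by the encoded file.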

Wednesday, June 19, 2024

How to Disable PWAs in Microsoft Edge/Chrome browser

How to prevent Progressive Web Apps (PWAs) from installing on your desktop. 

A PWA is a web-based app created using HTML, CSS, and JavaScript that can be installed from a website like a regular desktop application. Once installed, the operating system will create a PWA shortcut and add it to Add or Remove Programs in Windows and under the /Users/<account>/Applications/ folder in macOS.

Recently PWAs have been used to steal credentials, so they can be unsafe.
New phishing toolkit uses PWAs to steal login credentials (bleepingcomputer.com)

In Edge, type about://flags in the address bar, search for 'handling', and change the settings to match the image below. All settings that required changes are now set to Disabled.


about://flags, search for 'PWA'
and change the settings to match the image below. All settings that required changes are now set to Disabled.



































about://settings, search for 'application'

Change to off position as in image below.

about://settings, search for 'protocol'

Change to off position as in image below.


about://settings, search for 'apps'

Turn on Block Potentially Unwanted Apps.

Microsoft Defender SmartScreen must be enabled to turn on Block Potentially Unwanted Apps.
But note that Microsoft Defender SmartScreen talks to Microsoft servers, which is a privacy concern.





Service workers are used by PWAs. To list and delete current service workers, type this in the address bar:
chrome://inspect/#service-workers


To disable service workers you need an add-on, unfortunately.

Sources: 

PWAs as URL Handlers  |  Capabilities  |  Chrome for Developers

Handle protocols in Progressive Web Apps - Microsoft Edge Developer documentation | Microsoft Learn

Test PWA - 

https://www.bmw.ca/en/topics/offers-and-services/bmw-digital-services-and-connectivity/my-bmw-app.html