So I understand that the subnet mask provides information about the length of the routing prefix (the network identifier, or NID). It can be applied to a given IP address to extract the most significant bits allocated for the routing prefix and "zero out" the host identifier.
But why do we need the bitwise AND for that, specifically? I understand the idea, but would it not be easier to parse only the first n bits of the IP address ~~string~~ sequence of bits and disregard the remainder (the host identifier)? The information needed for that is already available from the subnet mask WITHOUT the bitwise AND: with 255.255.255.0, or 1111 1111.1111 1111.1111 1111.0000 0000, you count the number of 1s, which in this case is 24 and corresponds to the /24 suffix in CIDR notation. At that point you already know that you only need to consider the first 24 bits of the IP address, which makes the subsequent bitwise AND redundant.
In the case of 192.168.2.150/24, for example, with subnet mask 255.255.255.0, you would get 192.168.2.0 (1100 0000.1010 1000.0000 0010.0000 0000) as the routing prefix, or network identifier, when it is represented as the first address of the network. However, the last eight bits are redundant, so the NID is effectively just 192.168.2.
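As a minimal sketch of the two approaches the question compares, the snippet below uses Python's standard `ipaddress` module on 192.168.2.150/24. It derives n by counting the 1s in the mask (which only gives the right answer when the mask is contiguous), then computes the NID once with a bitwise AND and once by shifting the host bits off. The variable names are illustrative only.

```python
import ipaddress

addr = int(ipaddress.IPv4Address("192.168.2.150"))
mask = int(ipaddress.IPv4Address("255.255.255.0"))

# The approach described in the question: count the 1s in the mask to get n.
prefix_len = bin(mask).count("1")

# Option 1: bitwise AND keeps the prefix bits and zeroes the host bits (32 bits wide).
nid_and = addr & mask
# Option 2: shift the host bits off entirely (only n bits remain).
nid_shift = addr >> (32 - prefix_len)

print(prefix_len)                                    # 24
print(ipaddress.IPv4Address(nid_and))                # 192.168.2.0
print(f"{nid_shift:0{prefix_len}b}")                 # 110000001010100000000010
# Both carry the same information: shifting back left restores the AND result.
print(nid_shift << (32 - prefix_len) == nid_and)     # True
```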
Now let's imagine an example where we create two subnets from the 192.168.2.0 network by taking one bit from the host identifier and appending it to the routing prefix. The corresponding subnet mask for these two subnets is 255.255.255.128, as we now have 25 bits making up the NID and 7 bits making up the HID. So host A, 192.168.2.5/25 (HID 5, final octet 0000 0101), now wants to send a request to host B, 192.168.2.133/25 (HID 5, final octet 1000 0101). To identify the network to route to, the router needs the destination's NID, and it gets that either by discarding the 7 least significant bits or by zeroing them out with a bitwise AND. My point is: for identifying the network that the destination host (here, host B) belongs to, the bitwise AND is redundant, is it not?
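The /25 example can be checked the same way. This sketch (Python `ipaddress`; the function names are hypothetical) computes the NIDs of hosts A and B both by masking and by shifting, showing that each method alone is enough to tell the two subnets apart; the difference is only whether the result is stored as 32 bits or 25.

```python
import ipaddress

PREFIX_LEN = 25
MASK = (0xFFFFFFFF << (32 - PREFIX_LEN)) & 0xFFFFFFFF   # 255.255.255.128

def nid_by_and(addr: str) -> int:
    # Zero out the 7 host bits; the result is still 32 bits wide.
    return int(ipaddress.IPv4Address(addr)) & MASK

def nid_by_shift(addr: str) -> int:
    # Discard the 7 host bits; the result is only 25 bits wide.
    return int(ipaddress.IPv4Address(addr)) >> (32 - PREFIX_LEN)

host_a, host_b = "192.168.2.5", "192.168.2.133"
print(ipaddress.IPv4Address(nid_by_and(host_a)))    # 192.168.2.0
print(ipaddress.IPv4Address(nid_by_and(host_b)))    # 192.168.2.128
# Either method tells A and B apart, i.e. they are on different /25 subnets.
print(nid_by_and(host_a) == nid_by_and(host_b))     # False
print(nid_by_shift(host_a) == nid_by_shift(host_b)) # False
```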
So why doesn't the router just store the NID with only the bits that are strictly required? Is it because the routing table entries are always of a fixed size of 32 bits for IPv4? Or is it because the bitwise AND operation is more efficiently computable?
I'll address your question in two parts: 1) whether it is redundant to store both the IP subnet and its subnet mask, and 2) why the router doesn't store only the bits necessary to make the routing decision.
Prior to the introduction of CIDR (which came with the "slash" notation, such as /8 for the 10.0.0.0 RFC 1918 private IPv4 range), subnet masks could genuinely be any bit arrangement imaginable. The most sensible choice was a contiguous, MSBit-justified mask such as 255.0.0.0, but the standard did not preclude something unconventional like 255.0.0.1.
For those wondering what a 255.0.0.1 subnet mask would do (and to be clear, a lot of software may be unable to handle it): it describes a subnet of 2^23 addresses in which the LSBit must also match the IP subnet. So if your IP subnet is 10.0.0.0, only even-numbered addresses are part of that subnet, and if the IP subnet is 10.0.0.1, it covers only odd-numbered addresses.
Yes, that means two machines with addresses 10.69.3.3 and 10.69.3.4 are not on the same subnet. This is not allowed under CIDR, which requires the mask's set bits to be contiguous.
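Here is a small sketch of why a non-contiguous mask needs the full bitwise AND: counting the nine 1s in 255.0.0.1 does not tell you *which* bits to keep, so "keep the first n bits" cannot work. The helper name `subnet_id` is hypothetical; only Python's standard `ipaddress` module is used.

```python
import ipaddress

MASK = "255.0.0.1"   # non-contiguous: the first octet plus the LSBit

def subnet_id(addr: str, mask: str = MASK) -> ipaddress.IPv4Address:
    # A bitwise AND works for any mask, contiguous or not.
    masked = int(ipaddress.IPv4Address(addr)) & int(ipaddress.IPv4Address(mask))
    return ipaddress.IPv4Address(masked)

for host in ("10.69.3.3", "10.69.3.4"):
    print(host, "->", subnet_id(host))
# 10.69.3.3 -> 10.0.0.1
# 10.69.3.4 -> 10.0.0.0

# 9 mask bits set leaves 23 free bits, i.e. 2^23 addresses per subnet.
print(2 ** (32 - bin(int(ipaddress.IPv4Address(MASK))).count("1")))  # 8388608
```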
So, to answer the first question: CIDR imposed a stricter (and sensible) limit on valid IP subnet/mask combinations. If CIDR cannot be assumed, both the IP subnet and the full subnet mask must be stored, since the mask bits might not be contiguous.
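When CIDR can be assumed, the mask and the prefix length are interchangeable, so storing one is enough. The sketch below (hypothetical helper names, standard library only) converts a prefix length to a mask and back, returning `None` for a non-contiguous mask such as 255.0.0.1 that has no CIDR prefix length.

```python
import ipaddress
from typing import Optional

def mask_from_prefix(prefix_len: int) -> int:
    # With CIDR, the prefix length alone determines the whole mask.
    return (0xFFFFFFFF << (32 - prefix_len)) & 0xFFFFFFFF

def prefix_from_mask(mask: int) -> Optional[int]:
    # Return the prefix length if the mask is contiguous (CIDR-valid), else None.
    host_bits = ~mask & 0xFFFFFFFF          # e.g. 0x000000FF for 255.255.255.0
    if host_bits & (host_bits + 1):         # non-zero unless host_bits is 2^k - 1
        return None
    return bin(mask).count("1")

for m in ("255.255.255.0", "255.255.255.128", "255.0.0.1"):
    print(m, prefix_from_mask(int(ipaddress.IPv4Address(m))))
# 255.255.255.0 24
# 255.255.255.128 25
# 255.0.0.1 None
print(ipaddress.IPv4Address(mask_from_prefix(24)))  # 255.255.255.0
```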
On essentially all hardware from the last 15-20 years, CIDR subnets are assumed, so in practice this is a non-issue.
For the second question: the router does in fact store only the bits necessary to match a routing table entry, at least on hardware appliances. Such routers use what's known as TCAM (ternary content-addressable memory) for their routing tables, where the bitwise AND can be performed, but with a twist.
Suppose we're storing a route for 10.0.42.0/24. The prefix length indicates that the first 24 bits must match a prospective destination IP address, while the remaining 8 bits don't matter. TCAMs can store 1s and 0s, but also Xs (aka "don't cares"), meaning those bits don't have to match. So in this case, the TCAM entry mirrors the route's first 24 bits and fills the rest with Xs, which matches exactly the addresses the route is meant to cover.
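A software sketch of the ternary idea, not how any particular router implements it: each route becomes a 32-symbol pattern of '0', '1' and 'X', and an address matches when every non-X symbol equals the corresponding address bit. The helper names are hypothetical, and the longest-prefix-first ordering is an assumption that mimics a TCAM reporting its lowest-index hit; real hardware compares all entries in parallel rather than looping.

```python
import ipaddress
from typing import Optional

def tcam_pattern(route: str) -> str:
    # Hypothetical helper: encode a CIDR route as 32 ternary symbols ('0', '1', 'X').
    net = ipaddress.IPv4Network(route)
    bits = f"{int(net.network_address):032b}"
    return bits[: net.prefixlen] + "X" * (32 - net.prefixlen)

def ternary_match(pattern: str, addr: str) -> bool:
    # A symbol matches if it is a don't-care ('X') or equals the address bit.
    bits = f"{int(ipaddress.IPv4Address(addr)):032b}"
    return all(p == "X" or p == b for p, b in zip(pattern, bits))

# Assumption: entries are ordered longest prefix first, so the first hit is the
# most specific route.
ROUTES = ["10.0.0.0/8", "10.0.42.0/24"]
TABLE = [
    (tcam_pattern(r), r)
    for r in sorted(ROUTES, key=lambda r: ipaddress.IPv4Network(r).prefixlen, reverse=True)
]

def lookup(addr: str) -> Optional[str]:
    # Scan entries in priority order and return the first matching route.
    for pattern, route in TABLE:
        if ternary_match(pattern, addr):
            return route
    return None

print(tcam_pattern("10.0.42.0/24"))  # 000010100000000000101010XXXXXXXX
print(lookup("10.0.42.7"))           # 10.0.42.0/24
print(lookup("10.1.2.3"))            # 10.0.0.0/8
print(lookup("192.168.2.150"))       # None
```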
As a practical matter, then, the TCAM must still be as wide as the longest possible route: 32 bits for IPv4 and 128 bits for IPv6. I suppose some savings could be made if a CIDR-only TCAM could avoid storing the X bits, but this makes little difference in practice, and it's generally easier to design the TCAM for the maximum width anyway, even though non-CIDR masks aren't supported on most routing hardware anymore.