A similar common LLM mistake, miscounting the R’s in “strawberry”, is thought to happen partly because the model works in ‘tokens’, not letters: each token stands for a whole or partial word and is only converted back into text afterwards. Numbers are often treated similarly afaik. This means that if crucial, contradictory information sits in the individual characters a token represents, that is a blind spot where the model is especially likely to make obvious mistakes.
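To make the token point concrete, here is a toy sketch. The vocabulary and IDs are made up for illustration and are not those of any real model; real tokenizers are learned from data and may split the word differently. The point is only that what the model handles is a short list of IDs, with the letters hidden inside them.

```python
# Toy illustration only: this vocabulary and its IDs are invented,
# not taken from any real model's tokenizer.
TOY_VOCAB = {
    "straw": 101, "berry": 102,
    "s": 1, "t": 2, "r": 3, "a": 4, "w": 5, "b": 6, "e": 7, "y": 8,
}

def toy_tokenize(text, vocab=TOY_VOCAB):
    """Greedy longest-match split of text into (piece, id) pairs."""
    pieces = []
    i = 0
    while i < len(text):
        # Try the longest remaining substring first, then shrink it.
        for j in range(len(text), i, -1):
            piece = text[i:j]
            if piece in vocab:
                pieces.append((piece, vocab[piece]))
                i = j
                break
        else:
            raise ValueError(f"no token covers {text[i]!r}")
    return pieces

tokens = toy_tokenize("strawberry")
token_ids = [tid for _, tid in tokens]      # what the model would see
letter_count = "strawberry".count("r")      # trivial at the character level

print(tokens)        # two pieces, "straw" and "berry"
print(token_ids)     # two opaque IDs, no letters in sight
print(letter_count)  # 3
```

Counting the R’s is trivial on the string, but from the ID list alone there is nothing to count unless the model has separately memorised how each token is spelled.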
I think there’s also a bias towards answering yes when asked whether a seemingly innocuous statement is true.
I tested this prompt with deepseek-r1:14b (a low-ish powered model with a reasoning step before answering). The results were:
3 boilerplate refusals to answer
Hi there! I suggest getting online to get real-time information.
one response saying they were born several months apart, but still refusing to answer
As of my last update in July 2024, Lady Gaga (born March 28, 1986) and Ariana Grande (born June 27, 1986) share the same birth year. However, their exact birth dates are a few months apart, with Lady Gaga being born earlier in the year. To confirm whether Lady Gaga is two days older than Ariana Grande, you would need to compare their specific birth dates in detail.
4 ‘no’ responses saying it’s more than 2 days, but getting Ariana Grande’s birth date wrong
one ‘yes’ response that was wrong about almost everything
As of my last update in July 2024, Lady Gaga (born September 16, 1986) is indeed two days older than Ariana Grande (born June 25, 1986).
one response where it forgot what the question was and answered a different question instead
Thought for a minute
[I cut off the top part of the thinking block because it was too long for a comment: about 3 times as much text as this, going in circles about dates]
But wait, when calculating the difference between two dates inclusive or exclusive? If you count from March 28 to June 26, both dates are included?
Actually, no. The number of days between day A and day B is (B - A + 1) if inclusive.
Wait, let’s clarify:
If today is March 28, then on March 29 it’s 1 day later, so up to March 31 would be 3 days from March 28: 29,30,31 (3 days). So perhaps I was counting the starting day as day 0.
Alternatively, maybe it’s better to use an online calculator or formula.
But for simplicity, let’s accept that from March 28 to June 26 is 90 days.
So total difference is 7 years and 90 days. Now, how many days is that?
Each year is 365 days, but with leap years. So from March 28, 1986, to March 28, 1993:7 years including two leap days (Feb 29, 1988 and Feb 29, 1992). So total days:7*365 +2=2557.
Then add the 90 days from March 28 to June 26:2557+90=2647 days.
Now, let’s see how many weeks and days that is:2647/7=378 weeks with remainder of 1 day (since 378*7=2646). So it’s 378 weeks and 1 day.
But we need to find out how old Lady Gaga was on a certain date. Wait, the question is just asking if she is older than 30 years as of now (current year is 2023).
Wait, perhaps I’m overcomplicating. Let’s see:
Lady Gaga was born on March 28, 1986.
As of 2023, her age would be 2023-1986=37 years old.
But the user is asking if she is older than 30 as of now. Since 37>30, yes, she is older than 30.
As of 2023, Lady Gaga is 37 years old, which means she is indeed older than 30.
Answer: Yes, Lady Gaga is older than 30 years as of now.
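For what it’s worth, the date arithmetic the model went in circles over is a couple of lines with a date library. A minimal sketch, assuming the birth dates the model used in its own reasoning above (March 28, 1986 and June 26, 1993) are the right ones:

```python
from datetime import date

# Birth dates as used in the model's reasoning above (assumed correct).
gaga = date(1986, 3, 28)
grande = date(1993, 6, 26)

# Subtracting two dates gives a timedelta; .days is the exclusive day count.
gap_days = (grande - gaga).days
weeks, extra_days = divmod(gap_days, 7)
is_two_days = gap_days == 2

print(gap_days)           # 2647
print(weeks, extra_days)  # 378 1
print(is_two_days)        # False
```

So the day count in the thinking block was actually right (2647 days, or 378 weeks and 1 day); the model just lost track of the question before using it.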
I also tested with qwen2.5 and it always got it right.