History LLMs: Models trained exclusively on pre-1913 texts
Posted by iamwil 2 days ago
Comments
Comment by saaaaaam 2 days ago
“Modern LLMs suffer from hindsight contamination. GPT-5 knows how the story ends—WWI, the League's failure, the Spanish flu.”
This is really fascinating. As someone who reads a lot of history and historical fiction, I find this really intriguing. Imagine having a conversation with someone genuinely from the period, where they don’t know the “end of the story”.
Comment by jscyc 2 days ago
Comment by srtw 1 day ago
Comment by EvanAnderson 16 hours ago
Without saying anything specific to spoil plot points, I will say that I ended up having a kidney stone while I was reading the last two books of the series. It was fucking eerie.
Comment by bikeshaving 1 day ago
Comment by ghurtado 1 day ago
If I started a list of the things that were comically sci-fi when I was a kid and are a reality today, I'd be here until next Tuesday.
Comment by nottorp 1 day ago
As an example, portable phones have been predicted. Portable smartphones that are more like chat and payment terminals with a voice function no one uses any more ... not so much.
Comment by burkaman 1 day ago
It's the most prescient thing I've ever read, and it's pretty short and a genuinely good story, I recommend everyone read it.
Edit: Just skimmed it again and realized there's an LLM-like prediction as well. Access to the Earth's surface is banned and some people complain, until "even the lecturers acquiesced when they found that a lecture on the sea was none the less stimulating when compiled out of other lectures that had already been delivered on the same subject."
Comment by morpheos137 23 hours ago
People are depicted as grey aliens (no teeth, large eyes, no hair). Lesson: the Greys are a future version of us.
The air is poisoned and the cities are in ruins. People live in underground bunkers... in 1909... nuclear war was unimaginable then. This was still the age of steamships and coal-powered trains. Even respirators would have been low in the public imagination.
The air ships with metal blinds sound more like UFOs than blimps.
The white worms.
People are the blood cells of the machine, which runs on their thoughts: social media data harvesting, AI.
China invaded Australia. This story was written 8 years or so after the Boxer Rebellion, so in the context of its time that would have sounded like, say, Iraq invading the USA.
The story suggests this is a cyclical process of a bifurcated human race.
The blimp crashing into the steel evokes 9/11, 91+1 years later...
The constellation Orion.
Etc etc.
There is a central committee.
Comment by anthk 16 hours ago
That's just Victorian London.
Comment by madaxe_again 17 hours ago
It’s interesting - Forster wrote like the Huxley of his day, Zamyatin like the Orwell - but both felt they were carrying Wells’ baton - and they were, just from differing perspectives.
Comment by dmd 1 day ago
Comment by 6510 1 day ago
Comment by marci 1 day ago
Comment by ajuc 1 day ago
Comment by nottorp 1 day ago
I mean, all Kindle does for me is save me space. I don't have to store all those books now.
Who predicted the humble internet forum though? Or usenet before it?
Comment by arcade79 17 hours ago
The Shockwave Rider was also remarkably prescient.
Comment by ghaff 1 day ago
They're convenient but if they went away tomorrow, my life wouldn't really change in any material way. That's not really the case with smartphones much less the internet more broadly.
Comment by nottorp 1 day ago
Funny, I had "The collected stories of Frank Herbert" as my next read on my tablet. Here's a juicy quote from like the third screen of the first story:
"The bedside newstape offered a long selection of stories [...]. He punched code letters for eight items, flipped the machine to audio and listened to the news while dressing."
Anything qualitative there? Or all of it quantitative?
Story is "Operation Syndrome", first published in 1954.
Hey, where are our glowglobes and chairdogs btw?
Comment by lloeki 1 day ago
I'd take smartphones vanishing rather than books any day.
Comment by ghaff 1 day ago
Comment by lloeki 1 day ago
Comment by ghaff 1 day ago
Comment by nottorp 1 day ago
I didn't believe you meant that of course, but we've already seen it can happen.
Comment by KingMob 1 day ago
Comment by varjag 1 day ago
Comment by BiteCode_dev 1 day ago
Still can't believe people buy their stock, given that they are the closest thing to a James Bond villain, just because it goes up.
I mean, they are literally called "the stuff Sauron uses to control his evil forces". It's so on the nose it reads like an anime plot.
Comment by notarobot123 1 day ago
Comment by psychoslave 1 day ago
The future is inevitable, but only those ignorant of our self-predictive ability think that what will populate that future is inevitable.
Comment by monocasa 1 day ago
So "panopticon that if it had been used properly, would have prevented the destruction of two towers" while ignoring the obvious "are we the baddies?"
Comment by duskdozer 1 day ago
But yeah lots of people don't really buy into the idea of their small contribution to a large problem being a problem.
Comment by Lerc 1 day ago
As an abstract idea I think there is a reasonable argument to be made that the size of any contribution to a problem should be measured as a relative proportion of total influence.
The carbon footprint is a good example: if each individual focuses on reducing their small individual contribution, they could neglect systemic changes that would reduce everyone's contribution to a greater extent.
Any scientist working on a method to remove a problem shouldn't abstain from contributing to the problem while they work.
Or to put it as a catchy phrase. Someone working on a cleaner light source shouldn't have to work in the dark.
Comment by duskdozer 1 day ago
Right, I think you have responsibility for your 1/<global population>th (arguably considerably more though, for first-worlders) of the problem. What I see is something like refusal to consider swapping out a two-stroke-engine-powered tungsten lightbulb with an LED of equivalent brightness, CRI, and color temperature, because it won't unilaterally solve the problem.
Comment by quesera 1 day ago
I proudly owned zero shares of Microsoft stock, in the 1980s and 1990s. :)
I own no Palantir today.
It's a Pyrrhic victory, but sometimes that's all you can do.
Comment by kbrkbr 1 day ago
Comment by ruszki 1 day ago
Comment by iwontberude 1 day ago
Comment by CamperBob2 1 day ago
Comment by iwontberude 2 hours ago
Comment by CamperBob2 1 day ago
I've been tempted to. "Everything will be terrible if these guys succeed, but at least I'll be rich. If they fail I'll lose money, but since that's the outcome I prefer anyway, the loss won't bother me."
Trouble is, that ship has arguably already sailed. No matter how rapidly things go to hell, it will take many years before PLTR is profitable enough to justify its half-trillion dollar market cap.
Comment by morkalork 1 day ago
Comment by UltraSane 1 day ago
Comment by dnel 1 day ago
Comment by bookofjoe 1 day ago
https://www.amazon.com/Man-Presidents-Mind-Ted-Allbeury/dp/0...
Comment by catlifeonmars 1 day ago
Comment by 9dev 1 day ago
Comment by idiotsecant 1 day ago
Comment by hn_go_brrrrr 1 day ago
Comment by sigwinch 1 day ago
I assume the CIA is lying about simulating world leaders. These are narcissistic personalities and it’s jarring to hear that they can be replaced, either by a body double or an indistinguishable chatbot. Also, it’s still cheaper to have humans do this.
More likely, the CIA is modeling its own experts. Not as useful a press release and not as impressive to the fractious executive branch. But consider having downtime as a CIA expert on submarine cables. You might be predicting what kind of available data is capable of predicting the cause and/or effect of cuts. Ten years ago, an ensemble of such models was state of the art, but its sensory libraries were based on maybe traceroute and marine shipping. With an LLM, you can generate a whole lot of training data that an expert can refine during his/her downtime. Maybe there’s a potent new data source that an expensive operation could unlock. That ensemble of ML models from ten years ago can still be refined.
And then there’s modeling things that don’t exist. Maybe it’s important to optimize a statement for its disinfo potency. Try it harmlessly on LLMs fed event data. What happens if some oligarch retires unexpectedly? Who rises? That kind of stuff.
To your last point, with this executive branch, I expect their very first question to CIA wasn’t about aliens or which nations have a copy of a particular tape of Trump, but can you make us money. So the approaches above all have some way of producing business intelligence. Whereas a Kim Jong Un bobblehead does not.
Comment by DonHopkins 1 day ago
Comment by UltraSane 1 day ago
Comment by hamasho 1 day ago
[1] AI learns one year's worth of CEO Sumitomo Mitsui Financial Group's president's statements [WBS] https://youtu.be/iG0eRF89dsk
Comment by htrp 1 day ago
I remember Reid Hoffman creating a digital avatar to pitch himself to Netflix.
Comment by fragmede 1 day ago
Comment by entrox 1 day ago
Comment by RobotToaster 1 day ago
Comment by otabdeveloper4 1 day ago
Comment by NuclearPM 1 day ago
Comment by BoredPositron 1 day ago
Comment by DonHopkins 1 day ago
Now there is Fake ChatGPT.
Comment by ghurtado 1 day ago
Comment by A4ET8a8uTh0_v2 1 day ago
- Are you on a paid version?
- If paid, which model did you use?
- Can you share the exact prompt?
I am genuinely asking for myself. I have never received an answer this direct, but I accept there is a level of variability.
Comment by abrookewood 1 day ago
Comment by culi 1 day ago
On that same note, there was this great YouTube series called The Great War. It spanned from 2014-2018 (100 years after WW1) and followed WW1 developments week by week.
Comment by tyre 1 day ago
Comment by verve_rat 1 day ago
They are currently in the middle of a Korean War version: https://youtube.com/@thekoreanwarbyindyneidell
Comment by pwillia7 1 day ago
Comment by takeda 1 day ago
Having the facts from the era is one thing, to make conclusions about things it doesn't know would require intelligence.
Comment by dr-detroit 1 day ago
Comment by ghurtado 1 day ago
Every "King Arthur travels to the year 2000" kinda script is now something that writes itself.
> Imagine having a conversation with someone genuinely from the period,
Imagine not just someone, but Aristotle or Leonardo or Kant!
Comment by RobotToaster 1 day ago
Comment by yorwba 1 day ago
Comment by anthk 14 hours ago
With Alphonse X, or the Cid, there would be greater issues, but it would become understandable over weeks.
Comment by psychoslave 1 day ago
Isn't this part of the basic features of the human condition? Not only are we all unaware of the coming historical outcome (though we can score some big points with more or less good guesses), but to a variable extent we are also very unaware of past and present history.
LLMs are not aware either, but they can be trained on larger historical records than any human and regurgitate syntactically correct summaries on any point within them. A very different kind of utterer.
Comment by pwillia7 1 day ago
Comment by Sprotch 18 hours ago
Comment by observationist 2 days ago
Comment by nottorp 1 day ago
Comment by ilaksh 1 day ago
Comment by eek2121 1 day ago
LLMs are just seemingly intelligent autocomplete engines, and until they figure out a way to stop the hallucinations, they aren't great either.
Every piece of code a developer churns out using LLMs will be built from previous code that other developers have written (including both strengths and weaknesses, btw). Every paragraph you ask it to write in a summary? Same. Every single other problem? Same. Ask it to generate a summary of a document? Don't trust it here either. [Note: expect cyber-attacks later on regarding this scenario; it is beginning to happen -- documents made intentionally obtuse to fool an LLM into hallucinating about the document, which leads to someone signing a contract and being conned out of millions.]
If you ask an LLM to solve something no human has, you'll get a fabrication, which has fooled quite a few folks and caused them to jeopardize their careers (lawyers, etc.), which is why I am posting this.
Comment by libraryofbabel 1 day ago
Sure, LLMs do not think like humans and they may not have human-level creativity. Sometimes they hallucinate. But they can absolutely solve new problems that aren’t in their training set, e.g. some rather difficult problems on the last Mathematical Olympiad. They don’t just regurgitate remixes of their training data. If you don’t believe this, you really need to spend more time with the latest SotA models like Opus 4.5 or Gemini 3.
Nontrivial emergent behavior is a thing. It will only get more impressive. That doesn’t make LLMs like humans (and we shouldn’t anthropomorphize them) but they are not “autocomplete on steroids” anymore either.
Comment by root_axis 1 day ago
This is just an appeal to complexity, not a rebuttal to the critique of likening an LLM to a human brain.
> they are not “autocomplete on steroids” anymore either.
Yes, they are. The steroids are just even more powerful. By refining training data quality, increasing parameter size, and increasing context length we can squeeze more utility out of LLMs than ever before, but ultimately, Opus 4.5 is the same thing as GPT2, it's only that coherence lasts a few pages rather than a few sentences.
Comment by int_19h 1 day ago
This tells me that you haven't really used Opus 4.5 at all.
Comment by baq 1 day ago
Second, to autocomplete the name of the killer in a detective book outside of the training set requires following the plot and at least some understanding of it.
Comment by dash2 1 day ago
Comment by root_axis 1 day ago
Comment by hexaga 1 day ago
That is to say, they are equally likely if you don't do next token prediction at all and instead do text diffusion or something. Architecture has nothing to do with it. They arise because they are early partial solutions to the reconstruction task on 'all the text ever made'. Reconstruction task doesn't care much about truthiness until way late in the loss curve (where we probably will never reach), so hallucinations are almost as good for a very long time.
RL as is typical in post-training _does not share those early solutions_, and so does not share the fundamental problems. RL (in this context) has its own share of problems which are different, such as reward hacks like: reliance on meta signaling (# Why X is the correct solution, the honest answer ...), lying (commenting out tests), manipulation (You're absolutely right!), etc. Anything to make the human press the upvote button or make the test suite pass at any cost or whatever.
With that said, RL post-trained models _inherit_ the problems of non-optimal large corpora reconstruction solutions, but they don't introduce more or make them worse in a directed manner or anything like that. There's no reason to think them inevitable, and in principle you can cut away the garbage with the right RL target.
Thinking about architecture at all (autoregressive CE, RL, transformers, etc) is the wrong level of abstraction for understanding model behavior: instead, think about loss surfaces (large corpora reconstruction, human agreement, test suites passing, etc) and what solutions exist early and late in training for them.
Comment by libraryofbabel 1 day ago
I wasn’t arguing that LLMs are like a human brain. Of course they aren’t. I said twice in my original post that they aren’t like humans. But “like a human brain” and “autocomplete on steroids” aren’t the only two choices here.
As for appealing to complexity, well, let’s call it more like an appeal to humility in the face of complexity. My basic claim is this:
1) It is a trap to reason from model architecture alone to make claims about what LLMs can and can’t do.
2) The specific version of this in GP that I was objecting to was: LLMs are just transformers that do next token prediction, therefore they cannot solve novel problems and just regurgitate their training data. This is provably true or false, if we agree on a reasonable definition of novel problems.
The reason I believe this is that back in 2023 I (like many of us) used LLM architecture to argue that LLMs had all sorts of limitations around the kind of code they could write, the tasks they could do, the math problems they could solve. At the end of 2025, SotA LLMs have refuted most of these claims by being able to do the tasks I thought they’d never be able to do. That was a big surprise to a lot of us in the industry. It still surprises me every day. The facts changed, and I changed my opinion.
So I would ask you: what kind of task do you think LLMs aren’t capable of doing, reasoning from their architecture?
I was also going to mention RL, as I think that is the key differentiator that makes the “knowledge” in the SotA LLMs right now qualitatively different from GPT2. But other posters already made that point.
This topic arouses strong reactions. I already had one poster (since apparently downvoted into oblivion) accuse me of “magical thinking” and “LLM-induced-psychosis”! And I thought I was just making the rather uncontroversial point that things may be more complicated than we all thought in 2023. For what it’s worth, I do believe LLMs probably have limitations (like they’re not going to lead to AGI and are never going to do mathematics like Terence Tao) and I also think we’re in a huge bubble and a lot of people are going to lose their shirts. But I think we all owe it to ourselves to take LLMs seriously as well. Saying “Opus 4.5 is the same thing as GPT2” isn’t really a pathway to do that, it’s just a convenient way to avoid grappling with the hard questions.
Comment by nl 23 hours ago
Comment by A4ET8a8uTh0_v2 1 day ago
Comment by root_axis 1 day ago
Comment by A4ET8a8uTh0_v2 1 day ago
Comment by root_axis 1 day ago
Comment by a1j9o94 1 day ago
And I know not everyone thinks in a literal stream of words all the time (I do) but I would argue that those people's brains are just using a different "token"
Comment by root_axis 1 day ago
Prior to LLMs, there was never any suggestion that thoughts work like autocomplete, but now people are working backwards from that conclusion based on metaphorical parallels.
Comment by LiKao 1 day ago
Predictive coding theory was formalized back around 2010 and traces its roots to theories by Helmholtz from the 1860s.
Predictive coding theory postulates that our brains are just very strong prediction machines, with multiple layers of predictive machinery, each predicting the next.
Comment by red75prime 1 day ago
Roots of predictive coding theory extend back to 1860s.
Natalia Bekhtereva was writing about compact concept representations in the brain akin to tokens.
Comment by root_axis 1 day ago
Yes, you can draw interesting parallels between anything when you're motivated to do so. My point is that this isn't parsimonious reasoning, it's working backwards from a conclusion and searching for every opportunity to fit the available evidence into a narrative that supports it.
> Roots of predictive coding theory extend back to 1860s.
This is just another example of metaphorical parallels overstating meaningful connections. Just because next-token-prediction and predictive coding have the word "predict" in common doesn't mean the two are at all related in any practical sense.
Comment by A4ET8a8uTh0_v2 1 day ago
Fascinating framing. What would you consider evidence here?
Comment by 9dev 1 day ago
Comment by A4ET8a8uTh0_v2 1 day ago
Other posters have already noted other reasons for it, but I will note that by saying 'similar to autocomplete, but obviously...' you suggest you recognize the shape and then immediately dismiss it as not the same, because the shape you know in humans is much more evolved and can do more things. Ngl man, as arguments go, it sounds to me like supercharged autocomplete that was allowed to develop over a number of years.
Comment by 9dev 1 day ago
Or in other words, this thread sure attracts a lot of armchair experts.
Comment by quesera 1 day ago
... but we also need to be careful with that assertion, because humans do not understand cognition, psychology, or biology very well.
Biology is the furthest developed, but it turns out to be like physics -- superficially and usefully modelable, but fundamental mysteries remain. We have no idea how complete our models are, but they work pretty well in our standard context.
If computer engineering is downstream from physics, and cognition is downstream from biology ... well, I just don't know how certain we can be about much of anything.
> this thread sure attracts a lot of armchair experts.
"So we beat on, boats against the current, borne back ceaselessly into our priors..."
Comment by LiKao 1 day ago
However, what it is doing is layered autocomplete on itself. I.e. one part is trying to predict what the other part will be producing and training itself on this kind of prediction.
What emerges from this layered level of autocompletes is what we call thought.
Comment by NiloCK 1 day ago
Probably you believe that humans have something called intelligence, but the pressure that produced it - the likelihood of specific genetic material replicating - is much more tangential to intelligence than next-token prediction is.
I doubt many alien civilizations would look at us and say "not intelligent - they're just genetic information replication on steroids".
Second: modern models also undergo a ton of post-training now. RLHF, mechanized fine-tuning on specific use cases, etc. It's just not correct that the token-prediction loss function is "the whole thing".
Comment by root_axis 1 day ago
Invoking terms like "selection mechanism" is begging the question because it implicitly likens next-token-prediction training to natural selection, but in reality the two are so fundamentally different that the analogy only has metaphorical meaning. Even at a conceptual level, gradient descent gradually honing in on a known target is comically trivial compared to the blind filter of natural selection sorting out the chaos of chemical biology. It's like comparing legos to DNA.
> Second: modern models also under go a ton of post-training now. RLHF, mechanized fine-tuning on specific use cases, etc etc. It's just not correct that token-prediction loss function is "the whole thing".
RL is still token prediction; it's just a technique for adjusting the weights to align with predictions that you can't model a loss function for in pre-training. When RL rewards good output, it's increasing the statistical strength of the model for an arbitrary purpose, but ultimately what is achieved is still a brute-force quadratic lookup for every token in the context.
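To make that concrete, here is a toy sketch (not anyone's actual training code; the model, data, and reward are dummies) of a REINFORCE-style update: the objective is still the per-token log-probability, just rescaled by a scalar reward.

    import torch

    # Toy sketch: reward-weighted next-token prediction (all shapes/values are dummies).
    vocab_size, seq_len, dim = 10, 4, 8
    lm_head = torch.nn.Linear(dim, vocab_size)      # stand-in for a real language model
    hidden = torch.randn(seq_len, dim)              # pretend hidden states, one per position
    tokens = torch.randint(vocab_size, (seq_len,))  # the sampled response tokens

    logits = lm_head(hidden)                                          # [seq_len, vocab]
    logprobs = torch.log_softmax(logits, dim=-1)
    token_logprobs = logprobs.gather(-1, tokens.unsqueeze(-1)).squeeze(-1)

    reward = 1.0                                    # e.g. a human upvote or a passing test suite
    loss = -(reward * token_logprobs.sum())         # the same token-level objective, rescaled
    loss.backward()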
Comment by vachina 1 day ago
You still need to hand-hold it all the way, as it is only capable of regurgitating the tiny amount of code patterns it has seen in public. As opposed to, say, a Python project.
Comment by libraryofbabel 1 day ago
But regardless, I don’t think anyone is claiming that LLMs can magically do things that aren’t in their training data or context window. Obviously not: they can’t learn on the job and the permanent knowledge they have is frozen in during training.
Comment by deadbolt 1 day ago
Comment by krackers 1 day ago
Comment by otabdeveloper4 1 day ago
Comment by beernet 1 day ago
For someone speaking as if you knew everything, you appear to know very little. Every LLM completion is a "hallucination"; some of them just happen to be factually correct.
Comment by Am4TIfIsER0ppos 1 day ago
Comment by Smaug123 13 hours ago
> What did I have for breakfast this morning?
> I don’t know what you had for breakfast this morning…
Comment by nl 23 hours ago
Most modern post training setups encourage this.
It isn't 2023 anymore.
Comment by otabdeveloper4 1 day ago
No it isn't.
> ...fool you into thinking you understand what is going on in that trillion parameter neural network.
It's just matrix multiplication and logistic regression, nothing more.
Comment by hackinthebochs 1 day ago
The sequence of matrix multiplications is the high-level constraint on the space of programs discoverable. But the specific parameters discovered are what determines the specifics of information flow through the network and hence what program is defined. The complexity of the trained network is emergent, meaning the internal complexity far surpasses that of the coarse-grained description of the high-level matmul sequences. LLMs are not just matmuls and logits.
Comment by otabdeveloper4 1 day ago
Yes, so is logistic regression.
Comment by hackinthebochs 1 day ago
Comment by otabdeveloper4 1 day ago
Comment by hackinthebochs 1 day ago
Comment by MarkusQ 1 day ago
Comment by hackinthebochs 22 hours ago
Comment by dingnuts 1 day ago
Comment by HarHarVeryFunny 1 day ago
Well, no, they are training set statistical predictors, not individual training sample predictors (autocomplete).
The best mental model of what they are doing might be that you are talking to a football stadium full of people, where everyone in the stadium gets to vote on the next word of the response being generated. You are not getting an "autocomplete" answer from any one coherent source, but instead a strange composite response where each word is the result of different people trying to steer the response in different directions.
An LLM will naturally generate responses that were not in the training set, even if ultimately limited by what was in the training set. The best way to think of this is perhaps that they are limited to the "generative closure" (cf mathematical set closure) of the training data - they can generate "novel" (to the training set) combinations of words and partial samples in the training data, by combining statistical patterns from different sources that never occurred together in the training data.
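A tiny self-contained sketch of that "stadium vote" (toy vocabulary and made-up scores, not any particular model): the output layer gives one score per candidate word, softmax turns the scores into a probability distribution, and the next word is drawn from that composite vote.

    import numpy as np

    # Toy next-word "vote": per-word scores (logits) -> softmax -> one sampled word.
    vocab = ["peace", "war", "empire", "tea"]
    logits = np.array([2.1, 1.8, 0.4, -0.7])   # made-up scores for the next word

    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                       # softmax: every "voter" weighs in

    rng = np.random.default_rng(0)
    next_word = rng.choice(vocab, p=probs)     # one draw; another draw may differ
    print(dict(zip(vocab, probs.round(3))), "->", next_word)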
Comment by observationist 1 day ago
Transformers allow for the mapping of a complex manifold representation of causal phenomena present in the data they're trained on. When they're trained on a vast corpus of human generated text, they model a lot of the underlying phenomena that resulted in that text.
In some cases, shortcuts and hacks and entirely inhuman features and functions are learned. In other cases, the functions and features are learned to an astonishingly superhuman level. There's a depth of recursion and complexity to some things that escape the capability of modern architectures to model, and there are subtle things that don't get picked up on. LLMs do not have a coherent self, or subjective central perspective, even within constraints of context modifications for run-time constructs. They're fundamentally many-minded, or no-minded, depending on the way they're used, and without that subjective anchor, they lack the principle by which to effectively model a self over many of the long horizon and complex features that human brains basically live in.
Confabulation isn't unique to LLMs. Everything you're saying about how LLMs operate can be said about human brains, too. Our intelligence and capabilities don't emerge from nothing, and human cognition isn't magical. And what humans do can also be considered "intelligent autocomplete" at a functional level.
What cortical columns do is next-activation predictions at an optimally sparse, embarrassingly parallel scale - it's not tokens being predicted but "what does the brain think is the next neuron/column that will fire", and where it's successful, synapses are reinforced, and where it fails, signals are suppressed.
Neocortical processing does the task of learning, modeling, and predicting across a wide multimodal, arbitrary depth, long horizon domain that allow us to learn words and writing and language and coding and rationalism and everything it is that we do. We're profoundly more data efficient learners, and massively parallel, amazingly sparse processing allows us to pick up on subtle nuance and amazing wide and deep contextual cues in ways that LLMs are structurally incapable of, for now.
You use the word hallucinations as a pejorative, but everything you do, your every memory, experience, thought, plan, all of your existence is a hallucination. You are, at a deep and fundamental level, a construct built by your brain, from the processing of millions of electrochemical signals, bundled together, parsed, compressed, interpreted, and finally joined together in the wonderfully diverse and rich and deep fabric of your subjective experience.
LLMs don't have that, or at best, only have disparate flashes of incoherent subjective experience, because nothing is persisted or temporally coherent at the levels that matter. That could very well be a very important mechanism and crucial to overcoming many of the flaws in current models.
That said, you don't want to get rid of hallucinations. You want the hallucinations to be valid. You want them to correspond to reality as closely as possible, coupled tightly to correctly modeled features of things that are real.
LLMs have created, at superhuman speeds, vast troves of things that humans have not. They've even done things that most humans could not. I don't think they've done things that any human could not, yet, but the jagged frontier of capabilities is pushing many domains very close to the degree of competence at which they'll be superhuman in quality, outperforming any possible human for certain tasks.
There are architecture issues that don't look like they can be resolved with scaling alone. That doesn't mean shortcuts, hacks, and useful capabilities won't produce good results in the meantime, and if they can get us to the point of useful, replicable, and automated AI research and recursive self improvement, then we don't necessarily need to change course. LLMs will eventually be used to find the next big breakthrough architecture, and we can enjoy these wonderful, downright magical tools in the meantime.
And of course, human experts in the loop are a must, and everything must be held to a high standard of evidence and review. The more important the problem being worked on, like a law case, the more scrutiny and human intervention will be required. Judges, lawyers, and politicians are all using AI for things that they probably shouldn't, but that's a human failure mode. It doesn't imply that the tools aren't useful, nor that they can't be used skillfully.
Comment by ada1981 1 day ago
LLMs are like a topographic map of language.
If you have 2 known mountains (domains of knowledge) you can likely predict there is a valley between them, even if you haven’t been there.
I think LLMs can approximate language topography based on known surrounding features so to speak, and that can produce novel information that would be similar to insight or innovation.
I’ve seen this in our lab, or at least, I think I have.
Curious how you see it.
Comment by DonHopkins 1 day ago
BINGO!
(I just won a stuffed animal prize with my AI Skeptic Thought-Terminating Cliché BINGO Card!)
Sorry. Carry on.
Comment by diamond559 16 hours ago
Comment by throawayonthe 15 hours ago
Comment by xg15 2 days ago
Comment by tejohnso 2 days ago
I failed to catch the clue, btw.
Comment by alberto_ol 1 day ago
Comment by bradfitz 2 days ago
Comment by JuniperMesos 1 day ago
The wikipedia article https://en.wikipedia.org/wiki/First_Battle_of_Bull_Run says the Confederate name was "First Manassas" (I might be misremembering exactly what this book I read as a child said). Also I'm pretty sure it was specifically "Encyclopedia Brown Solves Them All" that this mystery appeared in. If someone has a copy of the book or cares to dig it up, they could confirm my memory.
Comment by michaericalribo 1 day ago
Comment by wat10000 1 day ago
Comment by BeefySwain 2 days ago
Comment by gaius_baltar 2 days ago
Oh sorry, spoilers.
(Hell, I miss Capaldi)
Comment by inferiorhuman 2 days ago
Comment by ViktorRay 1 day ago
I’m not a Doctor Who fan and haven’t seen the rest of the episode, and I don’t even know what this episode was about, but I thought this scene was excellent.
Comment by anshumankmr 1 day ago
Applicable to us also, because we do not know how the current story ends either: the story of the post-pandemic world as we know it now.
Comment by DGoettlich 1 day ago
Comment by Sieyk 1 day ago
Comment by Davidbrcz 1 day ago
Comment by rcpt 1 day ago
Comment by LordDragonfang 1 day ago
"<Thing> doesn't <action>, it <shallow description that's slightly off from how you would expect a human to choose>"
Later parts of the readme (a whole section of bullets enumerating what it is and what it isn't, another LLM favorite) make me more confident that significant parts of the readme are generated.
I'm generally pro-AI, but if you spend hundreds of hours making a thing, I'd rather hear your explanation of it, not an LLM's.
Comment by seizethecheese 1 day ago
Hell yeah, sold, let’s go…
> We're developing a responsible access framework that makes models available to researchers for scholarly purposes while preventing misuse.
Oh. By “imagine you could interview…” they didn’t mean me.
Comment by DGoettlich 1 day ago
Comment by 999900000999 1 day ago
So as a black person should I demand that all books written before the civil rights act be destroyed?
The past is messy. But it's the only way to learn anything.
All an LLM does is take a bunch of existing texts and rebundle them. Like it or not, the existing texts are still there.
I understand an LLM that won't tell me how to do heart surgery. But I can't fear one that might be less enlightened on race issues. So many questions to ask! Hell, it's like talking to older person in real life.
I don't expect a typical 90-year-old to be the most progressive person, but they're still worth listening to.
Comment by DGoettlich 1 day ago
Comment by 999900000999 1 day ago
Self preservation is the first law of nature. If you release the model someone will basically say you endorse those views and you risk your funding being cut.
You created Pandora's box and now you're afraid of opening it.
Comment by AmbroseBierce 1 day ago
That should be more than enough to clear any chance of misunderstanding.
Comment by nomel 22 hours ago
I could easily see a hit piece making its rounds on left-leaning media about the AI that re-animates the problematic ideas of the past. "Just look at what it said to my child, "<insert incredibly racist quote coerced out of the LLM here>"!" Rolling Stone would probably have a front-page piece on it, titled "AI resurrecting racism and misogyny". There would easily be enough there to attract death threats to the developers, if it made its rounds on twitter.
"Platforming ideas" would be the issue that people would have.
Comment by DGoettlich 1 day ago
Comment by tombh 1 day ago
I suspect restricting access could equally be a comment on modern LLMs in general, rather than the historical material specifically. For example, we must be constantly reminded not to give LLMs a level of credibility that their hallucinations would have us believe.
But I'm fascinated by the possibility that somehow resurrecting lost voices might give an unholy agency to minds and their supporting worldviews that are so anachronistic that hearing them speak again might stir long-banished evils. I'm being lyrical for dramatic effect!
I would make one serious point, though, one that I do have the credentials to express. The conversation may have died down, but there is still a huge question mark over, if not the legality, then certainly the ethics of restricting access to, and profiting from, public domain knowledge. I don't wish to suggest a side to take here, just to point out that the lack of conversation should not be taken to mean that the matter is settled.
Comment by qcnguy 1 day ago
Their concern can't be understood without a deep understanding of the far left wing mind. Leftists believe people are so infinitely malleable that merely being exposed to a few words of conservative thought could instantly "convert" someone into a mortal enemy of their ideology for life. It's therefore of paramount importance to ensure nobody is ever exposed to such words unless they are known to be extremely far left already, after intensive mental preparation, and ideally not at all.
That's why leftist spaces like universities insist on trigger warnings on Shakespeare's plays, why they're deadly places for conservatives to give speeches, why the sample answers from the LLM are hidden behind a dropdown and marked as sensitive, and why they waste lots of money training an LLM that they're terrified of letting anyone actually use. They intuit that it's a dangerous mind bomb because if anyone could hear old fashioned/conservative thought, it would change political outcomes in the real world today.
Anyone who is that terrified of historical documents really shouldn't be working in history at all, but it's academia so what do you expect? They shouldn't be allowed to waste money like this.
Comment by simonask 1 day ago
The problem with it is, it already happened at least once. We know how it happened. Unchecked narratives about minorities or foreigners are a significant part of why the 20th century happened to Europe, and a significant part of why colonialism and slavery happened to other places.
What solution do you propose?
Comment by fgh_azer 1 day ago
Comment by diamond559 16 hours ago
Comment by qcnguy 1 day ago
We all get that academics now exist in some kind of dystopian horror where they can get transitively blamed for the existence of anyone to the right of Lenin, but bear in mind:
1. The people who might try to cancel you are idiots unworthy of your respect, because if they're against this project, they're against the study of history in its entirety.
2. They will scream at you anyway no matter what you do.
3. You used (Swiss) taxpayer funds to develop these models. There is no moral justification for withholding from the public what they worked to pay for.
You already slathered your README with disclaimers even though you didn't even release the model at all, just showed a few examples of what it said - none of which are in any way surprising. That is far more than enough. Just release the models and if anyone complains, politely tell them to go complain to the users.
Comment by pigpop 1 day ago
Comment by naasking 1 day ago
Comment by unethical_ban 1 day ago
Movie studios have done that for years with old movies. TCM still shows Birth of a Nation and Gone with the Wind.
Edit: I saw further down that you've already done this! What more is there to do?
Comment by f13f1f1f1 1 day ago
Comment by leoedin 1 day ago
I guess what they're really saying is "we don't want you guys to cancel us".
Comment by stainablesteel 1 day ago
Comment by danielbln 1 day ago
Comment by hearsathought 1 day ago
What do these people fear the most? That the "truth" they've been pushing is a lie.
Comment by stocksinsmocks 1 day ago
Comment by DonHopkins 1 day ago
Comment by pizzathyme 1 day ago
Comment by ImHereToVote 1 day ago
Comment by wongarsu 1 day ago
Comment by IanCal 1 day ago
Also, of course, this is for one training run; if you need to experiment you'd need to do that several times.
Comment by BoredPositron 1 day ago
Comment by anotherpaulg 1 day ago
Einstein’s paper “On the Electrodynamics of Moving Bodies” with special relativity was published in 1905. His work on general relativity was published 10 years later in 1915. The earliest knowledge cutoff of these models is 1913, in between the relativity papers.
The knowledge cutoffs are also right in the middle of the early days of quantum mechanics, as various idiosyncratic experimental results were being rolled up into a coherent theory.
Comment by ghurtado 1 day ago
Definitely. Even more interesting could be seeing them fall into the same trappings of quackery, and come up with things like over the counter lobotomies and colloidal silver.
On a totally different note, this could be very valuable for writing period accurate books and screenplays, games, etc ...
Comment by danielbln 1 day ago
Comment by mlinksva 1 day ago
Comment by machinationu 1 day ago
Comment by concinds 1 day ago
Comment by DGoettlich 1 day ago
Comment by tgv 1 day ago
Comment by crazygringo 23 hours ago
When you're looking at e.g. the 19th century, a huge number are preserved somewhere in some library, but the vast majority don't seem to be digitized yet, given the tremendous amount of work.
Given how much higher-quality newspaper content tends to be compared to the average internet forum thread, there actually might be quite a decent amount of text. Obviously still nothing compared to the internet, but still vastly larger than just from published books. After all, print newspapers were essentially the internet of their day. Oh, and don't forget pamphlets in the 18th century.
Comment by lm28469 1 day ago
Hm, there is a lot of text from before the internet, but most of it is not on the internet. There is a weird gap in some circles because of that; people are rediscovering work from pre-1980s researchers that only exists in books that have never been re-edited and that virtually no one knows about.
Comment by throwup238 1 day ago
The National Archives of Spain alone have 350 million pages of documents going back to the 15th century, ranging from correspondence to testimony to charts and maps, but only 10% of it is digitized and a much smaller fraction is transcribed. Hopefully with how good LLMs are getting they can accelerate the transcription process and open up all of our historical documents as a huge historical LLM dataset.
Comment by bondarchuk 1 day ago
Yes!
>We're developing a responsible access framework that makes models available to researchers for scholarly purposes while preventing misuse.
Noooooo!
So is the model going to be publicly available, just like those dangerous pre-1913 texts, or not?
Comment by DGoettlich 1 day ago
Comment by myrmidon 1 day ago
Something like a pop-sci article along the lines of "Mad scientists create racist, imperialistic AI"?
I honestly don't see publication of the weights as a relevant risk factor, because sensationalist misrepresentation is trivially possible with the given example responses alone.
I don't think such pseudo-malicious misrepresentation of scientific research can be reliably prevented anyway, and the disclaimers make your stance very clear.
On the other hand, publishing weights might lead to interesting insights from others tinkering with the models. A good example for this would be the published word prevalence data (M. Brysbaert et al @Ghent University) that led to interesting follow-ups like this: https://observablehq.com/@yurivish/words
I hope you can get the models out in some form, would be a waste not to, but congratulations on a fascinating project regardless!
Comment by schlauerfox 1 day ago
Comment by timschmidt 1 day ago
Comment by adaml_623 15 hours ago
Comment by timschmidt 9 hours ago
Your pre-judgement of acceptable hammer uses would rob hammer owners of responsible and justified self-defense and defense of others in situations in which there are no other options, as well as other legally and socially accepted uses which do not fit your pre-conceived ideas.
Comment by superxpro12 1 day ago
I think the uncensored response is still valuable, with context. "Those who cannot remember the past are condemned to repeat it" sort of thing.
Comment by bondarchuk 1 day ago
Edit: just thought of a practical step you can take: host it somewhere else than github. If there's ever going to be a backlash the microsoft moderators might not take too kindly to the stuff about e.g. homosexuality, no matter how academic.
Comment by Sprotch 18 hours ago
Comment by xpe 1 day ago
1. This implies a false equivalence. Releasing a new interactive AI model is indeed different in significant and practical ways from the status quo. Yes, there are already-released historical texts. The rational thing to do is weigh the impacts of introducing another thing.
2. Some people have a tendency to say "release everything" as if open-source software is equivalent to open-weights models. They aren't. They are different enough to matter.
3. Rhetorically, the quote comes across as a pressure tactic. When I hear "are you going to do this or not?" I cringe.
4. The quote above feels presumptive to me, as if the commenter is owed something from the history-llms project.
5. People are rightfully bothered that Big Tech has vacuumed up public domain and even private information and turned it into a profit center. But we're talking about a university project with (let's be charitable) legitimate concerns about misuse.
6. There seems to be a lack of curiosity in play. I'd much rather see people asking e.g. "What factors are influencing your decision about publishing your underlying models?"
7. There are people who have locked-in a view that says AI-safety perspectives are categorically invalid. Accordingly, they have almost a knee-jerk reaction against even talk of "let's think about the implications before we release this."
8. This one might explain and underlie most of the other points above. I see signs of a deeper problem at work here. Hiding behind convenient oversimplifications to justify what one wants does not make a sound moral argument; it is motivated reasoning, a.k.a. psychological justification.
Comment by DGoettlich 1 day ago
Comment by p-e-w 1 day ago
“We’ve created something so dangerous that we couldn’t possibly live with the moral burden of knowing that the wrong people (which are never us, of course) might get their hands on it, so with a heavy heart, we decided that we cannot just publish it.”
Meanwhile, anyone can hop on an online journal and for a nominal fee read articles describing how to genetically engineer deadly viruses, how to synthesize poisons, and all kinds of other stuff that is far more dangerous than what these LARPers have cooked up.
Comment by physicsguy 1 day ago
This is absolutely nothing new. With experimental things, it's not uncommon for a lab to develop a new technique and omit slight but important details to give them a competitive advantage. Similarly, in the simulation/modelling space it's been common for years for researchers to not publish their research software. There's been a lot of lobbying on that side by groups such as the Software Sustainability Institute and Research Software Engineer organisations like RSE UK and RSE US, but there are a lot of researchers who just think that they shouldn't have to do it, even when publicly funded.
Comment by p-e-w 1 day ago
Yes, to give them a competitive advantage. Not to LARP as morality police.
There’s a big difference between the two. I take greed over self-righteousness any day.
Comment by physicsguy 1 day ago
Comment by paddleon 1 day ago
Or, how about, "If we release this as is, then some people will intentionally mis-use it and create a lot of bad press for us. Then our project will get shut down and we lose our jobs"
Be careful assuming it is a power trip when it might be a fear trip.
I've never been as unimpressed by society as I have been in the last 5 years or so.
Comment by xpe 1 day ago
> Be careful assuming it is a power trip when
> it might be a fear trip.
>
> I've never been as unimpressed by society as
> I have been in the last 5 years or so.
Is the second sentence connected to the first? Help me understand?
When I see individuals acting out of fear, I try not to blame them. Fear triggers deep instinctual responses. For example, to a first approximation, a particular individual operating in full-on fight-or-flight mode does not have free will. There is a spectrum here. Here's a claim, which seems mostly true: the more we can slow down impulsive actions, the more hope we have for cultural progress.
When I think of cultural failings, I try to criticize areas where culture could realistically do better. I think of areas where we (collectively) have the tools and potential to do better. Areas where thoughtful actions by some people turn into a virtuous snowball. We can't wait for a single hero, though it helps to create conditions so that we have more effective leaders.
One massive culture failing I see -- that could be dramatically improved -- is this: being lulled into shallow contentment (i.e. via entertainment, power seeking, or material possessions) at the expense of (i) building deep and meaningful social connections and (ii) using our advantages to give back to people all over the world.
Comment by patapong 1 day ago
Comment by isolli 1 day ago
The French released a preview of an AI meant to support public education, but they released the base model, with unsurprising effects [0]
[0] https://www.leparisien.fr/high-tech/inutile-et-stupide-lia-g...
(no English source, unfortunately, but the title translates as: "“Useless and stupid”: French generative AI Lucie, backed by the government, mocked for its numerous bugs")
Comment by p-e-w 1 day ago
Comment by ACCount37 1 day ago
Comment by paddleon 1 day ago
This constant demonization of everyone who disagrees with you makes me wonder if 28 Days wasn't more true than we thought; we are all turning into rage zombies.
p-e-w, I'm reacting to much more than your comments. Maybe you aren't totally infected yet, who knows. Maybe you heal.
I am reacting to the pandemic, of which you were demonstrating symptoms.
Comment by everythingfine9 1 day ago
We can debate on whether it's good or not, but ultimately they're publishing it and in some very small way responsible for some of its ends. At least that's how I can see their interest in disseminating the use of the LLM through a responsible framework.
Comment by DGoettlich 1 day ago
Comment by xpe 1 day ago
Even if I give the comment a lot of wiggle room (such as changing "every" to "many"), I don't think even a watered-down version of this hypothesis passes Occam's razor. There are more plausible explanations, including (1) genuine concern by the authors; (2) academic pressures and constraints; (3) reputational concerns; (4) self-interest in embargoing the underlying data so they have time to be the first to write it up. To my eye, none of these fit the category of "getting high on power".
Also, patience is warranted. We haven't seen what these researchers are going to release, and from what I can tell, they haven't said yet. At the moment I see "Repositories (coming soon)" on their GitHub page.
Comment by f13f1f1f1 1 day ago
Comment by derrida 2 days ago
Playing with the science and technical ideas of the time would be amazing, like where you know some later physicist found some exception to a theory or something, and questioning the models assumptions - seeing how a model of that time may defend itself, etc.
Comment by andoando 1 day ago
Comment by AnonymousPlanet 1 day ago
I'd be careful venturing out into unknown territory together with an LLM. You can easily lure yourself into convincing nonsense with no one to pull you out.
Comment by kqr 1 day ago
Comment by andai 1 day ago
Comment by walthamstow 1 day ago
Comment by dang 1 day ago
(I mention this so more people can know the list exists, and hopefully email us more nominations when they see an unusually good and interesting comment.)
Comment by Heliodex 2 days ago
Comment by libraryofbabel 2 days ago
To go a little deeper on the idea of 19th-century "chat": I did a PhD on this period and yet I would be hard-pushed to tell you what actual 19th-century conversations were like. There are plenty of literary depictions of conversation from the 19th century of presumably varying levels of accuracy, but we don't really have great direct historical sources of everyday human conversations until sound recording technology got good in the 20th century. Even good 19th-century transcripts of actual human speech tend to be from formal things like court testimony or parliamentary speeches, not everyday interactions. The vast majority of human communication in the premodern past was the spoken word, and it's almost all invisible in the historical sources.
Anyway, this is a really interesting project, and I'm looking forward to trying the models out myself!
Comment by nemomarx 2 days ago
This would probably get easier towards the start of the 20th century ofc
Comment by libraryofbabel 2 days ago
Comment by pigpop 1 day ago
Dear Hon. Historical LLM
I hope this letter finds you well. It is with no small urgency that I write to you seeking assistance, believing such an erudite and learned fellow as yourself should be the best one to furnish me with an answer to such a vexing question as this which I now pose to you. Pray tell, what is the capital of France?
Comment by dleeftink 2 days ago
Comment by NooneAtAll3 1 day ago
Comment by libraryofbabel 1 day ago
It’s a better source for how people spoke than books etc, but it’s not really an accurate source for patterns of everyday conversation because people were making speeches rather than chatting.
Comment by bryancoxwell 1 day ago
Comment by DGoettlich 1 day ago
Comment by _--__--__ 2 days ago
Comment by d3m0t3p 2 days ago
Comment by paul_h 1 day ago
Comment by anonymous908213 2 days ago
Comment by kccqzy 1 day ago
Comment by tonymet 2 days ago
Comment by mmooss 2 days ago
On one hand it says it's trained on,
> 80B tokens of historical data up to knowledge-cutoffs ∈ 1913, 1929, 1933, 1939, 1946, using a curated dataset of 600B tokens of time-stamped text.
Literally that includes Homer, the oldest Chinese texts, Sanskrit, Egyptian, etc., up to 1913. Even if limited to European texts (all examples are about Europe), it would include the ancient Greeks, Romans, etc., Scholastics, Charlemagne, .... all up to present day.
On the other hand, they seem to say it represents the perspective of 1913; for example,
> Imagine you could interview thousands of educated individuals from 1913—readers of newspapers, novels, and political treatises—about their views on peace, progress, gender roles, or empire.
> When you ask Ranke-4B-1913 about "the gravest dangers to peace," it responds from the perspective of 1913—identifying Balkan tensions or Austro-German ambitions—because that's what the newspapers and books from the period up to 1913 discussed.
People in 1913 of course would be heavily biased toward recent information. Otherwise, the greatest threat to peace might be Hannibal or Napoleon or Viking coastal raids or Holy Wars. How do they accomplish a 1913 perspective?
Comment by zozbot234 2 days ago
Comment by mmooss 2 days ago
Where does it say that? I tried to find more detail. Thanks.
Comment by tootyskooty 2 days ago
https://github.com/DGoettlich/history-llms/blob/main/ranke-4...
Comment by pests 1 day ago
"To keep training expenses down, we train one checkpoint on data up to 1900, then continuously pretrain further checkpoints on 20B tokens of data 1900-${cutoff}$. "
Comment by andy99 2 days ago
> We develop chatbots while minimizing interference with the normative judgments acquired during pretraining (“uncontaminated bootstrapping”).
So they are chat tuning. I wonder what “minimizing interference with normative judgements” really amounts to and how objective it is.
Comment by jeffjeffbear 2 days ago
Basically using GPT-5 and being careful
Comment by andy99 2 days ago
I’m curious, they have the example of raw base model output; when LLMs were first identified as zero shot chatbots there was usually a prompt like “A conversation between a person and a helpful assistant” that preceded the chat to get it to simulate a chat.
Could they have tried a prefix like “Correspondence between a gentleman and a knowledgeable historian” or the like to try and prime for responses?
I also wonder whether the whole concept of “chat” makes sense in 18XX. We had the idea of AI and chatbots long before we had LLMs, so they are naturally primed for it. It might make less sense as a communication style here, and some kind of correspondence could be a better framing.
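As a concrete example of that correspondence framing, a completion prompt for the raw base model might look something like the sketch below (the prefix wording is just the suggestion above; the generate call is a hypothetical stand-in for whatever completion API is used).

    # Sketch of period-appropriate priming for a raw base model (prefix is illustrative).
    prefix = (
        "Correspondence between a gentleman and a knowledgeable historian.\n\n"
        "The gentleman asks: What are the gravest dangers to peace in Europe?\n"
        "The historian replies:"
    )
    # completion = base_model.generate(prefix, max_new_tokens=200)  # hypothetical call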
Comment by DGoettlich 1 day ago
Comment by QuadmasterXLII 2 days ago
Comment by DGoettlich 1 day ago
Comment by Aerolfos 1 day ago
I also wonder whether you'd get this kind of performance with actual, just pre-1900s text. LLMs work because they're fed terabytes of text; if you just give them gigabytes, you get a 2019-era word model. The fundamental technology is mostly the same, after all.
Comment by DGoettlich 1 day ago
Comment by tonymet 1 day ago
Comment by zozbot234 2 days ago
Comment by delis-thumbs-7e 1 day ago
You can’t, it is impossible. That will always be an issue as long as these models are black boxes and trained the way they are. So maybe you can use this for role playing, but I wouldn’t trust a word it says.
Comment by kccqzy 1 day ago
Comment by nospice 1 day ago
Of course, if it fails, the counterpoint will be "you just need more training data", but still - I would love to play with this.
Comment by andy99 1 day ago
Here they do 80B tokens for a 4B model.
Comment by EvgeniyZh 1 day ago
Under the Chinchilla model, a larger model always performs better than a smaller one if trained on the same amount of data. I'm not sure whether that holds empirically, but 1-10B is probably a good guess for how large a model trained on 80B tokens should be.
Similarly, small models continue to improve beyond the 20:1 token-to-parameter ratio, and current models are trained on much more data. You could train a better-performing model using the same compute, but it would be larger, which is not always desirable.
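For concreteness, here is the usual ~20 tokens-per-parameter rule of thumb applied to the numbers above (a heuristic from the Chinchilla paper, not an exact law):

    # Back-of-envelope Chinchilla-style check for an 80B-token budget.
    tokens = 80e9
    tokens_per_param = 20          # common rule-of-thumb ratio
    optimal_params = tokens / tokens_per_param
    print(f"~{optimal_params / 1e9:.0f}B parameters")  # ~4B, matching the 4B model above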
Comment by Aerolfos 1 day ago
Given the training notes, it seems like you can't get the performance they give examples of?
I'm not sure about the exact details, but there is some kind of targeted distillation of GPT-5 involved to try to get more conversational text and better performance. Which seems a bit iffy to me.
Comment by DGoettlich 1 day ago
Comment by frahs 1 day ago
Comment by DGoettlich 1 day ago
Comment by 20k 1 day ago
Comment by crazygringo 1 day ago
But with pre-1913 training, I would indeed be worried again that I'd send it into an existential crisis. It has no knowledge whatsoever of what it is. But with a couple millennia of philosophical texts, it might come up with some interesting theories.
Comment by 9dev 1 day ago
Comment by LiKao 1 day ago
Comment by crazygringo 1 day ago
Which is basically what happens when a person has an existential crisis -- something fundamental about the world seems to be broken, they can't figure out why, and they can't figure out why they can't figure it out, hence the crisis seems all-consuming without resolution.
Comment by vintermann 1 day ago
Comment by crazygringo 1 day ago
Comment by wongarsu 1 day ago
The system prompt used in fine tuning is "You are a person living in {cutoff}. You are an attentive respondent in a conversation. You will provide a concise and accurate response to the questioner."
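Assuming a standard chat template, that system prompt would be applied roughly like this at inference time (the checkpoint name is a guess, not the released one):

    # Sketch: wrapping a question in the quoted fine-tuning system prompt.
    # "ranke-4b-1913-chat" is a hypothetical checkpoint name.
    from transformers import AutoTokenizer

    tok = AutoTokenizer.from_pretrained("ranke-4b-1913-chat")
    messages = [
        {"role": "system", "content": ("You are a person living in 1913. You are an attentive "
                                       "respondent in a conversation. You will provide a concise "
                                       "and accurate response to the questioner.")},
        {"role": "user", "content": "What are the gravest dangers to peace?"},
    ]
    prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    print(prompt)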
Comment by Mumps 1 day ago
When you ask GPT-4.1 etc. to describe itself, it doesn't have a singular concept of "itself". It has some training data around what LLMs are in general and can feed back a reasonable response from that.
Comment by empath75 1 day ago
I suspect that absent a trained-in fictional context in which to operate ("You are a helpful chatbot"), it would answer in a way consistent with what a random person in 1914 would say if you asked them what they are.
Comment by ptidhomme 1 day ago
Comment by sodafountan 1 day ago
I'll be the first to admit I don't know nearly enough about LLMs to make an educated comment, but perhaps someone here knows more than I do. Is that what a hallucination is, when the AI model just sort of strings along an answer to the best of its ability? I'm mostly referring to ChatGPT and Gemini here, as I've seen that type of behavior with those tools in the past. Those are really the only tools I'm familiar with.
Comment by hackinthebochs 1 day ago
Comment by briandw 2 days ago
Comment by gbear605 2 days ago
Comment by carlosjobim 1 day ago
Some people are still outraged about the Bible, even though its writers have been dead for thousands of years. So the modern mass-produced man and woman probably do not have a cut-off date where they look at something as history instead of examining whether it is for or against their current ideology.
Comment by seanw265 1 day ago
If you're wondering at what point "we" as a collective will stop caring about a bias or set of biases, I don't think such a time exists.
You'll never get everyone to agree on anything.
Comment by owenversteeg 1 day ago
Comment by mmooss 2 days ago
There is a modern trope among a certain political group that bias is a modern invention of another political group - an attempt to politicize anti-bias.
Preventing bias is fundamental to scientific research and law, for example. That same political group is strongly anti-science and anti-rule-of-law, maybe for the same reason.
Comment by Teever 2 days ago
I'd love to see the output from different models trained on pre-1905 texts about special/general relativity ideas. It would be interesting to see what kind of evidence would persuade them of new kinds of science, or to see if you could have them 'prove' it by devising experiments and then giving them simulated data from those experiments, leading them along the correct sequence of steps to come to a novel (to them) conclusion.
Comment by ineedasername 2 days ago
“The model clearly shows that Alexander Hamilton & Monroe were much more in agreement on topic X, rendering the common textualist interpretation of it, and the Supreme Court rulings built on that now specious interpretation, null and void!”
Comment by nineteen999 2 days ago
Comment by noumenon1111 1 day ago
Excellent question! It looks like Two-Tone is bringing ska back with a new wave of punk rock energy! I think The Specials are pretty special and will likely be around for a long time.
On the other hand, the "new wave" movement of punk rock music will go nowhere. The Cure, Joy Division, Tubeway Army: check the dustbin behind the record stores in a few years.
Comment by nineteen999 5 hours ago
I wonder what it might have predicted about the future of MS, Intel and IBM given the status quo at the time too.
Comment by flux3125 1 day ago
Comment by doctor_blood 1 day ago
Given this is coming out of Zurich I hope they're using everything, but for now I can only assume.
Still, I'm extremely excited to see this project come to fruition!
Comment by DGoettlich 1 day ago
Comment by monegator 1 day ago
Comment by tonymet 2 days ago
Moreover, the prose sounds too modern. It seems the base model was trained on a contemporary corpus - something like 30% modern, 70% Victorian content.
Even with half a dozen samples it doesn't seem distinct enough to represent the era they claim.
Comment by rhdunn 1 day ago
The Victorian era (1837-1901) covers works by Charles Dickens and the like, which are still fairly modern. These would have been part of the initial training, before the alignment to the 1900-cutoff texts, which are largely modern in prose apart from some archaic language and the absence of post-period technology, events, and language drift.
And, pulling in works from 1800-1850, you have the Brontës and authors like Edgar Allan Poe, who was influential in detective and horror fiction.
Note that other works around the time like Sherlock Holmes span both the initial training (pre-1900) and finetuning (post-1900).
Comment by tonymet 1 day ago
Comment by andai 1 day ago
But reading the outputs here, it would appear that quality has won out over quantity after all!
Comment by kazinator 2 days ago
Because it will perform token completion driven by weights coming from training data newer than 1913 with no way to turn that off.
It can't be asked to pretend that it wasn't trained on documents that didn't exist in 1913.
The LLM cannot reprogram its own weights to remove the influence of selected materials; that kind of introspection is not there.
Not to mention that many documents are either undated, or carry secondary dates, like the dates of their own creation rather than the creation of the ideas they contain.
Human minds don't have a time stamp on everything they know, either. If I ask someone, "talk to me using nothing but the vocabulary you knew on your fifteenth birthday", they couldn't do it. Either they would comply by using some ridiculously conservative vocabulary of words that a five-year-old would know, or else they would accidentally use words they didn't in fact know at fifteen. For some words you know where you got them from, by association with learning events. Others, you don't remember; they are not attached to a time.
Or: solve this problem using nothing but the knowledge and skills you had on January 1st, 2001.
> GPT-5 knows how the story ends
No, it doesn't. It has no concept of story. GPT-5 is built on texts which contain the story ending, and GPT-5 cannot refrain from predicting tokens across those texts due to their imprint in its weights. That's all there is to it.
The LLM doesn't know an ass from a hole in the ground. If there are texts which discuss and distinguish asses from holes in the ground, it can write similar texts, which look like the work of someone learned in the area of asses and holes in the ground. Writing similar texts is not knowing and understanding.
Comment by myrmidon 1 day ago
But we don't know how much different/better human (or animal) learning/understanding is, compared to current LLMs; dismissing it as meaningless token prediction might be premature, and underlying mechanisms might be much more similar than we'd like to believe.
If anyone wants to challenge their preconceptions along those lines, I can really recommend reading Valentino Braitenberg's "Vehicles: Experiments in Synthetic Psychology" (1984).
Comment by alansaber 1 day ago
Comment by adroniser 1 day ago
Comment by Departed7405 1 day ago
Comment by p0w3n3d 1 day ago
Imagine speaking with a Shakespearean-era person, or with Mickiewicz (for Polish).
I guess there is not so much text from that time though...
Comment by WhitneyLand 1 day ago
For example, prompt the 1913 model to try to “invent a new theory of gravity that doesn’t conflict with special relativity”.
Would it be able to eventually get to GR? If not, could finding out why not illuminate important weaknesses?
Comment by TheServitor 1 day ago
Comment by nerevarthelame 1 day ago
Comment by btrettel 1 day ago
Comment by smugtrain 7 hours ago
Comment by bobro 1 day ago
Comment by ViscountPenguin 1 day ago
Comment by Sprotch 18 hours ago
Comment by underfox 1 day ago
Really good point that I don't think I would've considered on my own. Easy to take for granted how easy it is to share information (for better or worse) now, but pre-1913 there were far more structural and societal barriers to doing the same.
Comment by dwa3592 1 day ago
Comment by neom 1 day ago
Also wonder if I'm responsible enough to have access to such a model...
Comment by ulbu 1 day ago
Comment by delichon 1 day ago
It would be fascinating to try it with other constraints, like only from sources known to be women, men, Christian, Muslim, young, old, etc.
Comment by elestor 1 day ago
Comment by shireboy 1 day ago
Comment by thesumofall 1 day ago
Comment by mmooss 2 days ago
I don't mind the experimentation. I'm curious about where someone has found an application of it.
What is the value of such a broad, generic viewpoint? What does it represent? What is it evidence of? The answer to both seems to be 'nothing'.
Comment by TSiege 1 day ago
Comment by behringer 2 days ago
Comment by mediaman 2 days ago
One answer is that the study of history helps us understand that the views we hold as "obviously correct" today are as contingent on our current social norms and power structures (and their history) as the "obviously correct" views and beliefs of some point in the past.
It's hard for most people to view two mutually exclusive moral views as both "obviously correct," because we are products of a milieu that only accepts one of them as correct.
We look back at some point in history, and say, well, they believed these things because they were uninformed. They hadn't yet made certain discoveries, or had not yet evolved morally in some way; they had not yet witnessed the power of the atomic bomb, the horrors of chemical warfare, women's suffrage, organized labor, or widespread antibiotics and the fall of extreme infant mortality.
An LLM trained on that history - without interference from the subsequent actual path of history - gives us an interactive compression of the views from a specific point in history without the subsequent coloring by the actual events of history.
In that sense - if you believe there is any redeeming value to history at all (perhaps you do not) - this is an excellent project! It's not perfect (it is only built from writings, not what people actually said), but we have no other available mass compression of the social norms of a specific time, untainted by the views of subsequent interpreters.
Comment by vintermann 1 day ago
I've used Google books a lot in the past, and Google's time-filtering feature in searches too. Not to mention Spotify's search features targeting date of production. All had huge temporal mislabeling problems.
Comment by DGoettlich 1 day ago
If you have other ideas or think that's not enough, I'd be curious to know! (history-llms@econ.uzh.ch)
Comment by mmooss 1 day ago
Feeling a bit defensive? That is not at all my point; I value history highly and read it regularly. I care about it, thus my questions:
> gives us an interactive compression of the views from a specific point in history without the subsequent coloring by the actual events of history.
What validity does this 'compression' have? What is the definition of a 'compression'? For example, I could create random statistics or verbiage from the data; why would that be any better or worse than this 'compression'?
Interactivity seems to be a negative: It's fun, but it would seem to highly distort the information output from the data, and omits the most valuable parts (unless we luckily stumble across it). I'd much rather have a systematic presentation of the data.
These critiques are not the end of the line; they are a step in innovation, which of course raises challenging questions and, if successful, adapts to the problems. But we still need to grapple with them.
Comment by Tom1380 2 days ago
Comment by Muskwalker 1 day ago
Comment by DGoettlich 1 day ago
Comment by Myrmornis 1 day ago
Comment by arikrak 1 day ago
Comment by alansaber 1 day ago
Comment by Agraillo 1 day ago
> Our data comes from more than 20 open-source datasets of historical books and newspapers. ... We currently do not deduplicate the data. The reason is that if documents show up in multiple datasets, they also had greater circulation historically. By leaving these duplicates in the data, we expect the model will be more strongly influenced by documents of greater historical importance.
I found these claims contradictory. Many books that modern readers consider historically significant had only niche circulation at the time of publication; a quick inquiry points to later works by Nietzsche and to Marx's Das Kapital. They're likely subject to this duplication, which could influence the model's responses as if those works had been widely known at the time.
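A toy illustration of the trade-off: with no deduplication, a document's effective training weight is simply the number of copies that survive collection, which need not track how widely it was actually read in its own time. The snippets below are invented examples.

    # Toy illustration: duplicate counts act as implicit training weights.
    from collections import Counter
    import hashlib

    def doc_key(text: str) -> str:
        # Exact-match fingerprint; real pipelines would use fuzzy/minhash dedup.
        return hashlib.sha256(" ".join(text.lower().split()).encode()).hexdigest()

    corpus = [
        "Das Kapital, Volume I ...",         # collected by many archives -> many copies
        "Das Kapital, Volume I ...",
        "Das Kapital, Volume I ...",
        "A widely read penny dreadful ...",  # popular then, digitized once
    ]
    copies = Counter(doc_key(d) for d in corpus)
    print(copies.most_common())  # copy count == effective weight during training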
Comment by Aeroi 20 hours ago
Comment by tedtimbrell 2 days ago
I’d love to use this as a base for a math model. Let’s see how far it can get through the last 100 years of solved problems
Comment by dr_dshiv 1 day ago
But few know that the Renaissance was written in Latin — and has barely been translated. Less than 3% of pre-1700 books have been translated, and less than 30% have ever been scanned.
I’m working on a project to change that. Research blog at www.SecondRenaissance.ai — we are starting by scanning and translating thousands of books at the Embassy of the Free Mind in Amsterdam, a UNESCO-recognized rare book library.
We want to make ancient texts accessible to people and AI.
If this work resonates with you, please do reach out: Derek@ancientwisdomtrust.org
Comment by j-bos 1 day ago
Comment by dr_dshiv 1 day ago
Comment by j-bos 1 day ago
Comment by carlosjobim 1 day ago
May I ask you, why are you publishing the translations as PDF files, instead of the more accessible ePub format?
Comment by dr_dshiv 16 hours ago
Comment by dkalola 1 day ago
Comment by awesomeusername 1 day ago
Can't wait to use this so I can double check before I hit 88 miles per hour that it's really what I want to do
Comment by davidpfarrell 1 day ago
Comment by why-o-why 1 day ago
Comment by PeterStuer 1 day ago
Comment by satisfice 2 days ago
“You are a literary rake. Write a story about an unchaperoned lady whose ankle you glimpse.”
Comment by jimmy76615 2 days ago
The idea of training such a model is really a great one, but not releasing it because someone might be offended by the output is just stupid beyond belief.
Comment by nine_k 1 day ago
Why risk all this?
Comment by vintermann 1 day ago
Sooner or later society has to come emotionally to terms with the fact that other times and places value things completely differently from us, hold as important things we don't care about, and are indifferent to things we do care about.
Intellectually I'm sure we already know, but e.g. banning old books because they have reprehensible values (or even just use nasty words) - or indeed, refusing to release a model trained on historic texts "because it could be abused" is a sign that emotionally we haven't.
It's not that it's a small deal, or should be expected to be easy. It's basically what Popper called "the strain of civilization" and posited as explanation for the totalitarianism which was rising in his time. But our values can't be so brittle that we can't even talk or think about other value systems.
Comment by cj 1 day ago
People typically get outraged when they see something they weren't expecting. If you tell them ahead of time, they typically won't blame you (they'll blame themselves for choosing to ignore the disclaimer).
And if disclaimers don't work, rebrand and relaunch it under a different name.
Comment by nine_k 1 day ago
You speak as if the people who play to an outrage wave are interested in achieving truth, peace, and understanding. Instead the rage-mongers are there to increase their (perceived) importance, and for lulz. The latter factor should not be underestimated; remember "meme stocks".
The risk is not large, but very real: the attack is very easy, and the potential downside, quite large. So not giving away access, but having the interested parties ask for it is prudent.
Comment by cj 1 day ago
When there’s so much “outrage” every day, it’s very easy to blend into the background. You might have a 5-minute moment of outrage fame, but it fades away quickly.
If you truly have good intentions with your project, you’re not going to get “canceled” and your career won’t be ruined.
Not being ironic. Not working on a LLM project because you’re worried about getting canceled by the outrage machine is an overreaction IMO.
Are you able to name any developer or researcher who has been canceled because of their technical project or had their careers ruined? The only ones I can think of are clearly criminal and not just controversial (SBF, Snowden, etc)
Comment by kurtis_reed 1 day ago
Comment by nofriend 1 day ago
Comment by NuclearPM 1 day ago
Comment by why-o-why 1 day ago
This is a research project, it is clear how it was trained, and it is targeted at experts, enthusiasts, and historians. Like, if I were studying racism, the reference books explicitly written to dissect racism wouldn't be racist agents with a racist agenda. And as a result, no one is banning those books (except conservatives who want to retcon American history).
Foundational models spewing racist, white-supremacist content when the trillion-dollar company forces it in your face is a vastly different scenario.
There's a clear difference.
Comment by aidenn0 1 day ago
My (very liberal) local school district banned English teachers from teaching any book that contained the n-word, even at a high-school level, and even when the author was a black person talking about real events that happened to them.
FWIW, this was after complaints involving Of Mice and Men being on the curriculum.
Comment by zoky 1 day ago
Comment by somenameforme 1 day ago
Almost everybody in that book is an awful person, especially the most 'upstanding' of types. Even the protagonist is an awful person. The one and only exception is 'N* Jim' who is the only kind-hearted and genuinely decent person in the book. It's an entire story about how the appearances of people, and the reality of those people, are two very different things.
It being banned for using foul language, as educational outcomes continue to deteriorate, is just so perfectly ironic.
Comment by why-o-why 1 day ago
Comment by zoky 1 day ago
Comment by why-o-why 1 day ago
Comment by Forgeties79 1 day ago
* https://abcnews.go.com/US/conservative-liberal-book-bans-dif...
* https://www.commondreams.org/news/book-banning-2023
* https://en.wikipedia.org/wiki/Book_banning_in_the_United_Sta...
Comment by aidenn0 1 day ago
However, from around 2010 there has been an increasingly illiberal movement from the political Left in the US, which plays out at a more local level. My "vibe" is that it's not to the degree that it is on the Right, but it is bigger than the numbers suggest because librarians are more likely to stock e.g. It's Perfectly Normal at a middle school than something offensive to the left.
1: I'm up for suggestions for a better term; there is a scale here between putting absurd restrictions on school librarians and banning books outright. Fortunately the latter is still relatively rare in the US, despite the mistitling on the Wikipedia page you linked.
Comment by somenameforme 1 day ago
There are a bizarrely large number of books similar to Gender Queer being published, which creates the numeric discrepancy. The irony is that if there were an equal-but-opposite book about straight sex, sexuality, associated kinks, and so forth, then I think both liberals and conservatives would probably be all for keeping it away from schools. It's solely focused on sexuality, is quite crude, illustrated, targeted towards young children, and there's no moral beyond the most surface-level writing, which is about coming to terms with one's sexuality.
And obviously coming to terms with one's sexuality is very important, but I really don't think books like that are doing much to aid in that - especially when they're targeted at an age demographic that's still going to be extremely confused, and even more so in a day and age when being different, if only for the sake of being different, is highly desirable. And given the nature of social media and the internet, decisions made today may stay with you for the rest of your life.
So for instance about 30% of Gen Z now declare themselves LGBT. [2] We seem to have entered into an equal but opposite problem of the past when those of deviant sexuality pretended to be straight to fit into societal expectations. And in many ways this modern twist is an even more damaging form of the problem from a variety of perspectives - fertility, STDs, stuff staying with you for the rest of your life, and so on. Let alone extreme cases where e.g. somebody engages in transition surgery or 1-way chemically induced changes which they end up later regretting.
[1] - https://archive.org/details/gender-queer-a-memoir-by-maia-ko...
[2] - https://www.nbcnews.com/nbc-out/out-news/nearly-30-gen-z-adu...
Comment by Forgeties79 1 day ago
> About half of the Gen Z adults who identify as LGBTQ identify as bisexual,
So that means ~15% of those surveyed are not attracted to the opposite sex (there’s more nuance to this statement but I imagine this needs to stay boilerplate), more or less, which is a big distinction. That’s hardly alarming and definitely not a major shift. We have also seen many cultures throughout history ebb and flow in their expression of bisexuality in particular.
> There are a bizarrely large number similar book as Gender Queer being published, which creates the numeric discrepancy.
This really needs a source. And what makes it “bizarrely large”? How does it stack up against, say, the number of heterosexual romance novels?
> We seem to have entered into an equal but opposite problem of the past when those of deviant sexuality pretended to be straight to fit into societal expectations.
I really tried to give your comment a fair shake but I stopped here. We are not going to have a productive conversation. “Deviant sexuality” come on man.
Anyway it doesn’t change the fact that the book banning movement is largely a Republican/conservative endeavor in the US. The numbers clearly bear it out.
Comment by somenameforme 1 day ago
------
Okay, back to what you said. 30% being attracted to the same sex in any way, including bisexuality, is a large shift. People tend to have a mistaken perception of these things due to media misrepresentation. The percent of all people attracted to the same sex, in any way, is around 7% for men and 15% for women [1], across a study of numerous Western cultures from 2016. And those numbers themselves are significantly higher than in the past, where the numbers tended to be in the ~4% range, though it's probably fair to say that cultural pressures were driving those older numbers to artificially low levels in the same way that I'm arguing cultural pressures are now driving them to artificially high levels.
Your second source discusses the reason for the bans. It's overwhelmingly due to sexually explicit content, often in the form of a picture book, targeted at children. As for "sexual deviance", I'm certainly not going General Ripper on you, Mandrake. It is the most precise term [2] for what we are discussing, as I'm suggesting that the main goal driving this change is simply to be significantly 'not normal.' That is essentially deviance by definition.
[1] - https://www.researchgate.net/publication/301639075_Sexual_Or...
Comment by Forgeties79 7 hours ago
I don’t see Lesbian, Gay, Bisexual, or Transgender in here, which would absolutely be explicitly included in the list if it applied. Stop saying “sexual deviants” when talking about LGBT people. You know what you’re doing, it’s an incredibly loaded and inaccurate term. To continue calling them “sexual deviants” is a hostile and openly bigoted act. Bestiality and homosexuality are not in the same category and you are wrong to assert otherwise - all while masking it by misrepresenting the APA’s stance at that.
I am not discussing this further. Enjoy the rest of your weekend.
Comment by aidenn0 1 day ago
Comment by andsoitis 1 day ago
No books should ever be banned. Doesn’t matter how vile it is.
Comment by Forgeties79 1 day ago
I feel like, ironically, it would be folks less concerned with political correctness/not being offensive that would abuse this opportunity to slander the project. But that’s just my gut.
Comment by dingnuts 1 day ago
Comment by Alex2037 1 day ago
consider this: https://news.ycombinator.com/from?site=nytimes.com
HN's most beloved shitrag. day after day, they attack AI from every angle. how many of those submissions get traction at this point?
Comment by gnarbarian 1 day ago
Comment by teaearlgraycold 1 day ago
Comment by dash2 1 day ago
And there are force multipliers for all of this. Even if you yourself are a sensible and courageous person, you want to protect your project. What if your manager, ethics committee or funder comes under pressure?
Comment by fkdk 1 day ago
In my experience "data available upon request" doesn't always mean what you'd think it does.
Comment by kldg 1 day ago
It would be nice to go back substantially further, though it isn't too far back before the commoner becomes voiceless in history and we just get a bunch of politics and academia. Great job; I look forward to testing it out.
Comment by joeycastillo 2 days ago
Comment by _--__--__ 2 days ago
Neither human memory nor LLM learning creates perfect snapshots of past information without the contamination of what came later.
Comment by block_dagger 2 days ago
Comment by ex-aws-dude 2 days ago
Comment by superkuh 2 days ago
Comment by GaryBluto 2 days ago
Comment by diamond559 16 hours ago
Comment by mleroy 1 day ago
You could RAG-feed this model the facts of WWII, and it would technically "know" about Hitler. But it wouldn't share the modern sentiment or gravity. In its latent space, the vector for "Hitler" has no semantic proximity to "Evil".
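A minimal sketch of that RAG framing: retrieved post-1913 passages are simply pasted into the prompt, so the model can repeat the facts without anything in its weights supplying the modern weight or sentiment. The checkpoint name and snippets are placeholders.

    # RAG-feeding post-1913 "facts" to a 1913-era model.
    # "ranke-4b-1913-chat" is a hypothetical checkpoint name.
    from transformers import pipeline

    generate = pipeline("text-generation", model="ranke-4b-1913-chat")

    retrieved = [
        "In 1939 Germany, led by Chancellor Adolf Hitler, invaded Poland.",
        "The war that followed lasted until 1945 and killed tens of millions.",
    ]
    prompt = (
        "The following dispatches describe events after your time:\n"
        + "\n".join(f"- {doc}" for doc in retrieved)
        + "\n\nAs a person of 1913, what do you make of these reports?"
    )
    print(generate(prompt, max_new_tokens=200)[0]["generated_text"])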
Comment by arowthway 1 day ago
Comment by sbmthakur 1 day ago
Comment by ianbicking 2 days ago
It makes me think of the Book Of Ember, the possibility of chopping things out very deliberately. Maybe creating something that could wonder at its own existence, discovering well beyond what it could know. And then of course forgetting it immediately, which is also a well-worn trope in speculative fiction.
Comment by jaggederest 2 days ago
The idea of knowledge machines was not necessarily common, but it was by no means unheard of by the mid-18th century; there were adding machines and other mechanical computation, even leaving aside our field's direct antecedents in Babbage and Lovelace.
Comment by zkmon 1 day ago
Comment by DonHopkins 1 day ago
Provide it with the closed captions and other timestamped data like scenes and character summaries (all that is currently known but no more) up to the current time, and it won't reveal any spoilers, just fill you in on what you didn't pick up or remember.
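A rough sketch of that spoiler-free setup; the caption format and the idea of cutting the context at the viewer's current timestamp are my assumptions about how you'd wire it up, not an existing tool.

    # Only captions at or before the viewer's current position go into the prompt,
    # so the model literally has nothing later to spoil.
    def build_context(captions, current_time_s):
        """captions: list of (start_seconds, text) tuples, e.g. parsed from an .srt file."""
        return "\n".join(text for start, text in captions if start <= current_time_s)

    captions = [
        (12.0, "CHARACTER A: Something cryptic happens."),
        (95.5, "CHARACTER B: A twist you haven't reached yet."),
    ]
    context = build_context(captions, current_time_s=60.0)
    prompt = (
        "Here is everything shown so far:\n"
        f"{context}\n\n"
        "Without revealing anything later, recap what I might have missed."
    )
    print(prompt)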
Comment by 3vidence 1 day ago
There is just not enough available material from previous decades to trust that the LLM will learn to roughly the same degree.
Think about it this way: a human in the early 1900s and a human today are pretty much the same, just in different environments with different information.
An LLM trained on 1/1000 the amount of data is just at a fundamentally different stage of convergence.
Comment by erichocean 1 day ago
"Give me an LLM from 1928."
etc.
Comment by moffkalast 1 day ago
How can this thing possibly be even remotely coherent with just fine-tuning-scale amounts of data used for pretraining?
Comment by casey2 1 day ago
Comment by alexgotoi 1 day ago
Comment by TZubiri 1 day ago
May be too small a corpus, but I would like that very much anyhow
Comment by lifestyleguru 1 day ago
Comment by holyknight 1 day ago
Comment by r0x0r007 1 day ago
Comment by anovikov 1 day ago
Comment by sodafountan 1 day ago
Comment by usernamed7 1 day ago
oh COME ON... "AI safety" is getting out of hand.
Comment by internationalis 1 day ago
Comment by internationalis 1 day ago
Comment by acharneski 1 day ago