Show HN: Learning a Language Using Only Words You Know
Posted by simedw 4 days ago
A proof-of-concept language learning app that uses LLMs to generate definitions of unknown words using only previously mastered vocabulary.
Comments
Comment by englishcat 20 hours ago
On the other hand, the Chinese writing system is logographic (or ideographic), unlike the English system, which is phonetic. The most basic characters, such as 日 (sun), 月 (moon), and 山 (mountain), are essentially graphics (or pictures) of the objects themselves. That makes them very suitable for being represented by images. The emoji you are using are also very good.
I believe this method should be very effective for beginners in Chinese. However, once you have mastered the basic Chinese characters, you can learn about the structure of Chinese characters and then continue reading more materials to expand your vocabulary.
The real challenge is expanding your vocabulary through extensive reading. I'm actually working on a tool to solve this specific problem (https://lingoku.ai/learn-chinese). If you are reading English, it inserts Chinese text for you; if you are reading Chinese, it translates the text from Chinese to English and then injects Chinese words into the translated text, so you improve your vocabulary while reading.
Comment by bisonbear 15 hours ago
At least for me, there's a lot of value in consuming bigger volumes of Chinese to get used to pattern-matching on the characters, as opposed to only reading a smaller amount of harder characters that I'm less likely to actually encounter.
Comment by simedw 11 hours ago
Comment by dylanzhangdev 1 day ago
Comment by simedw 11 hours ago
Comment by gcanyon 1 day ago
More: https://triviumpursuit.com/childrens-books-in-words-of-one-s...
Comment by bitwize 1 day ago
Comment by jtokoph 4 days ago
I’m trying to learn to speak Chinese and not read it yet. The issue is most of the language learning apps have a focus on characters. I feel like I just want to see the pinyin. Maybe I don’t know what I need, but I haven’t found the right tool.
Comment by andai 1 day ago
You listen to audio you don't understand yet, and over time your brain begins to pick up the patterns. It takes a lot of time but you can do it in the background, because that processing happens subconsciously. So you can get that time "for free".
I learned it from this guy https://alljapanesealltheti.me/index.html
But he got it from linguist Stephen Krashen and his Input Hypothesis of language acquisition. (i.e. that the way babies and kids learn languages, thru osmosis, works for adults too.)
I think the ideal solution is somewhere in the middle, starting with something like Pimsleur which is the same idea (audio and repetition) but more structured and focused, to give you that "seed" of vocabulary and grammar, before you flesh it out with the "long tail" of the language.
Comment by cblum 1 day ago
The gist of those methods is mass input + create SRS cards for sentences where only one word or grammar pattern is unfamiliar to you.
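A minimal sketch of that "one unknown per sentence" filter, for anyone who wants to try it on their own text. Everything here is an assumption for illustration: the function names are made up, and tokens are split on whitespace, which works for English or Spanish but not for Chinese or Japanese (those would need a word segmenter in place of `split()`).

```python
import string


def unknown_tokens(sentence, known):
    """Return the distinct tokens of `sentence` not in `known`, in order."""
    missing = []
    for raw in sentence.lower().split():
        token = raw.strip(string.punctuation)
        if token and token not in known and token not in missing:
            missing.append(token)
    return missing


def mine_cards(sentences, known):
    """Keep sentences with exactly one unknown token as (sentence, target) pairs."""
    cards = []
    for sentence in sentences:
        missing = unknown_tokens(sentence, known)
        if len(missing) == 1:
            cards.append((sentence, missing[0]))
    return cards


# Hypothetical known-word set and input sentences.
known = {"the", "cat", "is", "on", "a", "i", "see"}
sentences = [
    "The cat is on the mat.",    # one unknown word: a card candidate
    "I see a cat.",              # nothing new: skipped
    "The dog chased the ball.",  # too many unknowns: skipped
]
cards = mine_cards(sentences, known)
```

Each resulting pair would become one SRS card, with the sentence on the front and the single unfamiliar word as the thing being tested.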
A similar but more relaxed approach is ALG (automatic language growth), where you start from very basic input with lots of visual aids and let the language “wash over you”: no taking notes, no creating flashcards, no dictionary lookups. Sounds crazy, but it works for a lot of people. It’s the method behind Dreaming Spanish, which was inspired by the teaching method at the AUA language school in Bangkok, where Dr. J Marvin Brown used Stephen Krashen’s ideas to create a Natural Approach course to teach foreigners Thai from zero to fluency.
Comment by armenarmen 1 day ago
Comment by bpev 1 day ago
For adults learning a language, I think you need 3 things to be most efficient. You need to learn the grammar rules/structure, you need vocabulary, and you need lots and lots of content. The specificity of Pimsleur I think is a major blocker. It lacks both vocabulary and content, and there is often a better resource for explaining grammar. I guess maybe the first unit of each Pimsleur course is pretty ok for getting used to the mouthfeel of a language, though.
For Spanish, I got far more out of languagetransfer.org, which helped me understand the concepts of the language much more, and dreaming.com, which gave me lots of content. For Chinese, I haven't found a course I like, but I still think I got more from drilling characters (I made my own app, but something like hanzihero or just an HSK/TOCFL Anki deck is probably good) and using graded readers. I think spoken-first in Chinese is a little bit of a trap, because it's easier to remember things with the written characters, where the relationships between words are a bit clearer.
edit: oh also sidenote, it's been a long time since I used it, but iirc, the Mandarin one is particularly outdated (eg talks about using a phone book) and uses a Beijing dialect, so everyone in Taipei made fun of me the first time I went there.
Comment by cblum 1 day ago
Comment by simedw 4 days ago
Comment by SuperNinKenDo 1 day ago
Comment by nubg 1 day ago
Comment by sbinnee 1 day ago
Comment by NiloCK 1 day ago
Comment by bryanhogan 1 day ago
My first concerns though:
1. How can the system know which words I already know?
2. To what degree will I misunderstand the meaning of words?
3. Somewhat related to 2, how inaccurate will the descriptions / explanations of words be?
Comment by simedw 10 hours ago
1. How does it know which words I already know? It doesn’t automatically. You provide that set. For example, if you’ve completed HSK 1, you can paste the HSK 1 word list into LangSeed and mark those as "known". From there, new explanations are constrained to that vocabulary. You can also paste in real text and mark the easy words as known, though that’s a bit more manual.
2. How much might I misunderstand word meanings? Depends on how advanced the vocab is and how large your known-word set is. I think of this as building intuition rather than giving dictionary-precise definitions. As you see words in more contexts, that intuition sharpens. This is just my experience from testing it over the last couple of weeks.
3. How inaccurate are the explanations? I tested it on Swedish (my native language). There are occasional awkward or slightly odd phrasings, but it’s rarely outright wrong.
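For readers wondering what "constrained to that vocabulary" could look like in practice, here is a rough sketch of one possible approach: check each generated explanation against the known-word set and ask again if it strays. This is not taken from the LangSeed code. `generate` stands in for whatever LLM call you would use, `fake_generate` is a canned stub so the example runs on its own, and whitespace tokenization is an assumption that would not hold for Chinese.

```python
import string


def out_of_vocab(explanation, known):
    """Return the sorted words in `explanation` that are not in `known`."""
    words = (w.strip(string.punctuation) for w in explanation.lower().split())
    return sorted({w for w in words if w and w not in known})


def explain_with_retries(word, known, generate, max_tries=3):
    """Call generate(word, banned) until the text uses only known words.

    Returns the accepted explanation, or None if every attempt failed.
    """
    banned = set()
    for _ in range(max_tries):
        text = generate(word, banned)
        bad = out_of_vocab(text, known)
        if not bad:
            return text
        banned.update(bad)  # tell the next attempt which words to avoid
    return None


def fake_generate(word, banned):
    """Hypothetical stand-in for an LLM call: returns the first canned draft
    that avoids every banned word."""
    drafts = ["a domesticated feline animal", "a small animal that says meow"]
    for draft in drafts:
        if not banned & set(draft.split()):
            return draft
    return drafts[-1]


known = {"a", "small", "animal", "that", "says", "meow"}
result = explain_with_retries("cat", known, fake_generate)
```

In this toy run the first draft is rejected because "domesticated" and "feline" are not in the known set, and the second draft is accepted.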
Comment by bisonbear 16 hours ago
I haven't dug into the github repo but I'm curious if by "guided decoding" you're referring to logit bias (which I use), or actual token blocking? Interested to know how this works technically.
(shameless self plug) I've actually been solving a similar problem for Mandarin learning - but from the comprehensible input side rather than the dictionary side:
https://koucai.chat - basically AI Mandarin penpals that write at your level
My approach uses logit bias to generate n+1 comprehensible input (essentially artificially raising the probability of the tokens that correspond to the user's vocabulary). Notably I didn't add the concept of a "regeneration loop" (otherwise there would be no +1 in N+1) but think it's a good idea.
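To make the logit-bias idea concrete, here is a toy sketch in plain Python rather than any particular model API. The tokens, logit values, and bias size are all made up, and a real tokenizer would usually split words into several tokens. It adds a constant to the logits of tokens in the learner's vocabulary and renormalizes with softmax.

```python
import math


def softmax(logits):
    """Convert a {token: logit} dict into a {token: probability} dict."""
    peak = max(logits.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(val - peak) for tok, val in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def apply_logit_bias(logits, vocab, bias):
    """Add `bias` to the logit of every token that is in `vocab`."""
    return {tok: (val + bias if tok in vocab else val)
            for tok, val in logits.items()}


# Hypothetical next-token logits and learner vocabulary.
logits = {"猫": 1.0, "狸花猫": 2.0, "动物": 0.5}
user_vocab = {"猫", "动物"}

before = softmax(logits)
after = softmax(apply_logit_bias(logits, user_vocab, bias=2.0))
```

Because the bias only shifts probabilities, the out-of-vocabulary token stays possible, just less likely. That is the difference from hard token blocking, which would remove it entirely.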
Really curious about the grammar issues you mentioned. I also experimented with the idea of an AI-enhanced dictionary (given that the free Chinese-English dictionary I have is lacking good examples) but determined that the generated output didn't meet my quality standards. Have you found any models that handle measure words reliably?
Comment by andai 1 day ago
Comment by bisonbear 16 hours ago
Comment by gamander 19 hours ago
Comment by mog_dev 1 day ago
Comment by simedw 11 hours ago
Comment by closetkantian 1 day ago