Char card update. That took entirely too long (over 8,000 cards). For each one: look up the card, get its tags, make the file, strip the unnecessary sections. Then I wrote a script to format everything uniformly, remove the dead tags section, and inject the new one. On top of that, I tracked down uncensored versions of a number of cards.
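For anyone wanting to do something similar, here is a minimal sketch of the "format uniformly and inject the new tags" step. It assumes the cards are JSON in the chara_card_v2/v3 layout (fields under `data`, tags in `data["tags"]`), and it assumes the "dead tags section" is that old tag list. If the dead section is something else, such as a block inside the description text, that needs its own handling. The function names are hypothetical, not from the actual script.

```python
import json
from pathlib import Path


def inject_tags(card: dict, new_tags: list[str]) -> dict:
    """Return a copy of a v2/v3-style card whose tag list is replaced by new_tags.

    Assumes the chara_card_v2/v3 layout, where the fields live under card["data"].
    """
    out = json.loads(json.dumps(card))  # deep copy of plain JSON data
    data = out.setdefault("data", {})
    # Trim whitespace, drop empty tags, and de-duplicate while keeping order.
    cleaned = (t.strip() for t in new_tags)
    data["tags"] = list(dict.fromkeys(t for t in cleaned if t))
    return out


def format_card(card: dict) -> str:
    """Serialize a card in one consistent style: 2-space indent, non-ASCII kept as-is."""
    return json.dumps(card, indent=2, ensure_ascii=False) + "\n"


def rewrite_card_file(path: Path, new_tags: list[str]) -> None:
    """Load a card JSON file, swap in the new tags, and write it back uniformly formatted."""
    card = json.loads(path.read_text(encoding="utf-8"))
    path.write_text(format_card(inject_tags(card, new_tags)), encoding="utf-8")
```

Writing every card through the same `json.dumps` settings is what makes the output uniform; the original card dict is left untouched so a bad tag list can't corrupt the source data mid-run.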
Since I've been using SillyTavern to manage things (it offers logs, character management, and even image generation), I've found that some cards don't import their tags even when the tags are present. Curiously, it seems like empty fields cause the importer to barf and stop, so if a card has a blank 'personality' field, the importer never gets to the tags. Removing the useless fields seems to resolve that, or at least I hope so.
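A minimal sketch of stripping the blank fields, assuming cards are parsed JSON. It only drops empty or whitespace-only string fields, at the top level (v1 cards, or the v1 mirror fields some v2 exports carry) and under `data` (v2/v3). Empty lists and dicts such as `extensions` are deliberately left alone, since the v2 spec expects them to exist. That the importer is happy with a blank field missing entirely is the observation above, not something this code guarantees. `name` is always kept; the `keep` parameter is a hypothetical knob for protecting other fields.

```python
def _is_blank(value) -> bool:
    """True for strings that are empty or whitespace-only."""
    return isinstance(value, str) and not value.strip()


def strip_blank_fields(card: dict, keep: frozenset = frozenset({"name"})) -> dict:
    """Return a copy of the card with blank string fields removed.

    Handles flat v1 cards and v2/v3 cards (fields under card["data"]).
    Non-string values (lists, dicts, numbers) are never removed.
    """
    def clean(fields: dict) -> dict:
        return {k: v for k, v in fields.items() if k in keep or not _is_blank(v)}

    out = clean(card)  # top level: v1 fields, or v1 mirror fields on a v2 card
    if isinstance(card.get("data"), dict):
        out["data"] = clean(card["data"])  # v2/v3 fields live here
    return out
```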
Initial tests are promising. But I'm finding that most of the missing-tag problems are on v1 cards. Then again, JannyAI and its formatting are probably the cause of most of that, since v2 and v3 cards are basically the ones in use now.
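To check where the missing tags cluster, a quick tally by card version helps. This sketch assumes v2/v3 cards are identified by their `spec` field (`chara_card_v2` / `chara_card_v3`) with tags under `data`, and treats anything without a `spec` as v1. The v1 format has no official tags field, so a top-level `tags` key is only checked in case an exporter added one. All function names are hypothetical.

```python
def card_version(card: dict) -> str:
    """Classify a parsed card by its "spec" field; anything without one is treated as v1."""
    spec = card.get("spec")
    if spec == "chara_card_v3":
        return "v3"
    if spec == "chara_card_v2":
        return "v2"
    return "v1"


def card_tags(card: dict) -> list:
    """Return the card's tags, or [] if it has none."""
    if card_version(card) == "v1":
        tags = card.get("tags")  # not part of v1; only present if an exporter added it
    else:
        data = card.get("data")
        tags = data.get("tags") if isinstance(data, dict) else None
    return tags if isinstance(tags, list) else []


def missing_tags_by_version(cards: list[dict]) -> dict:
    """Count cards with no tags, grouped by spec version."""
    counts = {"v1": 0, "v2": 0, "v3": 0}
    for card in cards:
        if not card_tags(card):
            counts[card_version(card)] += 1
    return counts
```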
Got a book to OCR; it's captured, but I'm not ready to deal with it yet. We'll see.
Now I have cards from storychan. Storychan has cards, but no author list and no kinks anywhere (other than "it's likely NSFW", or maybe one tag that says nothing and isn't worth the effort to fight with manually). There's nothing really to build from. I also can't tell which cards on that site are new, so I can't scour it for anything beyond what I've already got.
So instead I'll try using Gemini to do the job. Apparently the llama tools include a CLI (far better than trying to netcat to an IP address and hand-build a JSON query), so it's just a matter of building the command, loading a small 7B model, asking it to scan each card's text and give me a list of kinks to check and populate, and saving the results. Of course there are about 570 of these cards, so sifting through the output will take a couple of days.
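A rough sketch of that batch loop, assuming the "llama tools" CLI is llama.cpp's `llama-cli` and using only its basic flags (`-m` model file, `-n` max tokens, `--temp`, `-p` prompt). The model path, prompt wording, and one-line comma-separated answer format are all hypothetical. Newer builds may drop into interactive chat mode by default, so check `llama-cli --help` for your version; stdin is closed here so a chat-mode run doesn't hang waiting for input. Very long cards may also hit command-line length limits, in which case reading the prompt from a file with `-f` is the safer route.

```python
import subprocess
from pathlib import Path

# Hypothetical model path and prompt; swap in whatever 7B GGUF and wording you actually use.
MODEL = Path("models/7b-model.gguf")
PROMPT = (
    "Read the following character card and list any kinks it contains "
    "as a single comma-separated line. Output only that line.\n\n"
)


def build_command(card_text: str, model: Path = MODEL, max_tokens: int = 128) -> list[str]:
    """Build a llama-cli argument list (-m model, -n token limit, --temp, -p prompt)."""
    return [
        "llama-cli",
        "-m", str(model),
        "-n", str(max_tokens),
        "--temp", "0",  # temperature 0 so reruns give comparable lists
        "-p", PROMPT + card_text,
    ]


def parse_kink_list(output: str) -> list[str]:
    """Take the last non-empty line of the model output and split it on commas.

    Assumes the model followed the one-line format; an echoed prompt above it is ignored.
    """
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    if not lines:
        return []
    items = (item.strip().lower() for item in lines[-1].split(","))
    return list(dict.fromkeys(item for item in items if item))


def scan_cards(card_dir: Path, out_dir: Path) -> None:
    """Run the model over every card, saving raw output so an interrupted run can resume."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for card_path in sorted(card_dir.glob("*.json")):
        out_path = out_dir / f"{card_path.stem}.txt"
        if out_path.exists():  # already processed on a previous run
            continue
        text = card_path.read_text(encoding="utf-8")
        result = subprocess.run(
            build_command(text), capture_output=True, text=True, stdin=subprocess.DEVNULL
        )
        out_path.write_text(result.stdout, encoding="utf-8")
```

Saving the raw output per card, rather than only the parsed list, keeps the original text around for the manual sifting pass, and the skip-if-exists check means a multi-day run over ~570 cards can be stopped and restarted.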
One such card, Bnuyy Family, got the following from the LLM, which gives me something to work with.