yano2mch

  • Trying out AI agents. Created a story-writing one to help. While I got it to write in a way that doesn't look robotic, the output, as with a lot of things, tends to be more outline than final product, needing three or four iterations and expansions before a given chapter is ready.

    Still, it's interesting.
    Unrelated, but I've been collecting charcards for LLMs. The two main sources I've found workable are Character Tavern and TeleAI, which is an offshoot of Character Hub. TeleAI and CharHub have a problem where the PNG file they offer is often an older, usually anemic version of the card, typically the first draft. In some cases the intro and description are empty or just a single letter.

    So I'm re-downloading a large portion of my collected cards, using a script that takes the JSON embedded in the page and uses it to update the charcard.
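A minimal sketch of that update step, assuming the page's JSON has already been pulled out into a flat dict of card fields (the page layout isn't described here, so `refresh_card` and `page_fields` are hypothetical names). It keeps the stale PNG's card as the base and lets any non-empty value from the newer page replace it, while empty page values never wipe out what the card already has:

```python
import copy


def refresh_card(old_card: dict, page_fields: dict) -> dict:
    """Return an updated copy of a V2-style card ({"spec": ..., "data": {...}}).

    page_fields is assumed (hypothetically) to be a flat dict of card fields
    taken from the page's embedded JSON. Non-empty page values win because the
    page is the newer version; empty ones are ignored.
    """
    card = copy.deepcopy(old_card)  # never mutate the caller's card
    data = card.setdefault("data", {})
    for key, value in page_fields.items():
        if value not in ("", None, [], {}):
            data[key] = value
    return card


stale = {"spec": "chara_card_v2", "data": {"name": "Ava", "description": "A", "first_mes": ""}}
page = {"description": "A wandering bard.", "first_mes": "Hello there!", "scenario": ""}
fresh = refresh_card(stale, page)
print(fresh["data"])
```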

    As for JannyAI, it doesn't include tags in downloaded cards, and storychan has no metadata at all: no tags, no author, nothing.

    It would be so much easier if they published a static dump of their card databases, so offline copies of these archives were available; kinda too bad chararchive is gone. I'll likely update/compress the torrent I have once it no longer seems to need seeding.
    Unrelated, I've been working on shrinking eXoWin3x by removing the Windows components and restoring them when a game needs to be used or seeded (rebuilding the original, untouched zip). Every game/program carries its own copy of Windows, so I tried a block of 200 games, each under 100MB, and 12GB became 4GB. There's a LOT of space to be saved.

    The hardest part was distilling it down to only the most important files that shared common compressed blocks (multiple compressed versions of the same file result in different encodings, many of which are unlikely to be compatible).
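The post doesn't say how the shared files were identified; one simple starting point is to hash every file in each game folder and list the content that shows up in more than one game. This is plain file-level dedup with SHA-256, a swapped-in technique rather than the compressed-block matching described above, and the folder and file names in the demo are made up:

```python
import hashlib
import tempfile
from collections import defaultdict
from pathlib import Path


def shared_files(game_dirs, min_games=2):
    """Map SHA-256 digest -> [(game, relative path), ...] for file contents
    that appear in at least `min_games` different game folders."""
    by_hash = defaultdict(list)
    for game_dir in map(Path, game_dirs):
        for path in sorted(game_dir.rglob("*")):
            if path.is_file():
                digest = hashlib.sha256(path.read_bytes()).hexdigest()
                by_hash[digest].append((game_dir.name, path.relative_to(game_dir).as_posix()))
    return {
        digest: hits
        for digest, hits in by_hash.items()
        if len({game for game, _ in hits}) >= min_games
    }


# Demo: two fake game folders that both carry the same USER.EXE.
with tempfile.TemporaryDirectory() as root:
    layout = {
        "cotw": {"WINDOWS/USER.EXE": b"win31", "COTW.EXE": b"castle"},
        "mines": {"WINDOWS/USER.EXE": b"win31", "WINMINE.EXE": b"mines"},
    }
    for game, files in layout.items():
        for rel, content in files.items():
            target = Path(root, game, rel)
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_bytes(content)
    common = shared_files([Path(root, "cotw"), Path(root, "mines")])

for digest, hits in common.items():
    print(digest[:12], hits)
```

Files that show up across many games are the candidates to strip out and restore later from a single shared copy.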

    One example is a game I love, Castle of the Winds: ready to run, it's about 24MB, but this process reduces it to about 2.3MB (just the game plus the delta to restore it from a source). Curiously, with covers I could easily do something similar to the book cover holding the entire book, at least for small games/programs. Larger ones become less practical.

    Alas, this particular solution is unlikely to be useful elsewhere; DOS games won't have that many common files, and Win95+ games mount/use whole drives and likely COW (copy-on-write).
    yano2mch
    Still experimenting, but I squashed the whole of Minesweeper down to about 18KB. I'm not expecting much unless I have to host these files, in which case the space saved on eXoWin3x would make it worthwhile. I doubt old-games or other sites would care, even if it could save a ton of space. Compared to a single model on Hugging Face, this is literally loose change you find under the couch.
    Char card update. That took entirely too long (over 8,000 cards): look up the card, get the tags, make the file, strip unnecessary sections. Then I wrote a script to format everything uniformly, remove the dead tags section and inject the new one. I also found uncensored versions of a number of cards.

    Since I've been using SillyTavern to manage things (it offers logs, character management and even image generation), I've found that some cards' tags don't import even though they're present. Curiously, it seems empty fields cause the importer to barf and stop, so if you have a blank 'personality' field, it never gets to the tags. Removing useless fields seems to resolve that, or at least I hope so.
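A sketch of the "strip blank fields, inject new tags" pass. It handles both V2/V3 cards (fields under `data`) and flat V1 cards. Note that it drops *every* blank field, including ones the V2 spec lists as required, so it leans on the observation above that the importer copes with missing fields better than blank ones; `clean_card` is a hypothetical name:

```python
def clean_card(card: dict, new_tags) -> dict:
    """Drop blank fields and replace the tags list.

    V2/V3 cards keep their fields under "data"; V1 cards are flat, so the
    function works on whichever dict actually holds the fields.
    """
    nested = "data" in card
    fields = card["data"] if nested else card
    cleaned = {k: v for k, v in fields.items() if v not in ("", None, [], {})}
    # De-duplicate tags while keeping their original order; skip blanks.
    cleaned["tags"] = list(dict.fromkeys(t.strip() for t in new_tags if t.strip()))
    return {**card, "data": cleaned} if nested else cleaned


card = {
    "spec": "chara_card_v2",
    "spec_version": "2.0",
    "data": {"name": "Ava", "description": "A bard.", "personality": "", "scenario": "", "tags": []},
}
fixed = clean_card(card, ["fantasy", " bard", "fantasy", ""])
print(fixed["data"])
```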

    Initial tests are promising, but I'm finding a lot of v1 cards, which is where most of the missing tags turn up. Then again, JannyAI and its formatting are probably the cause of most of it, as v2 and v3 cards are basically the ones in use.

    Got a book to OCR; it's captured, but I'm not ready to deal with it yet. We'll see.
    yano2mch
    Continuing this. I'm mostly satisfied I've got the kinks worked out for nearly all cards.

    Now I have cards from storychan. Storychan has cards, but no author list and no kinks listed anywhere (other than it likely being NSFW, or maybe one tag that says nothing and isn't worth the effort to fight with manually). There's nothing really to build from. I also can't tell which cards on that site are new, so I can't scour it for anything beyond what I've already got.

    So instead I'll try using Gemini to do the job. Apparently the llama tools include a CLI (far better than trying to netcat to an IP address and build a JSON query), so it's just a matter of building the command, loading a small 7B model, asking it to scan the text and give me a list of kinks to check and populate, and saving the results. Of course there are about 570 of these cards, so sifting through the output will take a couple of days.
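A sketch of the "building the command" part, assuming llama.cpp's `llama-cli` binary is on the PATH; the model filename and prompt wording are placeholders. Passing an argument list instead of a shell string avoids quoting trouble with card text. Newer llama.cpp builds may drop into interactive chat mode by default, so check `llama-cli --help` on your build before batching 570 cards:

```python
import shlex


def build_tag_command(model_path: str, card_text: str, max_tokens: int = 256) -> list:
    """Build an argv list for llama-cli: -m model file, -p prompt,
    -n number of tokens to generate, --temp sampling temperature."""
    prompt = (
        "Read the character card below. Reply only with JSON fields "
        '"tags", "genres", "pov" and "characters", each a comma-separated string.\n\n'
        + card_text
    )
    return ["llama-cli", "-m", model_path, "-p", prompt, "-n", str(max_tokens), "--temp", "0.2"]


cmd = build_tag_command("models/example-7b.Q4_K_M.gguf", "Name: Delilah\nDescription: A bard.")
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, capture_output=True, text=True).stdout
```

A low temperature keeps the tag lists closer to the card text, at the cost of less varied wording.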

    For one such card, Bnuyy Family, the LLM gave me the following, which is basically something to work with:
    JSON:
    "tags": "Heat cycle, seduction, family drama, rape, dominance, submission, lesbian, MLM, action, adventure, erotic, fantasy, human/female, multiple love interests, aggressive heroine, incestuous undertones, family bonding",
    "genres": "Action, Adventure, Erotic",
    "time_period": "Future",
    "brand": "None",
    "pov": "Multiple",
    "characters": "{{user}}, Delilah, Cassandra, Candice",
    "race": "Human",
    "species": "None",
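That reply isn't valid JSON on its own (no surrounding braces, trailing comma), so a small cleanup step helps before populating tags. A minimal sketch using a trimmed copy of that output; it assumes every value comes back as one comma-separated string, which a 7B model won't always honour, and `parse_llm_fields` is a hypothetical name:

```python
import json

raw = '''"tags": "Heat cycle, seduction, family drama, fantasy",
"genres": "Action, Adventure, Erotic",
"pov": "Multiple",
"characters": "{{user}}, Delilah, Cassandra, Candice",
"species": "None",'''


def parse_llm_fields(text: str) -> dict:
    """Wrap the bare key/value lines in braces, then split each
    comma-separated value into a list ("None" becomes an empty list)."""
    fields = json.loads("{" + text.strip().rstrip(",") + "}")
    parsed = {}
    for key, value in fields.items():
        items = [part.strip() for part in str(value).split(",") if part.strip()]
        parsed[key] = [] if items == ["None"] else items
    return parsed


parsed = parse_llm_fields(raw)
print(parsed["characters"])
```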
    Sorry for the inactivity. For the last week I've likely had a sinus infection that made my tooth hurt like hell (and the lack of dentists in the immediate area has kept the tooth from being pulled). That made it impossible to do anything but try to get past the pain (I found a combo of 600mg ibuprofen and 600mg naproxen sodium every 2-3 hours made it bearable; yes, not recommended long term). But now I'm likely past the worst of it, so maybe I can get back to projects.

    Side note: I've been trying to get my 10 books a day from z-lib, so I'll find ones with bad covers to replace, and then I may have a bunch of classics to add to the threads that are missing them.
    Well, assuming the JSON files didn't accidentally get borked (say, the metadata was at the beginning so the entire file was empty)... found a bunch of duplicates in my charcards. I also wrote a script to make replacing the images easier, though that will take some testing.

    Yeah, not really the best place to post updates on a geeky tool, but it's something.
    This is interesting. To test whether ImageMagick preserves charcards, I resized a file and found a tEXt field had become a compressed one. It decoded fine, and it confirms that the Base64 encoding is something charcards do, not something PNG itself does. Compressed fields sound great, but I don't think they're currently supported.

    At least it isn't thrown away :) Now I can resize charcards that are too big and convert the compressed field back to an uncompressed one afterwards.
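A sketch of that "uncompress it afterwards" fix: walk the PNG chunks and rewrite any zTXt chunk keyed `chara` or `ccv3` as a plain tEXt chunk with a fresh CRC. It only handles zTXt (iTXt can also carry compressed text), the helper names are hypothetical, and the demo PNG is a bare stub with no pixel data:

```python
import base64
import json
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def iter_chunks(png: bytes):
    """Yield (type, data) per chunk: 4-byte length, 4-byte type, data, 4-byte CRC."""
    if not png.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    pos = len(PNG_SIGNATURE)
    while pos < len(png):
        length, ctype = struct.unpack(">I4s", png[pos:pos + 8])
        yield ctype, png[pos + 8:pos + 8 + length]
        pos += 12 + length


def make_chunk(ctype: bytes, data: bytes) -> bytes:
    """Serialize a chunk; the CRC-32 covers the type and data, not the length."""
    return struct.pack(">I", len(data)) + ctype + data + struct.pack(">I", zlib.crc32(ctype + data))


def ztxt_to_text(png: bytes, keywords=(b"chara", b"ccv3")) -> bytes:
    """Rewrite zTXt chunks for the given keywords as plain tEXt chunks."""
    out = [PNG_SIGNATURE]
    for ctype, data in iter_chunks(png):
        if ctype == b"zTXt":
            keyword, rest = data.split(b"\x00", 1)
            # rest[0] is the compression method; 0 (zlib deflate) is the only one defined.
            if keyword in keywords and rest[:1] == b"\x00":
                ctype, data = b"tEXt", keyword + b"\x00" + zlib.decompress(rest[1:])
        out.append(make_chunk(ctype, data))
    return b"".join(out)


# Demo: a stub PNG whose card sits in a zTXt chunk, as after the ImageMagick resize.
payload = base64.b64encode(json.dumps({"name": "Ava"}).encode())
ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)
png = (PNG_SIGNATURE + make_chunk(b"IHDR", ihdr)
       + make_chunk(b"zTXt", b"chara\x00\x00" + zlib.compress(payload))
       + make_chunk(b"IEND", b""))
fixed = ztxt_to_text(png)
print([ctype for ctype, _ in iter_chunks(fixed)])  # [b'IHDR', b'tEXt', b'IEND']
```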
    Well, continuing on my PNG tool: extracting and then rebuilding results in the identical file. So far so good.

    Also confirmed that a tEXt entry for chara or ccv3 is all that's needed to qualify as a charcard.
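A minimal check along those lines: walk the chunks and, if a tEXt chunk keyed `chara` or `ccv3` exists, Base64-decode its text and parse the JSON. Preferring `ccv3` over `chara` when both exist is my assumption about precedence, `read_card` is a hypothetical name, and the demo PNGs are bare stubs with no pixel data:

```python
import base64
import json
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
CARD_KEYWORDS = (b"ccv3", b"chara")  # checked in this order (assumed precedence)


def read_card(png: bytes):
    """Return the decoded card JSON from a tEXt chara/ccv3 chunk, or None."""
    if not png.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    found = {}
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(png):
        length, ctype = struct.unpack(">I4s", png[pos:pos + 8])
        data = png[pos + 8:pos + 8 + length]
        pos += 12 + length
        if ctype == b"tEXt":
            keyword, _, text = data.partition(b"\x00")
            if keyword in CARD_KEYWORDS:
                found[keyword] = json.loads(base64.b64decode(text))
    for keyword in CARD_KEYWORDS:
        if keyword in found:
            return found[keyword]
    return None


def make_chunk(ctype: bytes, data: bytes) -> bytes:
    return struct.pack(">I", len(data)) + ctype + data + struct.pack(">I", zlib.crc32(ctype + data))


ihdr = make_chunk(b"IHDR", struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0))
iend = make_chunk(b"IEND", b"")
card_text = base64.b64encode(json.dumps({"name": "Ava"}).encode())
card_png = PNG_SIGNATURE + ihdr + make_chunk(b"tEXt", b"chara\x00" + card_text) + iend
plain_png = PNG_SIGNATURE + ihdr + iend
print(read_card(card_png), read_card(plain_png))  # {'name': 'Ava'} None
```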

    zTXt entries are still a little confusing, but I think it's implemented right. I still need to make it accept gzip files, as zopfli gives better compression (zopfli being just a more brute-forced, zlib-compatible compressor, saving 3-10%).
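Since zTXt wants a zlib stream and zopfli defaults to gzip, one way to bridge them is to re-wrap the deflate data instead of recompressing it: strip the gzip header and 8-byte trailer, add a 2-byte zlib header, and append the Adler-32 of the uncompressed data (zlib's checksum differs from gzip's CRC-32, so the data has to be decompressed once to compute it). A sketch assuming a single-member gzip file; zopfli's command-line tool also offers a `--zlib` output mode, which skips this step entirely:

```python
import gzip
import struct
import zlib


def gzip_to_zlib(gz: bytes) -> bytes:
    """Re-wrap a single-member gzip stream as a zlib stream, without recompressing."""
    if gz[:3] != b"\x1f\x8b\x08":  # gzip magic + CM=8 (deflate)
        raise ValueError("not a deflate gzip stream")
    flags, pos = gz[3], 10
    if flags & 0x04:  # FEXTRA: 2-byte little-endian length, then that many bytes
        pos += 2 + int.from_bytes(gz[pos:pos + 2], "little")
    if flags & 0x08:  # FNAME: zero-terminated
        pos = gz.index(b"\x00", pos) + 1
    if flags & 0x10:  # FCOMMENT: zero-terminated
        pos = gz.index(b"\x00", pos) + 1
    if flags & 0x02:  # FHCRC: 2-byte header CRC
        pos += 2
    deflate = gz[pos:-8]  # trailer is CRC-32 + ISIZE, 4 bytes each
    raw = zlib.decompress(deflate, -15)  # negative wbits = raw deflate
    if zlib.crc32(raw) != int.from_bytes(gz[-8:-4], "little"):
        raise ValueError("gzip CRC mismatch")
    # 0x78 0xDA: deflate with a 32 KiB window, "maximum compression" level hint.
    return b"\x78\xda" + deflate + struct.pack(">I", zlib.adler32(raw))


original = b"chara card text " * 50
converted = gzip_to_zlib(gzip.compress(original))
print(zlib.decompress(converted) == original)  # True
```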

    Otherwise, the tool is... basically ready, at least for alpha testing. I intend to add a condensing option (stripping data so it's no longer a valid PNG file but compresses better), but only for data that can be rebuilt.
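The post doesn't say which data the condensing option will drop; the safest candidates are bytes that can be recomputed exactly, like the 8-byte PNG signature and each chunk's CRC-32. A hypothetical sketch of that idea (`condense` and `rebuild` are made-up names) with a round-trip check:

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def condense(png: bytes) -> bytes:
    """Drop the 8-byte signature and every chunk's 4-byte CRC; both are derivable."""
    if not png.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    out, pos = bytearray(), len(PNG_SIGNATURE)
    while pos < len(png):
        length = int.from_bytes(png[pos:pos + 4], "big")
        out += png[pos:pos + 8 + length]  # length + type + data, CRC left out
        pos += 12 + length
    return bytes(out)


def rebuild(condensed: bytes) -> bytes:
    """Restore the signature and recompute each chunk's CRC-32 over type + data."""
    out, pos = bytearray(PNG_SIGNATURE), 0
    while pos < len(condensed):
        length = int.from_bytes(condensed[pos:pos + 4], "big")
        type_and_data = condensed[pos + 4:pos + 8 + length]
        out += condensed[pos:pos + 4] + type_and_data + zlib.crc32(type_and_data).to_bytes(4, "big")
        pos += 8 + length
    return bytes(out)


def make_chunk(ctype: bytes, data: bytes) -> bytes:
    return struct.pack(">I", len(data)) + ctype + data + struct.pack(">I", zlib.crc32(ctype + data))


png = (PNG_SIGNATURE
       + make_chunk(b"IHDR", struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0))
       + make_chunk(b"tEXt", b"chara\x00eyJuYW1lIjogIkF2YSJ9")
       + make_chunk(b"IEND", b""))
small = condense(png)
print(len(png) - len(small), rebuild(small) == png)  # 20 True
```

CRCs alone only save 4 bytes per chunk, but since they look random they also compress badly; the same "drop only what can be rebuilt exactly" rule would apply to anything bigger.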

    Also, with some variation of the tool you could likely shove an entire book into a PNG file, making the PNG double as an epub. I'll put one out as a test with a classic that's not very big.
    yano2mch
    OK, posted the image and it looks okay (couldn't be here...). A custom blob chunk type technically works, but the forum barfed on it even though the preview looked fine, whereas it was happy when I just stored the data as a text chunk.
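Since a plain text chunk was what the forum accepted, here's a sketch of that route: Base64 the epub (tEXt is meant to hold Latin-1 text, so raw binary isn't allowed) and tuck it into a tEXt chunk just before IEND. The keyword `epub` and both function names are hypothetical, and Base64 inflates the payload by about a third:

```python
import base64
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def make_chunk(ctype: bytes, data: bytes) -> bytes:
    return struct.pack(">I", len(data)) + ctype + data + struct.pack(">I", zlib.crc32(ctype + data))


IEND_CHUNK = make_chunk(b"IEND", b"")  # the fixed 12-byte end chunk


def embed_file(png: bytes, keyword: bytes, payload: bytes) -> bytes:
    """Insert payload, Base64-encoded, as a tEXt chunk just before IEND."""
    if not png.endswith(IEND_CHUNK):
        raise ValueError("expected the PNG to end with IEND")
    text_chunk = make_chunk(b"tEXt", keyword + b"\x00" + base64.b64encode(payload))
    return png[:-len(IEND_CHUNK)] + text_chunk + IEND_CHUNK


def extract_file(png: bytes, keyword: bytes):
    """Return the decoded payload of the first tEXt chunk with this keyword, or None."""
    if not png.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(png):
        length, ctype = struct.unpack(">I4s", png[pos:pos + 8])
        data = png[pos + 8:pos + 8 + length]
        pos += 12 + length
        if ctype == b"tEXt":
            key, _, text = data.partition(b"\x00")
            if key == keyword:
                return base64.b64decode(text)
    return None


cover = PNG_SIGNATURE + make_chunk(b"IHDR", struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)) + IEND_CHUNK
book = b"PK\x03\x04 stand-in for real epub bytes"
bundled = embed_file(cover, b"epub", book)
print(extract_file(bundled, b"epub") == book)  # True
```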

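    If anyone is interested, the zTXt data is zlib-compressed and follows the double null (the keyword's terminating null plus the compression-method byte, which is 0), and you drop the last 20 bytes of the file (assuming zTXt is the last chunk: the 4-byte Adler-32, the 4-byte chunk CRC and the 12-byte IEND chunk). Nothing special. If anything, by swapping zlib's 2-byte header for a 10-byte gzip header you could gzip-decompress it, though gzip will complain about the missing trailer.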

    I injected a 7zip file, and while it opened, 7zip only saw 3 files, so it's not exactly ready to use. It means a PNG could technically be an archive, not that I'd recommend it. This is far more likely to be used for obfuscation.

    Oh well... learning loads while working on this little project.

    Edit: just verified the PNG is unaltered and the book is present. Fun fun.

    Edit 2: more interesting fun. I wrapped the epub in 7zip (heavily compressed), and although I only saw 1 file, it was the epub. So this COULD work.
    I'm working on volume 2 for KLRXO, but I'm not getting the best sleep, so it's hard to get in the mindset to work. I apologize if this takes a while.

    It's not complex work, but there are a lot of pieces. I have all the stories converted to epub/html. The next step is to separate the ones that did paragraphs right from the ones that didn't. After that it's mostly wrapping it in an epub, adding covers and adding little things to finish it off.
    Thank you very much for your time, and for being able to assist me with this.
    yano2mch
    Did they work for you?
    RoarkeV
    Yes, thank you very much once again, it's a lot better than it was.
    yano2mch
    That's good. Honestly, I was utterly surprised at the contents I saw. I suppose it's one way to do DRM, as in keeping more limited hardware from being able to handle it. That, or it was a really, really, really poor encoding. Hard to say.

    Let me know if you need help with any others, I wouldn't mind :)
    I have a favor to ask, please: can you "fix" 2 books for me? You'll see what's wrong when you open them.
    yano2mch
    Sure, what's wrong with them and where are they?

    Though I've seen once or twice where a book was unfixable... but that was when the files were corrupted.
    RoarkeV
    I've forgotten how to add it. Any help, please? I don't think it's available on any sites anymore.
    yano2mch
    Attachments aren't really allowed on profile posts, which is too bad, but they would also use tons of space that would hardly ever be seen.


    I have a thread for fixing epubs. Upload the books there.

    Downloading larger 70B-parameter LLMs (mradermacher is my go-to for models, it seems).

    It may seem obvious, and yet I'm coming to a conclusion: larger and better models result in better output...

    Yes... shocking, I know...


    Good day, I hope this finds you well. I'd like to know if this will work on any oceanpdf file or just certain ones?
    yano2mch
    With the xdeltas present, it's specifically for those 3 books from a specific other thread.

    But since you're showing interest... I'm thinking of wrapping and converting what I have into an AHK or other tool, as I don't expect people to install a Linux environment or Cygwin to run my scripts (which is why I usually make them self-contained zip/batch files that include everything you need).

    I certainly don't want to make a tool that works blindly without you knowing what happened, and since Windows doesn't ship diff/patch or any of that stuff, you're stuck with 'I hope it worked', which I don't like...
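One way to get past "I hope it worked" without diff/patch is to ship the expected SHA-256 of both the source file and the patched result next to each xdelta, and have the tool compare them before and after patching (e.g. around an `xdelta3 -d -s source.pdf patch.xdelta fixed.pdf` call). The sketch is in Python only to show the idea; shipping hashes with the patches, the file names and the `check` helper are all assumptions, and an AHK version would do the same two comparisons:

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path) -> str:
    """Hash a file in 1 MiB blocks so large PDFs don't need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(1 << 20), b""):
            digest.update(block)
    return digest.hexdigest()


def check(path, expected: str, label: str) -> bool:
    """Print a clear OK/MISMATCH line so the user knows what happened."""
    ok = sha256_of(path) == expected.lower()
    print(f"{label}: {'OK' if ok else 'MISMATCH'} ({path})")
    return ok


# Demo: pretend the patched book should hash to the value shipped with the xdelta.
with tempfile.TemporaryDirectory() as tmp:
    patched = Path(tmp, "book-fixed.pdf")
    patched.write_bytes(b"patched book contents")
    expected_hash = hashlib.sha256(b"patched book contents").hexdigest()
    result = check(patched, expected_hash, "after patch")
```

Checking the source hash first also catches the case where someone has a different edition of the book, before the patch even runs.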

    There's a thread I started expecting this to pick up at some point. Let's resume there.

    I've been playing with LLMs, as anyone looking at the AI Generation page would know. Very interesting...

    If anyone can recommend an LLM-based command-line tool I can use, let me know. I suppose I could set up a server once I'm more familiar and feed it commands that way, but I was hoping for something super-lightweight so I can pass it pages to fix the text as a first pass before doing my own OCR updates.
    Currently getting manga and doujin/hentai. mangapark is really decent; I've got a script to help me scrape specific ones I'm interested in, since iron's manga only goes up to about 2022 and madokami covers before that, so stuff I'd been reading a few years ago isn't updated (and nya only sometimes updates stuff...).

    Sorry if you were looking forward to the OCR stuff for Penny; I got the first book done but have to fix a few hanging quotes. 200+ page novels are a mite more annoying than 15-page mini stories, but I hope the results are worth it.

    If anyone's interested, I can make a tutorial on cleaning up manga (black and white) so it's cleaner and smaller, though it's a little technical. Otherwise I'll just do my own thing.
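The cleanup steps aren't spelled out here; a common first one for black-and-white scans is a levels adjustment that pushes near-white paper to pure white and near-black ink to pure black, which looks cleaner and also compresses smaller. A pure-Python sketch on raw 8-bit grayscale bytes, where the black/white points are example values you'd tune per scan; with Pillow, the same 256-entry table could be applied to an "L"-mode image via `Image.point`:

```python
def levels(pixels: bytes, black: int, white: int) -> bytes:
    """Levels adjustment for 8-bit grayscale: values <= black become 0,
    values >= white become 255, and the range in between is stretched linearly."""
    if not 0 <= black < white <= 255:
        raise ValueError("need 0 <= black < white <= 255")
    span = white - black
    table = bytes(
        0 if v <= black else 255 if v >= white else round((v - black) * 255 / span)
        for v in range(256)
    )
    return pixels.translate(table)  # one table lookup per pixel


row = bytes([10, 40, 128, 220, 250])  # dark ink, ink, midtone, light grey, paper
print(list(levels(row, black=40, white=220)))  # [0, 0, 125, 255, 255]
```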
    Found a source for eroge that has a lot of stuff I hadn't gotten yet. Anyway, I've gone from 440 pages down to the 80s; it'll take a few more days before my concentration goes back to OCR work and other things.

    Though for all I know, I might have to take a week off and do yardwork and spring cleaning too.
    FF v136 is so damn slow... if it hadn't disabled my extensions I'd have stuck with v106.
    Moved to v134 portable...
    Why can't they just leave things alone?
    JC Winchester is next on the OCR list. Got it about half done, but I'm not in the mood to finish tonight.
    Update: working on a prototype and it looks very promising. What's most surprising is how few iterations I need on larger block sizes to get results. So unless I'm screwing up my code or misreading what's happening, I'll have a prototype done fairly soon.

    But I must say, variable lengths, encoding and interleaving are a pain in the butt! Had to start over about 5 times.
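The prototype's format isn't described, so this is only an illustration of the kind of variable-length encoding that causes those headaches: LEB128-style varints, where each byte carries 7 bits of the value and the high bit means "more bytes follow". Values of different sizes can then be packed back to back in one stream and still be split apart:

```python
def encode_varint(n: int) -> bytes:
    """LEB128-style unsigned varint: 7 value bits per byte, high bit = continue."""
    if n < 0:
        raise ValueError("unsigned values only")
    out = bytearray()
    while True:
        low, n = n & 0x7F, n >> 7
        if n:
            out.append(low | 0x80)  # more bytes follow
        else:
            out.append(low)  # last byte of this value
            return bytes(out)


def decode_varints(buf: bytes) -> list:
    """Split a stream of back-to-back varints into integers."""
    values, current, shift = [], 0, 0
    for byte in buf:
        current |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            values.append(current)
            current, shift = 0, 0
    if shift:
        raise ValueError("stream ends mid-value")
    return values


stream = b"".join(encode_varint(v) for v in [1, 300, 16384])
print(stream.hex(), decode_varints(stream))  # 01ac02808001 [1, 300, 16384]
```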

    Though if the prototype works really well... better not jinx it. First things first. This is why I'm not as busy with my OCR jobs as I was: I'm occupied with this project.