Heh, funny you mention that, considering Be's pivot to BeIA. Some Be engineers also worked on the (unreleased) Palm OS Cobalt, and eventually, Android. (And then Fuchsia, but I don't think that OS will ever hit smartphones.)
Haiku does support WLAN adapters (even USB ones), though the support isn't as extensive as on Linux or the BSDs. That said, you might want to use a current nightly build instead of the latest beta, which was released in December 2022.
Does this do RAG over the character's chat history too? That's something SillyTavern can also do with extensions, but I figured that since your project already uses Llamaindex, this feature could be baked in from the get-go.
Yep, it can do CoT for ongoing conversations or to get to the bottom of something through back-and-forth. And you nailed it regarding llamaindex; they provide framework options: https://docs.llamaindex.ai/en/latest/examples/chat_engine/ch... (perfect for HN with the Paul Graham example!)
They even dabble in custom personalities with prompt mixins (example: you can chat with a PDF that responds like Shakespeare), and if that part were more robust I would delegate to it instead of what I built with ragdoll's prompt prefixes. Turns out the hard part isn't converting third-person to first-person. For ragdoll, the heavy lifting is in the configuration and management of different personas, its multi-modality (of models), and the Node & React libraries that let developers use them in realistic applications. The value llamaindex brings is its incredible indexing capabilities combined with a conversational query engine (which is why I chose llamaindex over langchain for this). Ragdoll picks up where llamaindex leaves off regarding personas.
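To make the prompt-prefix idea concrete, here's a minimal sketch of compiling a persona config into a first-person system-prompt prefix. This is purely illustrative; the function, field names, and example persona are all hypothetical and are not ragdoll's actual code.

```python
# Hypothetical sketch of a persona prompt prefix -- not ragdoll's actual code.
def build_persona_prefix(persona: dict) -> str:
    """Compile a persona config into a system-prompt prefix that keeps
    the model speaking in the first person."""
    lines = [
        f"You are {persona['name']}. Always speak in the first person.",
        f"Personality: {persona['personality']}",
        f"Speaking style: {persona['style']}",
    ]
    if persona.get("knowledge"):
        lines.append("You know about: " + ", ".join(persona["knowledge"]))
    return "\n".join(lines)

shakespeare = {
    "name": "William Shakespeare",
    "personality": "witty, dramatic, fond of metaphor",
    "style": "Early Modern English, iambic flourishes",
    "knowledge": ["Elizabethan theatre", "sonnets"],
}
prefix = build_persona_prefix(shakespeare)
print(prefix)
```

The interesting part in practice isn't this string-building; it's managing many such configs and swapping them per conversation, which is where the persona-management layer earns its keep.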
I love that SillyTavern says on their GitHub README: "On its own Tavern is useless, as it's just a user interface. You have to have access to an AI system backend that can act as the roleplay character." I want to avoid being a thin wrapper, and instead have that roleplay character aspect be central to what ragdoll does, so that it can be the de facto creative studio for any character-focused creative deliverable: A story, a film, music, games - so that a user can literally create films and music (and more) in this app like some kind of super Photoshop. I think to accomplish that, it cannot simply be a thin wrapper around an open model. It has to bring as much to the table as an ultra fine-tuned model would yet in seconds instead of years, and with the app- and community-level functionality needed (including being a free-to-use creator tool) to get people to actually build things with it.
Not yet haha but even as a place to hang out and casually chat, it would be cool if the character occasionally rendered a cutscene to go along with narratives, or you could optionally enable music and sfx like an audiobook. Maybe the most interesting ones you could export (and distribute for others to experience).
Though I bet the transition from AI text chat to rich multimedia will be like silent films to talkies - where some characters just aren't as interesting with a voiceover or depicted in a video. For some types of characters (written storytellers, etc.) the best interactions might always be text-based.
I felt this with the Final Fantasy 7 Remake: while it's clearly improved over the 1997 version, something felt lost in the transition from the old pre-rendered scenes (drawings) and reading the dialog in your head to high-quality voiceovers in gorgeous 3D scenes. Yet if you take a Metal Gear Solid or a Madden, the richer the experience, the better.
Ideally: you start out just wanting to go to the tavern and chat with a group of characters, but the interaction becomes so unexpectedly rich and entertaining that you want to capture it, so you can watch it again or share it.
This is HN, so I'm surprised that no one in the comments section has run this locally. :)
Following the instructions in their repo (and moving the checkpoints/ and resources/ folders into the "nested" openvoice subfolder), I managed to get the Gradio demo running. Simple enough.
It appears to be quicker than XTTS2 on my machine (RTX 3090), and utilizes approximately 1.5GB of VRAM. The Gradio demo is limited to 200 characters, perhaps for resource usage concerns, but it seems to run at around 8x realtime (8 seconds of speech for about 1 second of processing time).
EDIT: patched the Gradio demo for longer text; it's way faster than that. One minute of speech only took ~4 seconds to render. Default voice sample, reading this very comment: https://voca.ro/18JIHDs4vI1v
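For anyone checking the arithmetic: the real-time factor is just seconds of audio produced per second of wall-clock processing, so one minute of speech in ~4 seconds is roughly 15x, well above the 8x I saw with the 200-character demo.

```python
def realtime_factor(speech_seconds: float, wall_seconds: float) -> float:
    """Seconds of audio produced per second of processing time."""
    return speech_seconds / wall_seconds

print(realtime_factor(8, 1))   # -> 8.0, the short-demo rate
print(realtime_factor(60, 4))  # -> 15.0, the patched run (one minute in ~4 s)
```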
I had to write out acronyms -- XTTS2 to "ex tee tee ess two", for example.
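Writing out acronyms by hand gets tedious, so a small substitution table in a preprocessing step can do it automatically. The mappings below are my own ad hoc spellings, not anything OpenVoice provides:

```python
import re

# My own ad hoc phonetic spellings -- not part of OpenVoice.
ACRONYMS = {
    "XTTS2": "ex tee tee ess two",
    "TTS": "tee tee ess",
    "GPU": "gee pee you",
}

def expand_acronyms(text: str) -> str:
    # Longest keys first, so "XTTS2" wins over its substring "TTS".
    keys = sorted(map(re.escape, ACRONYMS), key=len, reverse=True)
    pattern = re.compile("|".join(keys))
    return pattern.sub(lambda m: ACRONYMS[m.group(0)], text)

print(expand_acronyms("XTTS2 is a GPU-hungry TTS model"))
# -> "ex tee tee ess two is a gee pee you-hungry tee tee ess model"
```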
The voice clarity is better than XTTS2, too, but the speech can sound a bit stilted and, well, robotic/TTS-esque compared to it. The cloning consistency is definitely a step above XTTS2 in my experience -- XTTS2 would sometimes have random pitch shifts or plosives/babble in the middle of speech.
I am trying to run it locally but it doesn't quite work for me.
I was able to run the demos all right, but when trying to use another reference speaker (in demo_part1), the result doesn't sound at all like the source (it's just a random male voice).
I'm also trying to produce French output, using a reference audio file in French for the base speaker, and a text in French. This triggers an error in api.py line 75 that the source language is not accepted.
Indeed, in api.py line 45 the only two source languages allowed are English and Chinese; simply adding French to language_marks in api.py line 43 avoids the errors but produces a weird/unintelligible result with a super heavy English accent and pronunciation.
I guess one would need to generate source_se again, and probably mess with config.json and checkpoint.pth as well, but I could not find instructions on how to do this...?
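For anyone wanting to reproduce the experiment, the edit is just extending the language table, roughly like this (paraphrased from memory of api.py; the exact names and values in the repo may differ):

```python
# Paraphrased from OpenVoice's api.py -- exact names/values may differ.
language_marks = {
    "english": "EN",
    "chinese": "ZH",
    "french": "FR",  # my addition: silences the language check,
                     # but the output is garbled without a French source_se
}
```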
Edit -- tried again on https://app.myshell.ai/ and the result sounds French all right, but still nothing like the original reference. It would be absolutely impossible to confuse one with the other, even for someone who didn't know the person very well.
I played with it some more and I have to agree. For actual voice _cloning_, XTTS2 sounds much, much closer to the original speaker. But the resulting output is also much more unpredictable and sometimes downright glitchy compared to OpenVoice. XTTS2 also tries to "act out" the implied emotion/tone/pitch/cadence in the input text, for better or worse.
But my use case is just to have a nice-sounding local TTS engine, and current text-to-phoneme conversion quirks aside, OpenVoice seems promising. It's fast, too.
> but when trying to use another reference speaker (in demo_part1), the result doesn’t sound at all like the source
I’ve noticed the same thing and I wonder if there is maybe some undocumented information about what makes a good voice sample for cloning, perhaps in terms of what you might call “phonemic inventory”. The reference sample seems really dense.
> Indeed, in api.py line 45 the only two source languages allowed are English and Chinese
If you look at the code, outside of what the model itself does, it relies on the surrounding infrastructure converting the input text to the International Phonetic Alphabet (IPA) as part of the process, and only has that implemented for English and Mandarin (though cleaners.py has broken references to routines for Japanese and Korean).
Give https://github.com/aedocw/epub2tts a look, the latest update enables use of MS Edge cloud-based TTS so you don't need a local GPU and the quality is excellent.
I want to try chaining XTTS2 with something like RVCProject. The idea is to generate the speech in one step, then clone a voice in the audio domain in a second step.
I have got to build or buy a new computer capable of playing with all this cool shit. I built my last "gaming" PC in 2016, so its hardware isn't really ideal for AI shenanigans, and my MacBook for work is an increasingly crusty 2019 model, so that's out too.
Yeah, I could rent time on a server, but that's not as cool as just having a box in my house that I could use to play with local models. Feels like I'm missing a wave of fun stuff to experiment with, but hardware is expensive!
> its hardware isn't really ideal for AI shenanigans
FWIW, I was in the same boat as you and decided to start cheap; old gaming machines can handle AI shenanigans just fine with the right GPU. I use a 2017 workstation (Zen 1) and an Nvidia P40 from around the same era, which can be had for <$200 on eBay/Amazon. The P40 has 24GB of VRAM, which is more than enough for a good chunk of quantized LLMs or diffusion models, and is in the same perf ballpark as the free Colab tensor hardware.
If you're just dipping your toes in without committing, I'd recommend that route. The P40 is a data center card and expects higher airflow than desktop GPUs, so you'll probably have to buy a blower kit or 3D-print a fan shroud and make sure it fits inside your case; that's another $30-$50. The bigger the fan, the quieter it can run. If you already have a high-end gaming PC/workstation from 2016, you can dive into local AI for $250 all-in.
Edit: didn't realize how cheap P40s now are! I bought mine a while back.
A Mac Studio or MacBook Pro if you want to run the larger models. Otherwise just a gaming PC with an RTX 4090, or a used RTX 3090 if you want something cheaper. A used dual-3090 setup can also be a good deal, but that's more in the build-it-yourself category than off the shelf.
I went the 4090 route myself recently, and I feel like all should be warned: memory is a major bottleneck. For a lot of tasks, folks may get more mileage out of multiple 3090s if they can get them set up to run in parallel.
Still waiting on being able to afford the next 4090, plus an eGPU case and the rest. There are a lot of things this rig struggles with, running out of memory even on inference with some of the more recent SD models.
Sorry if this is a silly question - I was never a Mac user, but I quick googled Mac Studio and it seems it's just the computer. Can I plug it to any monitor / use any keyboard and mouse, or do I need to use everything from Apple with it?
You can, but with some caveats: not all screen resolutions work well with macOS, though with BetterDisplay they will usually still work. If you want Touch ID, it's better to get the Magic Keyboard with Touch ID.
Any monitor and keyboard will work; however, Apple keyboards have a couple of extra keys not present on Windows keyboards, so a Windows keyboard requires some key remapping to give you access to all the typical shortcut combinations.
I'm in exactly the same boat. Yeah, of course you can run LMs on cloud servers, but my dream project would be to build a new gaming PC (mine is too old), serve an LM on it, and then serve an AI agent app I can talk to from anywhere.
Has anyone had luck buying used GPUs, or is that something I should avoid?
I bought some used GPUs during the last mining thing. They all worked fine except for some oddball Dell models that the seller was obviously trying to fix a problem on (and they took them back without question, even paying return shipping).
And old mining GPUs are A-OK, generally: despite warnings from the peanut gallery for over a decade that mining ruins video cards, this has never really been the case. Profitable miners have always tended to treat these things very carefully, undervolting (and often underclocking) them and keeping an eye on them so they could run as cool and inexpensively as possible. Killing cards is bad for profits, so they aimed to keep them alive.
GPUs that were used for gaming are also OK, usually. They'll have fewer hours of hard[er] work on them, but will have more thermal cycles as gaming tends to be much more intermittent than continuous mining is.
The usual caveats apply as when buying anything else (used, "new", or whatever) from randos on teh Interwebz. (And fans eventually die, and so do thermal interfaces like pads and thermal compound, but those are all easily replaceable by anyone with a small toolkit and half a brain's worth of wit.)
I found this recent thread interesting, specifically about really considering whether you're going to read back the data you just wrote in the near future (if not, use direct IO), plus a set of (abandoned?) patches for write-behind caching of sequential writes in Linux (https://lore.kernel.org/lkml/156896493723.4334.1334048120714...).
The unlock tool would only work if it successfully authenticates with Xiaomi's server with matching Mi Cloud ID as the one previously registered to the device. So I very much doubt that it is stolen.
Glad to see that noise reduction is on the roadmap! Does Filmulator support embedded lens profiles? I enjoyed using Filmulator, btw; its default output is just lovely.
https://www.linuxfoundation.org/legal/the-linux-mark