$3+

LLM Scribe - Handwrite LLM Multi-Format Datasets for Fine-Tuning

I want this!

LLM Scribe - Handwrite LLM Multi-Format Datasets for Fine-Tuning

$3+

🔍 What is LLM Scribe?

LLM Scribe is your professional toolkit for creating high-quality conversational datasets for Large Language Model fine-tuning. Whether you're a creative writer crafting character personalities or a developer preparing training data, LLM Scribe eliminates the technical barriers and formatting headaches.

No more struggling with JSON syntax or format specifications - LLM Scribe handles all the technical details while you focus on creating valuable content.

đź“·
Video Demo


✨ Key Features

Streamlined Dataset Creation

•Intuitive Interface - Focus on writing, not formatting

•Auto-save Functionality - Never lose your work with automatic saving on every interaction

•Progress Tracking - Set goals and monitor your dataset completion

•Tab Navigation - Rapidly cycle between fields for efficient data entry

•Light mode and Dark mode themes - Swap in settings

Professional Export Options

•Multiple Export Formats - Supports all major LLM training formats including ChatML, Alpaca, ShareGPT/Vicuna

•Format-Specific Customizations - Tailor your datasets with format-specific options

Advanced Capabilities

•Real-time Token Tracking - Monitor token usage with popular tokenizers (OpenAI, HuggingFace, Mistral)

•Customizable Fields - Enable/disable optional fields based on your specific needs

•System Message Support - Add system prompts for ChatGPT/ChatML formats

•Custom IDs - Assign unique identifiers for ShareGPT/Vicuna formats

Workflow Optimization

•Easy Dataset Reloading - Seamlessly continue work on existing projects

•Multi-turn Conversation Support - Create contextually aware training data

•In-app Guidance - Helpful tooltips and explanations throughout the interface


đź“‹ Supported Export Formats

Pair Data Exports

•chatgpt_chatml.jsonl

•chatml.json

•alpaca.jsonl

•alpaca.json

•sharegpt_vicuna.jsonl

•sharegpt_vicuna.json

•generic.jsonl

Multi-turn Data Exports

•chatgpt_chatml.jsonl

•chatml.json

•sharegpt_vicuna.jsonl

•sharegpt_vicuna.json

•Plus all pair formats (automatically generated)


🎓 Perfect for Both Beginners & Experts

For Beginners

•Start with default settings to get all formats you need

•Choose between simple pair data or more advanced multi-turn conversations

•No technical knowledge required - just write and export

For Experienced Developers

•Fine-tune your datasets with format-specific customizations

•Track token usage for cost and performance optimization

•Leverage advanced features for professional dataset creation


đź”’ Commercial License Included

•Full Commercial Rights to all datasets and outputs you create


Windows Compatibility

•Windows Only Application - Not compatible with macOS or Linux

⚠️ Security Notice

•No Code Signing Certificate due to being an indie dev - You may receive a security warning when installing. This is normal as the application is not code-signed. To install, click "More info" and then "Run anyway" when prompted.


📚 Resources

Support & Contact

Need help or have questions? Contact Gabriella@Kryptive.com for:

•Technical support

•Bug reports

•Additional format requests

•Tokenizer library additions

Patent & Technology Licensing

Interested in integrating this technology into your own products? Contact us for licensing the underlying system and methodology.


Version 1.0 | Patent Pending

Note on Tokenizer Libraries: LLM Scribe utilizes open-source libraries (tiktoken, Hugging Face transformers, Mistral AI Tokenizers) for token counting functionalities, each governed by their respective licenses.

$
I want this!
Size
96.2 MB