demo of DPO with QLoRA (w Llama3 70B Instruct) #858
Open
anchen1011 wants to merge 2 commits into OptimalScale:main from
gzliyu
approved these changes
Jun 18, 2024
Contributor
Hi Andrew! Thank you for submitting this QLoRA & DPO implementation to LMFlow. I have reviewed and tested the changes, and they look good to me.
- The training curve approaches that of full-parameter training; the runs are linked below:
  - full: https://api.wandb.ai/links/yuil/shi57ch1
  - qlora: https://api.wandb.ai/links/yuil/uhka9oth
  - lora: https://api.wandb.ai/links/yuil/9p9ilr35
- Below is a sample of the trained models' outputs:
{
"question": "I might be nitpicking, but is it better to do this:\n\n```\nif ($loggedin) {\n // normal process\n}\nelse {\n header('Location: login.php');\n}\n\n```\n\nOr this:\n\n```\nif (!$loggedin) {\n header('Location: login.php');\n exit();\n}\n// normal process\n\n```\n\nOr does it just not matter?",
"full": " It doesn't matter which you use, as long as it properly separates the code that determines whether a user is logged in from the code that renders the login page if they aren't. So they're both equivalent.",
"qlora": "\nBoth ways are valid, but the easier is more readable. In this example, the `if (!$loggedin)` expression checks if we are current logged in or not. It may not matter which we do, however, it is still a good practice to always check if the user is logged in, if that's the case I'd always recommend doing it like this. \n\nI hope this helps! Let me know if you have any more questions.",
"lora": "\nBoth ways are valid, but the easier is more readable. In this case, it's better to use comments and indentation to make it easier to read and understand.\n\nRegarding another topic we covered earlier, normal structure. I often see web developers applying haxor structure more for aesthetic purposes, but in reality your code may not always strive for visual excellence. Don't restrict yourself to this point, always keep the code clean and organized. The code will be easier to read, maintain and will make your website both mobile and desktop-friendly. Keep it clean, straightforward, and structured, let your programming and design work to tell a story.",
"answer_chosen": "I prefer the second style, since \"//normal process\" is likely a long piece of code, so the last } (from the else branch) might be a bit confusing.",
"answer_rejected": "Definitely the second. It reads better and works better for the rest of your code."
}
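For readers comparing the outputs above: the `answer_chosen` and `answer_rejected` fields form the preference pair that DPO trains on. The snippet below is only a minimal sketch of the standard DPO loss for a single pair, not the code in this PR or LMFlow's implementation. The function name `dpo_loss`, the `beta` default, and the log-probability values in the usage line are hypothetical placeholders chosen for illustration.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) preference pair.

    Each argument is the summed log-probability of a full response under
    either the policy being trained or the frozen reference model.
    """
    # How much more (or less) likely each response is under the policy
    # than under the reference model.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp

    # Scaled preference margin: positive when the policy favors the
    # chosen response more strongly than the reference does.
    margin = beta * (chosen_logratio - rejected_logratio)

    # -log(sigmoid(margin)), written in a numerically stable form.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Hypothetical log-probabilities, for illustration only.
example_loss = dpo_loss(-10.0, -12.0, -11.0, -11.0, beta=0.5)
```

The loss equals `log(2)` when the policy and reference agree, and it shrinks as the policy raises the chosen response's likelihood relative to the rejected one. With QLoRA or LoRA, only the adapter weights would receive gradients from this loss, while the base model stays frozen.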
test_full_dpo.json
No description provided.