I'm curious about some more details regarding FIM and its effect on the pre-trained model.
Here's a paragraph from the SantaCoder paper:
FIM for cheap
We observe a minor drop in performance of the FIM model compared to the No-FIM model. Specifically, we see that the pass@100 performance of the FIM model is 2-4% lower on HumanEval and 1% lower on MBPP. While Bavarian et al. (2022) presented evidence for the existence of a FIM-for-free property (i.e., arguing that autoregressive models can be trained with FIM without harming left-to-right capabilities), we do find a small but consistent drop of FIM models on left-to-right text2code benchmarks.
Was a similar analysis carried out on StarCoder?
Was StarCoder pre-trained on a 50-50 split between FIM and next-token data? (as indicated in this Megatron script)
Hello, we didn't perform this ablation for StarCoder given the amount of compute it requires for training, but you can check the Code Llama paper, where the authors observed similar behavior at different scales.
Regarding the FIM percentage, we used 50%.
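For anyone unfamiliar with what a 50% FIM rate means in practice, here is a rough, character-level sketch in plain Python. The sentinel names and the per-character splitting are simplifications for illustration only; the actual Megatron preprocessing operates on token ids and uses the model's real special tokens.

```python
import random

# Placeholder sentinel strings for illustration; the real special tokens
# and token-level splitting live in the training/preprocessing code.
FIM_PREFIX, FIM_MIDDLE, FIM_SUFFIX = "<fim_prefix>", "<fim_middle>", "<fim_suffix>"

def maybe_apply_fim(sample: str, fim_rate: float = 0.5) -> str:
    """With probability fim_rate, rewrite a sample into prefix-suffix-middle
    (PSM) order; otherwise return it unchanged as plain left-to-right data."""
    if random.random() >= fim_rate or len(sample) < 2:
        return sample  # ordinary next-token (autoregressive) sample
    # Choose two random cut points, splitting the document into three spans.
    i, j = sorted(random.sample(range(len(sample) + 1), 2))
    prefix, middle, suffix = sample[:i], sample[i:j], sample[j:]
    # PSM layout: the model conditions on prefix and suffix, then predicts the middle.
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}{middle}"

# Roughly half of the samples come out FIM-transformed; the rest stay as-is.
print(maybe_apply_fim("def add(a, b):\n    return a + b\n"))
```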
I have a question: since it's known that many evaluation scores drop when FIM is used during the pre-training stage, why did you still use FIM at a 50% rate?