“We've also heard claims that we trained on test sets -- that's simply not true and we would never do that. Our best understanding is that the variable quality people are seeing is due to needing to stabilize implementations.”
If a developer’s Christmas bonus depends on scoring high on a particular benchmark, it is not inconceivable that the benchmark would somehow make its way into the training set, directly or indirectly.
I don’t think management would have to encourage it directly, as the Chinese text claims.