My reading list and take on weak-to-strong generalization with respect to automating alignment research. Written during AI Safety Camp.
A Review of Weak-to-Strong Generalization