Mellanie Lemman (sp? could be way off)
Issues
- Data contamination …
- Mentions that developmental psychologists' tests on human children can be used as benchmarks for the cognitive capacity of systems
Six Principles for more rigorous
- Own bias towards anthropomorphization
- Eliza effect?
- Be skeptical of others’ (and your own) hypotheses
- Design control experiments for possible alternate strategies (memorization? shortcuts?…)
- Clever Hans?
- Clever Hans was hailed as the first and most famous thinking animal.
- Clever Hans could not answer a question when the questioner did not know the answer.
- Six- to ten-month-old infants choose the helper over the hinderer
- Conclusion was that young infants judge others on their social behavior
- There was a bounce at the top of the hill for the helper, so they ran another experiment with the bounce at the bottom of the hill.
- The result: without the bounce there was no statistically significant preference, and with the bounce the infants chose whichever entity was bouncing.
- Analyze failure types - these give more insight than success! And embrace “negative” results
- Psychologists call these killjoy explanations
- There is a journal of negative results
- Design novel variations in stimuli to test robustness and generalization
- Consider performance vs. competence
- Does the system possess the capacity under study but fail to demonstrate it because of unfair task requirements?
- Abstraction and Reasoning Corpus (ARC) - 1k manually created tasks based on “core-knowledge” priors: objectness…
- “Investigating Abstraction…” paper by her, Exhibit Hall F? 2:30pm start
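
Side note on the helper/hinderer control above: "no statistical significance" in a two-choice setup can be checked with an exact binomial test against chance (50/50). A minimal sketch follows. The function name and the counts (16 infants, 8 or 14 choosing one side) are made up for illustration and are not from the talk.

```python
from math import comb


def binomial_two_sided_p(successes: int, trials: int) -> float:
    """Exact two-sided binomial test against chance (p = 0.5).

    Hypothetical helper for illustration; not from the talk.
    """
    # Probability of each possible count of "successes" under pure chance.
    probs = [comb(trials, k) * 0.5 ** trials for k in range(trials + 1)]
    observed = probs[successes]
    # Sum every outcome that is at least as unlikely as the observed one.
    # The small tolerance guards against floating-point ties.
    p_value = sum(p for p in probs if p <= observed * (1 + 1e-9))
    return min(1.0, p_value)


# Made-up counts: 8 of 16 infants pick the helper (no preference at all).
p_no_preference = binomial_two_sided_p(8, 16)

# Made-up counts: 14 of 16 infants pick the bouncing entity.
p_strong_preference = binomial_two_sided_p(14, 16)

print(f"8/16:  p = {p_no_preference:.4f}")
print(f"14/16: p = {p_strong_preference:.4f}")
```

A large p-value (as with 8 of 16) means the choice data are consistent with chance; a small one (as with 14 of 16) means the infants preferred one entity, though it says nothing about why, which is the point of the bounce control.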