Mellanie Lemman (sp? could be way off)

Issues

  • Data contamination …
  • Mentions that developmental psychologists’ tests on human children can be used as benchmarks for the cognitive capacity of systems

Six Principles for more rigorous

  1. Own bias towards anthropomorphization
    1. Eliza effect?
  2. Be skeptical of others’ (and your own) hypotheses
    1. Design control experiments for possible alternate strategies (memorization? shortcuts?…)
    2. Clever Hans?
      1. Clever Hans was hailed as the first and most famous thinking animal.
      2. Clever Hans wasn’t able to answer the question if the questioner did not know the answer.
    3. Six- to ten-month-old infants choose the helper over the hinderer
      1. The conclusion was that young infants judge others on their social behavior
      2. There was a bounce at the top of the hill for the helper, so they ran another experiment with the bounce at the bottom of the hill.
      3. The result: without the bounce there was no statistical significance, and with the bounce the young infants chose whichever entity was bouncing.
  3. Analyze failure types - these give more insight than success! And embrace “negative” results
    1. Psychologists call these killjoy explanations
    2. There exists a journal of negative results
  4. Design novel variations in stimuli to test robustness and generalization
  5. Consider performance vs. competence
    1. Does the system possess the capacity under study but fail to demonstrate it because of unfair task requirements?
    2. Abstraction and Reasoning Corpus (ARC) - 1k manually created tasks based on “core-knowledge” priors: objectness
    3. “Investigating Abstraction…” paper by her, Exhibit Hall F? 2:30pm start
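
Side note to self on the helper/hinderer control under principle 2: the whole argument rests on whether a choice rate differs from chance (50/50) in each condition. A minimal sketch of that check, assuming an exact two-sided binomial test is an acceptable stand-in for whatever analysis the study actually used. The function name and the counts in the usage lines are hypothetical, made up for illustration, and not taken from the talk.

```python
from math import comb


def binomial_two_sided_p(successes: int, trials: int, chance: float = 0.5) -> float:
    """Exact two-sided binomial p-value against a fixed chance rate.

    Sums the probability of every outcome that is no more likely than
    the observed one (the usual "minimum likelihood" two-sided test).
    """
    if not 0 <= successes <= trials:
        raise ValueError("successes must be between 0 and trials")
    # Probability of each possible count k = 0..trials under pure chance.
    probs = [
        comb(trials, k) * chance**k * (1 - chance) ** (trials - k)
        for k in range(trials + 1)
    ]
    observed = probs[successes]
    # Small relative tolerance so that ties survive floating-point rounding.
    p_value = sum(p for p in probs if p <= observed * (1 + 1e-9))
    return min(1.0, p_value)


# Hypothetical counts, for illustration only (not data from the study):
# 14 of 16 infants pick the bouncing shape; 9 of 16 pick the helper
# when the bounce is removed.
print(round(binomial_two_sided_p(14, 16), 4))  # ≈ 0.0042, unlikely under chance
print(round(binomial_two_sided_p(9, 16), 4))   # ≈ 0.8036, consistent with chance
```

With counts like these, the "with bounce" condition would look like a real preference and the "no bounce" condition would not, which is the pattern the notes describe: the effect follows the bounce, not the helping.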