“The Heart of the Matter” Topic-Modeled (A Preliminary Experiment)

The Commission on the Humanities and Social Sciences of the American Academy of Arts and Sciences issued its important report “The Heart of the Matter” to the U.S. Congress and the American public in June 2013. 4Humanities has been collecting such public statements about the humanities from around the world for analysis in its WhatEvery1Says corpus, which also includes white papers, editorials, opinion pieces, blog posts, letters to the editor, articles, and similar documents. (See the WhatEvery1Says project and the WhatEvery1Says spreadsheet of documents collected so far.)

Our goal is to use this corpus of discourse on the humanities as the basis for both traditional analysis (e.g., study sessions, focus groups, “close reading,” etc.) and new digital-humanities methods of study.  The latter include machine-assisted word frequency analysis, topic modeling, social network analysis, and visualization.  Our hypothesis is that digital methods can help us learn new things about how media pundits, politicians, business leaders, administrators, scholars, students, artists, and others are actually thinking about the humanities.  For example, are there sub-themes beneath the familiar dominant clichés and memes?  Are there hidden connections or mismatches between the “frames” (premises, metaphors, and narratives) of those arguing for and against the humanities?  How do different parts of the world or different kinds of speakers compare in the way they think about the humanities?  Instead of concentrating on set debates and well-worn arguments, can we exploit new approaches or surprising commonalities to advocate for the humanities in the 21st century?

4Humanities is well along in collecting a growing number of public statements about the humanities, and it is now entering the early stages of both traditional and digital-humanities analyses of this corpus. The 4Humanities@UCSB local chapter (at the University of California, Santa Barbara) met on November 7, 2013, for a discussion workshop about “The Heart of the Matter” report. The chapter considered this document especially useful for study because the Commission that produced it had a wide membership designed to balance the perspectives of education, government, business, media, the arts, and other sectors. The UCSB chapter has also begun early (and, at this stage, inconclusive) digital-analytical experiments, including topic modeling, with “The Heart of the Matter” and other sample documents.

Network visualization of topics discussed in a small sample set of the WhatEvery1Says corpus (including “The Heart of the Matter” report)

The following is a preliminary attempt at topic modeling “The Heart of the Matter,” both by itself and in the context of a small subset of the larger 4Humanities WhatEvery1Says corpus. (For explanations of topic modeling as a statistical-analysis method and of its use by digital humanists on humanistic materials, see the Appendix below.) The purpose is to walk through the kinds of methodological problems that 4Humanities will need to address in topic modeling its whole corpus of materials on the humanities. These problems include the selection and categorization of samples; the extraction and pre-processing (“cleaning”) of documents; the creation of filters that make textual analysis more meaningful (e.g., by filtering out proper names, common words, etc.); the generation of visualizations; and the interpretation of results.
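The pre-processing and filtering steps named above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the chapter's actual procedure: the stop list is a small hypothetical sample, and the phrase-joining and singularizing tables simply reuse the examples listed in step 2 of part A below (“socialsciences,” “liberalarts,” “liberalstudies,” and four singularized plurals).

```python
import re

# Hypothetical sample stop list: common English function words plus web
# debris ("http", "www") of the kind visible in the Results B topics.
STOP_WORDS = frozenset({
    "the", "and", "of", "to", "a", "in", "is", "that", "for", "it",
    "as", "with", "on", "are", "be", "by", "this", "an", "or", "http", "www",
})

# Multiword phrases joined into single tokens, following the report's examples.
PHRASES = {
    "social sciences": "socialsciences",
    "liberal arts": "liberalarts",
    "liberal studies": "liberalstudies",
}

# Only these plurals are made singular; this is a lookup table, not a stemmer.
SINGULARS = {
    "colleges": "college",
    "universities": "university",
    "educators": "educator",
    "teachers": "teacher",
}


def preprocess(text: str, stop_words: frozenset = STOP_WORDS) -> list:
    """Lowercase the text, join phrases, singularize listed plurals, drop stop words."""
    text = text.lower()
    for phrase, joined in PHRASES.items():
        # Allow any whitespace (including line breaks) between the words of a phrase.
        pattern = r"\b" + r"\s+".join(phrase.split()) + r"\b"
        text = re.sub(pattern, joined, text)
    tokens = re.findall(r"[a-z]+", text)
    tokens = [SINGULARS.get(token, token) for token in tokens]
    # Discard stop words and single-letter tokens.
    return [t for t in tokens if t not in stop_words and len(t) > 1]
```

For example, `preprocess("The Social Sciences and the liberal arts matter to colleges.")` yields `["socialsciences", "liberalarts", "matter", "college"]`. Because the plural table is a fixed list rather than a stemmer, words such as “programs” and “students” stay plural, as they do in the topic lists below.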

This early topic model should not be read as offering meaningful conclusions about patterns and trends in the way people now speak about the humanities. At this point, it is only an experiment that provides a proof of concept for the methodology.

 ❦

How We Created the Topic Model (an early, incomplete rehearsal of the full process that will be needed to topic model the WhatEvery1Says corpus)

At this initial stage, the 4Humanities@UCSB chapter is topic modeling its corpus as follows. In the future, the process will include more steps (and, hopefully, will be assisted by more automation in the text-preparation stage):

A.

  1. The text of “The Heart of the Matter” was extracted from its original PDF format; cleaned to remove endnotes, citations, diagrams, etc.; and corrected for issues such as hyphenation and formatting.
  2. Some plurals were made singular (“colleges,” “universities,” “educators,” “teachers”). Also, “social sciences” was converted to “socialsciences,” and “liberal arts” and “liberal studies” were converted to “liberalarts” and “liberalstudies,” so that each phrase is counted as a single term.
  3. The text was “chunked” into the separate sections/chapters of the document.
  4. A “stop list” (words not to be considered in the computational processing of the document) was created and applied.
  5. The text was run through David Newman’s “Topic Modeling Tool,” a Java-based graphical interface to the MALLET toolkit’s LDA (Latent Dirichlet Allocation) topic modeling. The tool was set to 400 iterations for each run. Runs were made with the number of topics set at 10, 20, 30, and 40. (Results A for the 10-topic and 20-topic models are given below.)
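Step 5 relies on MALLET's implementation of LDA. To show what the iteration count and the number of topics control, here is a minimal, unoptimized collapsed Gibbs sampler for LDA in plain Python. It is a sketch, not MALLET's code (MALLET uses an optimized sampler and can also tune its hyperparameters), and the fixed `alpha` and `beta` values are assumptions, not the Topic Modeling Tool's settings. Each of the `iterations` passes reassigns every token to a topic in proportion to how often that topic appears in the token's document and how often the token's word appears in that topic; the highest-count words per topic correspond to the ten-word rows in the topic lists below.

```python
import random


def lda_gibbs(docs, num_topics, iterations=400, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA (illustrative, unoptimized).

    docs: list of documents, each a list of preprocessed tokens.
    Returns (topic_word counts, doc_topic counts, vocabulary).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    doc_topic = [[0] * num_topics for _ in docs]        # tokens per (document, topic)
    topic_word = [[0] * V for _ in range(num_topics)]   # tokens per (topic, word)
    topic_total = [0] * num_topics                      # tokens per topic
    topics = list(range(num_topics))

    # Start by assigning every token to a random topic.
    assignments = []
    for d, doc in enumerate(docs):
        z_d = [rng.randrange(num_topics) for _ in doc]
        for w, k in zip(doc, z_d):
            doc_topic[d][k] += 1
            topic_word[k][index[w]] += 1
            topic_total[k] += 1
        assignments.append(z_d)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wi, k = index[w], assignments[d][i]
                # Remove this token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][wi] -= 1
                topic_total[k] -= 1
                # p(z = t | all other assignments), up to a normalizing constant.
                weights = [
                    (doc_topic[d][t] + alpha) * (topic_word[t][wi] + beta)
                    / (topic_total[t] + V * beta)
                    for t in topics
                ]
                k = rng.choices(topics, weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][wi] += 1
                topic_total[k] += 1
    return topic_word, doc_topic, vocab


def top_words(topic_word, vocab, n=10):
    """The n highest-count words in each topic."""
    return [
        [vocab[i] for i in sorted(range(len(vocab)), key=lambda i: -row[i])[:n]]
        for row in topic_word
    ]
```

To mirror the runs described above, one would call `lda_gibbs(docs, k, iterations=400)` for `k` in 10, 20, 30, and 40 and print `top_words(...)` for each. In pure Python this is slow on a document the length of the report, which is one practical reason to use MALLET itself.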

B.

  1. The same steps 1-4 were followed for another 13 sample documents from the WhatEvery1Says corpus.
  2. Then step 5 (topic modeling) was run for the whole 14-sample set of documents, including “The Heart of the Matter” report as the 14th document (treated as a whole document in this case). (Results B for the 10-topic and 20-topic models are given below.)
  3. Results B were also loaded into the Gephi software tool to create a network visualization that serves as a metaphor for the way different documents in the sample link to the various “topics” of discussion. (See “‘The Heart of the Matter’ Visualized.”)
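One way to prepare Results B for Gephi is as an edge list linking each document to the topics it draws on. The sketch below assumes per-document topic counts of the kind a Gibbs sampler produces, turns them into smoothed proportions, and keeps only links at or above a threshold. The `alpha` smoothing value, the default threshold of 0.1, and the `topic1`, `topic2`, etc. node labels are illustrative assumptions. Gephi's spreadsheet (CSV) import recognizes the `Source`, `Target`, and `Weight` column headers for an edges table.

```python
import csv
import io


def topic_edges_csv(doc_names, doc_topic, alpha=0.1, threshold=0.1):
    """Return a CSV edge list (Source, Target, Weight) linking documents to topics.

    doc_topic: per-document topic counts, e.g. from a Gibbs sampler.
    Counts become smoothed proportions; links below `threshold` are dropped.
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")  # csv module quotes names containing commas
    writer.writerow(["Source", "Target", "Weight"])
    for name, counts in zip(doc_names, doc_topic):
        total = sum(counts) + len(counts) * alpha
        if total == 0:
            continue  # empty document with no smoothing: nothing to link
        for k, count in enumerate(counts):
            weight = (count + alpha) / total
            if weight >= threshold:
                writer.writerow([name, f"topic{k + 1}", f"{weight:.3f}"])
    return out.getvalue()
```

Saving the returned string to a file (for instance a hypothetical `edges.csv`) and importing it into Gephi produces a bipartite network of document nodes and topic nodes. The threshold shapes what the picture suggests: a low value draws many weak links and makes nearly every document look connected to every topic, while a high value shows only each document's dominant topics.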

C.

  1. In addition, the text of “The Heart of the Matter” was processed with other visual-analysis tools, including Voyant (Voyeur) Tools and Many Eyes, to provide alternative visual-analytic ways of studying this document. (See “‘The Heart of the Matter’ Visualized.”)
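Word-frequency counts underlie many of the views that tools such as Voyant and Many Eyes offer, including word clouds and frequency lists. As a rough, hypothetical stand-in (not how either tool is implemented), the sketch below counts terms after removing a stop list and reports each term's raw count and its share of all remaining tokens.

```python
import re
from collections import Counter


def term_frequencies(text, stop_words=frozenset(), n=10):
    """Return (term, count, share of all kept tokens) for the n most common terms."""
    tokens = [t for t in re.findall(r"[a-z]+", text.lower()) if t not in stop_words]
    counts = Counter(tokens)
    total = len(tokens)
    return [(term, count, count / total) for term, count in counts.most_common(n)]
```

For example, `term_frequencies("Humanities matter. The humanities!", stop_words={"the"}, n=2)` returns `[("humanities", 2, 2/3), ("matter", 1, 1/3)]`. Unlike a topic model, these counts ignore which words tend to occur together, which is why the two kinds of analysis complement each other.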


Results A: Topic modeling of “The Heart of the Matter” (HoM) report by itself (“chunked” into separate sections of its front matter, executive summary, introduction, and individual chapters)

Results A (HoM) 10-topic topic model
List of Topics

  1. research social national government science funding support federal scholars private

  2. public programs people state new resources schools corps important local

  3. humanities socialsciences community cultural scientific understand support libraries organizations program

  4. study new international critical language life global cultures other national

  5. public university work areas develop economy case training arts value

  6. nation disciplines sciences report future foundation century political sense past

  7. teacher teaching history school literacy understanding live participate stem high

  8. education world need knowledge students broad first business different time

  9. learning well provide american america opportunities general creation access states

  10. skills college americans higher online ability liberalarts today based offer

 

Results A (HoM) 20-topic topic model
List of Topics

  1. research support funding federal private scholars scientific humanistic challenges partnerships

  2. education college skills higher ability liberalarts open right domains early

  3. international national programs global study language cultures well government address

  4. digital important online lifelong general opportunities councils offer financial culture

  5. students american teaching understand basic access capable clear empathy approach

  6. americans broad today model innovative ways audiences curriculum leadership term

  7. education need understanding first century leaders own experience different resources

  8. university case fields human value areas just business connect years

  9. disciplines sense matter continue learned differences necessary discovery lead information

  10. nation citizens report future government political reading importance possible democracy

  11. socialsciences social humanities sciences society foundation broader arts strong interest

  12. teacher literacy school high educator share professional development full common

  13. life critical others competitiveness security world see makes united lives

  14. new learning develop economy local america expertise based role infrastructure

  15. knowledge world work time creativity promote wider educational great essential

  16. humanities cultural institutions organizations increase neh past recommends encourage communities

  17. public provide science people support needs long endowment instruction scientists

  18. state skills schools well programs resources corps libraries museums teach

  19. national level commitment generation subjects effort program agencies civics critical

  20. community history creation live states current reach given another young


Results B: Topic modeling of a 14-document subset of the WhatEvery1Says corpus (including “The Heart of the Matter” [HoM])

Results B 10-topic topic model
List of Topics

  1. humanities world arts society critical higher value experience matter real

  2. science human sciences nature wieseltier pinker moral explain scientism theories

  3. education national new socialsciences support skills nation college well programs

  4. research public social need knowledge students disciplines cultural people state

  5. world other life understanding scientific time know just intellectual political

  6. humanities university said something greater big technology studies called thinking

  7. year years great job works high person learn issues president

  8. art departments say empirical point example let majors department end

  9. digital history scholarship new liberal ways historical sciences report common

  10. humanities people way university students jobs english go think few

 

Results B 20-topic topic model
List of Topics

  1. understanding things point analysis religion same today old psychology forces

  2. humanities education students critical higher human thinking value stem academic

  3. research skills american government knowledge learning private people humanistic help

  4. college jobs departments students better parents majors department end less

  5. world other life time knowledge others own scientists general current

  6. digital scholarship arts liberal data grand harvest scholarly project projects

  7. said university president professors national washington endowment practical view foys

  8. greater university studies too never back least technical content shakespeare

  9. need social sciences cultural cultures intellectual disciplines important future first

  10. science human scientific moral nature scientism ideas mind principles music

  11. big politics person explain few go century either shaggy war

  12. wieseltier sciences pinker empirical meaning art say nothing theory religious

  13. people literature university way english think job says harvard want

  14. new know just field technology questions something scholars example means

  15. political past different year long great years reading liberalarts works

  16. humanities history ways work common historical form open information called

  17. education public socialsciences national support nation programs college new teacher

  18. well research matter people teaching experience culture sense together see

  19. study provide global understand language schools americans understanding challenges ability

  20. humanities arts society http real arguments crisis www success rather

 

 


Appendix: “Topic Modeling”

For explanations of the statistical-analysis method of topic modeling and its use by digital humanists on humanistic materials, see the following:

