Description


This notebook, Through the Looking Glass Sentiment Analysis with tidytext, contains the R code and text analysis for Through the Looking Glass, available from Project Gutenberg (http://www.gutenberg.org/ebooks/12).

Notes taken from Text Mining with R: A Tidy Approach by Julia Silge and David Robinson (https://www.tidytextmining.com/).

The main R packages that will be explored are tidytext, dplyr, tidyr and other tidy tools.

The gutenbergr package provides the text data.

This R Notebook was created using the free software R and RStudio. Free software is vital in protecting the freedoms of users and creators.

Copyright (C) 2018 Crista Moreno

This program is free software: you can redistribute it and/or
modify it under the terms of the GNU General Public License as
published by the Free Software Foundation, either version 3 of
the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program.  If not, see <https://www.gnu.org/licenses/>.

Packages


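library(dplyr)       # data manipulation
# install.packages("tidytext")
library(tidytext)    # tidy text mining: tokenization, stop words, lexicons
library(magrittr)    # pipes, including the assignment pipe %<>%
library(ggplot2)     # plotting
library(tidyr)       # reshaping data
# install.packages("janeaustenr")
library(janeaustenr) # Jane Austen novels
library(stringr)     # string manipulation
# install.packages("gutenbergr")
library(gutenbergr)  # Project Gutenberg downloads
library(scales)      # axis scales and formatting
library(wordcloud)   # word clouds
# install.packages("reshape2")
library(reshape2)    # acast() for reshaping into a matrix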

Lexicons


The three general-purpose lexicons described in Text Mining with R are AFINN, bing, and nrc.

Definition of the word lexicon from www.dictionary.com:

A wordbook or dictionary, especially of Greek, Latin, or Hebrew. The vocabulary of a particular language, field, social class, person, etc. An inventory or record.
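
As a minimal sketch of what a sentiment lexicon looks like in practice, the bing lexicon (the one used for the sentiment word cloud later in this notebook) can be inspected directly. This assumes a recent tidytext version in which the bing lexicon ships with the package; other lexicons may need a separate download.

```r
library(dplyr)
library(tidytext)

# load the bing lexicon: one row per word, labelled by sentiment
bing <- get_sentiments("bing")

# look at the first few entries
head(bing)

# count how many words carry each sentiment label
bing %>% count(sentiment)
```

Each row pairs a word with a sentiment label, which is what makes the later inner_join against the tokenized text possible.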

glass


Download Through the Looking Glass from Project Gutenberg

# download Through the Looking Glass (Project Gutenberg ebook ID 12)
glass <- gutenberg_download(12)
glass

Tokenize the text from Through the Looking Glass

# tokenize the text
tidy_glass <- glass %>% unnest_tokens(word, text)
# display the tokenization
tidy_glass
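
To see what the tokenization step does without downloading the book, here is a small sketch on a made-up two-line text. The data frame `toy` and its contents are hypothetical and only stand in for the `glass` data.

```r
library(dplyr)
library(tidytext)

# a made-up two-line text standing in for the downloaded book
toy <- tibble(line = 1:2,
              text = c("Alice looked at the Red Queen.",
                       "The Queen looked back!"))

# one row per word; punctuation is dropped and words are lowercased,
# while the other columns (here, line) are kept
toy_tokens <- toy %>% unnest_tokens(word, text)
toy_tokens
```

The first argument to unnest_tokens names the new output column and the second names the input column holding the text.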

Remove stopwords from Through the Looking Glass

# remove the stopwords from glass
tidy_glass %<>% anti_join(stop_words, by = "word")
# display the results
tidy_glass
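
The effect of the anti_join can be checked on a tiny example. The tibble `toy_words` below is hypothetical; `stop_words` is the data set that ships with tidytext.

```r
library(dplyr)
library(tidytext)

# a hypothetical handful of tokens, two of which are common stop words
toy_words <- tibble(word = c("the", "queen", "and", "alice"))

# keep only the rows whose word does NOT appear in stop_words
toy_kept <- toy_words %>% anti_join(stop_words, by = "word")
toy_kept
```

Passing by = "word" names the join column explicitly, which also avoids the informational "Joining, by" message.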

Remove underscores from words

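# keep only the first run of lowercase letters and apostrophes in each token;
# this strips the underscores, and tokens with no letters become NA
tidy_glass %<>% mutate(word = str_extract(word, "[a-z']+"))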
# display the results
tidy_glass
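
A quick sketch of how the regular expression behaves on a few hypothetical tokens, including one wrapped in underscores and one with no letters at all:

```r
library(stringr)

# hypothetical tokens: emphasised word, possessive, plain word, number
tokens <- c("_very_", "alice's", "looking", "1872")

# extract the first run of lowercase letters and apostrophes
cleaned <- str_extract(tokens, "[a-z']+")
cleaned
# returns: "very", "alice's", "looking", NA
```

The NA produced for purely numeric tokens is why missing values need to be dropped before plotting word frequencies.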

Word Frequency Table

# display a table of word frequencies
tidy_glass %>% count(word, sort = TRUE)

Through the Looking Glass Word Frequency


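# bar chart of the words that occur more than 30 times
tidy_glass %>%
  count(word, sort = TRUE) %>%
  na.omit() %>%                        # drop tokens that had no letters
  filter(n > 30) %>%
  mutate(word = reorder(word, n)) %>%  # order the bars by frequency
  ggplot(aes(word, n)) +
  geom_col(color = "white", fill = "red") +
  xlab(NULL) +
  coord_flip()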

Through the Looking Glass Word Cloud


tidy_glass %>%
  filter(!is.na(word)) %>%   # drop tokens that had no letters
  count(word) %>%
  with(wordcloud(word, n, max.words = 100))

Word Cloud Sentiments


# compare the most frequent negative and positive words in one cloud
tidy_glass %>%
  inner_join(get_sentiments("bing"), by = "word") %>%
  count(word, sentiment, sort = TRUE) %>%
  acast(word ~ sentiment, value.var = "n", fill = 0) %>%
  comparison.cloud(colors = c("black", "red"), max.words = 100)
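
The acast step turns the long table of counts into the word-by-sentiment matrix that comparison.cloud expects. A minimal sketch with hypothetical counts (the words and numbers below are made up for illustration, not results from the book):

```r
library(reshape2)

# hypothetical counts standing in for the output of count(word, sentiment)
toy_counts <- data.frame(
  word = c("poor", "like", "afraid", "fair"),
  sentiment = c("negative", "positive", "negative", "positive"),
  n = c(25, 30, 12, 9)
)

# rows become words, columns become sentiments;
# word/sentiment combinations that never occur are filled with 0
toy_matrix <- acast(toy_counts, word ~ sentiment, value.var = "n", fill = 0)
toy_matrix
```

With this layout, the colors passed to comparison.cloud apply to the matrix columns in order, so here the first color would go to the negative column and the second to the positive column.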