Through the Looking Glass: Sentiment Analysis with tidytext
This notebook includes the R code and text analysis for Through the Looking Glass (http://www.gutenberg.org/ebooks/12).
Notes taken from Text Mining with R: A Tidy Approach by Julia Silge and David Robinson (https://www.tidytextmining.com/).
The main R packages that will be explored are tidytext, dplyr, tidyr, and other tidy tools.
The gutenbergr package provides the text data.
This R Notebook was created using the free software R and RStudio. Free software is vital in protecting the freedoms of users and creators.
Copyright (C) 2018 Crista Moreno
This program is free software: you can redistribute it and/or
modify it under the terms of the GNU General Public License as
published by the Free Software Foundation, either version 3 of
the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program. If not, see <https://www.gnu.org/licenses/>.
library(dplyr) # for data manipulation
#install.packages("tidytext")
library(tidytext)
library(magrittr) # for piping commands
library(ggplot2)
library(tidyr)
# install.packages("janeaustenr")
library(janeaustenr)
library(stringr)
# install.packages("gutenbergr")
library(gutenbergr)
library(scales)
library(wordcloud)
#install.packages("reshape2")
library(reshape2)
The three general-purpose lexicons are:
AFINN from Finn Årup Nielsen (http://bit.ly/2s50F5w)
Bing from Bing Liu and collaborators (http://bit.ly/2s4B254)
NRC from Saif Mohammad and Peter Turney (http://bit.ly/2s4B8ts)
Definition of the word "lexicon" from www.dictionary.com:
A wordbook or dictionary, especially of Greek, Latin, or Hebrew; the vocabulary of a particular language, field, social class, person, etc.; an inventory or record.
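To get a feel for what a sentiment lexicon looks like in practice, the sketch below inspects the Bing lexicon. It assumes a recent tidytext release, where (to my understanding) the Bing lexicon ships with the package while AFINN and NRC are fetched through the textdata package on first use. The variable name `bing` is only illustrative.

```r
library(dplyr)
library(tidytext)

# Load the Bing lexicon: one row per word, labelled "positive" or "negative"
bing <- get_sentiments("bing")

# Peek at the first few entries
head(bing)

# Count how many words carry each label
bing %>% count(sentiment)
```

Each lexicon is just a table keyed on `word`, which is why it can later be joined against tokenized text.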
# download Through the Looking Glass (Project Gutenberg ID 12)
glass <- gutenberg_download(12)
glass
# tokenize the text
tidy_glass <- glass %>% unnest_tokens(word, text)
# display the tokenization
tidy_glass
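As a small illustration of what the tokenization step does, the sketch below runs `unnest_tokens()` on a single made-up sentence. The tibble `sample_text` and its contents are hypothetical and are not part of the book data.

```r
library(dplyr)
library(tidytext)

# A one-row data frame with a made-up sentence (hypothetical sample data)
sample_text <- tibble(line = 1, text = "The Red Queen shook her head.")

# Split the text column into one lowercase word per row, dropping punctuation
sample_tokens <- sample_text %>% unnest_tokens(word, text)
sample_tokens
```

The result has one row per word, with the other columns (here `line`) carried along, which is the "one token per row" shape the later steps rely on.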
# remove the stop words from glass, joining explicitly on the word column
tidy_glass %<>% anti_join(stop_words, by = "word")
# display the results
tidy_glass
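# strip underscores (used for emphasis, e.g. _very_) by keeping only the first
# run of lowercase letters and apostrophes; tokens with no letters become NA
tidy_glass %<>% mutate(word = str_extract(word, "[a-z']+"))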
# display the results
tidy_glass
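The effect of the `str_extract()` cleanup can be checked on a few hand-picked tokens. This is a minimal sketch; the vector `raw_tokens` is hypothetical and chosen only to show the different cases.

```r
library(stringr)

# Hypothetical tokens: emphasized word, possessive, a number, a plain word
raw_tokens <- c("_very_", "alice's", "1871", "looking")

# Keep the first run of lowercase letters and apostrophes in each token
cleaned <- str_extract(raw_tokens, "[a-z']+")
cleaned
```

Underscores are dropped, apostrophes are kept, and a token without any lowercase letters turns into `NA`, which is why the plotting code later calls `na.omit()`.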
# display a table of word frequencies
tidy_glass %>% count(word, sort=TRUE)
# plot the words that occur more than 30 times, ordered by frequency
tidy_glass %>%
  count(word, sort = TRUE) %>%
  na.omit() %>%
  filter(n > 30) %>%
  mutate(word = reorder(word, n)) %>%
  ggplot(aes(word, n)) +
  geom_col(color = "white", fill = "red") +
  xlab(NULL) +
  coord_flip()
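# draw a word cloud of up to 100 of the most frequent words, skipping NA tokens
tidy_glass %>%
  filter(!is.na(word)) %>%
  count(word) %>%
  with(wordcloud(word, n, max.words = 100))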
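# compare the most frequent negative (black) and positive (red) words
# according to the Bing lexicon
tidy_glass %>%
  inner_join(get_sentiments("bing"), by = "word") %>%
  count(word, sentiment, sort = TRUE) %>%
  acast(word ~ sentiment, value.var = "n", fill = 0) %>%
  comparison.cloud(colors = c("black", "red"), max.words = 100)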
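The `acast()` call is the step that turns the long table of counts into the matrix that `comparison.cloud()` expects. The sketch below shows that reshaping on a tiny table; `toy_counts` and its numbers are invented for illustration and do not come from the book.

```r
library(reshape2)

# Hypothetical counts, shaped like the output of count(word, sentiment)
toy_counts <- data.frame(
  word = c("poor", "good", "afraid", "pretty"),
  sentiment = c("negative", "positive", "negative", "positive"),
  n = c(20, 15, 12, 9),
  stringsAsFactors = FALSE
)

# Rows become words, columns become sentiments; absent combinations are filled with 0
toy_matrix <- acast(toy_counts, word ~ sentiment, value.var = "n", fill = 0)
toy_matrix
```

Because the columns come out in alphabetical order (`negative`, then `positive`), the first color passed to `comparison.cloud()` applies to negative words and the second to positive words.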