SENTIMENT ANALYSIS IN ANNUAL REPORTS FROM BRAZILIAN COMPANIES LISTED AT THE BM&FBOVESPA

MARCELO SANCHES PAGLIARUSSI; MARCELO OTONE AGUIAR; FERNANDO CAIO GALDI

Received: 14 March 2015

Accepted: 02 February 2016

Abstract: We investigated the association between the tone of annual reports issued by a sample of listed Brazilian firms and market variables (abnormal returns, trading volume and price volatility). The tone was measured using sentiment analysis techniques (Liu et al., 2005; Liu, 2010). As in Loughran and McDonald (2011), we developed and used lists of positive, negative, litigious, uncertainty-related and modal words in Portuguese to assess the tone of annual reports. Using a sample of 829 annual reports from 1997 to 2009, we observed a weak association between the tone of annual reports and stock market variables in Brazil. Additionally, we considered a sub-sample prior to GAAP changes in Brazil (1997-2007) and our results are maintained. Contrary to other studies using data from the United States, we found that the tone of annual reports released by Brazilian firms is not conducive to estimating returns.

Keywords: sentiment analysis, textual sentiment, positive words, negative words, annual reports.

Resumo: Foi investigada a associação entre o tom dos relatórios anuais divulgados por uma amostra de empresas brasileiras listadas na BM&FBovespa com variáveis de mercado (retornos anormais, volume de negociação e volatilidade). O tom dos relatórios foi medido por meio de técnicas de sentiment analysis (Liu et al., 2005; Liu, 2010). Seguindo o trabalho de Loughran e McDonald (2011), foram construídas listas de palavras positivas, negativas, litigiosas e de incerteza, além de modais, para construir uma medida de tom dos relatórios. A análise das 829 observações de relatórios, referentes ao período de 1997 a 2009, resultou na identificação de uma fraca associação entre as medidas de tom dos textos e as variáveis retorno anormal, volatilidade e volume anormal. Adicionalmente, realizou-se uma análise de robustez excluindo-se da amostra os anos de transição de regime contábil no Brasil (2008 e 2009), e nossos resultados se mantêm. Contrariamente aos estudos anteriores que usaram dados do mercado norte-americano, o tom dos relatórios divulgados pelas empresas brasileiras não contribui para melhorar as estimativas de retorno.

Palavras-chave: sentiment analysis, tom textual, palavras positivas, palavras negativas, relatórios anuais.

INTRODUCTION

Investors rely on several sources to gather the information they need to estimate a firm’s prospects. In an efficient market, a firm’s valuation should equal the present value of its future cash flows conditional on investors’ information set, which includes quantitative and qualitative information (Tetlock et al., 2008).

A substantial body of research has focused on the impact of information that is quantitative in nature, such as accounting numbers, macroeconomic indicators, industry productivity and so on. More recently, research efforts have focused on understanding the influence of textual (qualitative) information on investment decisions. Such narrative information broadly consists of accounts of activities and actions taken by firms, such as asset dispositions, new product development/launch, cost reduction efforts, etc. Regulators usually expect such narratives to “give the investor an opportunity to look at the company through the eyes of management by providing both a short and long-term analysis of the business of the company” (U. S. Securities and Exchange Commission, 2002).

However, our understanding of how investors interpret descriptive information, and of whether they efficiently incorporate that information into prices, is limited by the difficulty of objectively quantifying such information (Jegadeesh and Wu, 2013). In response to this challenge, recent advances in statistical natural language processing have been introduced into the literature that uses content analysis to quantify the tone and content of descriptive information from corporate reports (see Kearney and Liu (2014) for a review of the textual sentiment literature in finance and Beattie (2014) for a discussion of the literature on accounting narratives).

A considerable amount of research has been carried out on different aspects of corporate communications, such as assessment of their readability (Li, 2008), use of impression management tactics (Tessarolo et al., 2010) and the harmony between text and numbers (Balata and Breton, 2005). There is also growing interest in the hypothetical impact of the tone of financial texts on stock market variables. Using purpose-built word lists, Loughran and McDonald (2011) observed associations between tone and abnormal returns, trading volume, return volatility, fraud, material weakness, and unexpected earnings. Tetlock (2007) documented the association of the tone of a popular Wall Street Journal column with stock returns and trading volume. Antweiler and Frank (2004) also observed that stock messages posted on Yahoo! Finance and Raging Bull helped predict market volatility.

These previous studies have presented evidence obtained using reports written in English. There is a lack of research on the information content of linguistic tone in emerging markets, especially in Portuguese. Our study investigates whether quantifying language in the Brazilian capital market provides useful information about firms’ prospects and is thus incorporated into investors’ expectations, influencing returns, trading volume and volatility.

Given the important differences in phrase construction and comprehension between English and Portuguese, we analyze annual reports from Brazilian firms, which are written in Portuguese, a Romance language spoken by more than 250 million people in Portugal, Mozambique, Angola, Cape Verde, Guinea-Bissau, São Tomé and Príncipe and East Timor, besides Brazil. Brazil offers a relevant opportunity to investigate the association between the tone of annual reports and stock market variables because it is the world's largest Portuguese-speaking nation and has the largest capital market in the Portuguese-speaking world. The Brazilian Stock/Commodities Exchange (BM&FBovespa) currently ranks among the world’s largest exchanges, with a domestic market capitalization of USD 1.2 trillion as of August 2014 according to the World Federation of Exchanges.

To isolate the potential impact of GAAP changes on the quality and characteristics of annual report language, we implement our analysis for the period before IFRS adoption in Brazil (1997-2009). Additionally, we restrict our analysis to the sub-period 1997-2007, because the Brazilian Company Law and specific Brazilian SEC disclosure requirements were altered from 2008 on to permit the changes needed for Brazilian GAAP to converge to IFRS. In this sense, we designed a large-sample study to examine the cross-sectional association between the tone of annual reports issued by Brazilian firms from 1997 to 2009 and abnormal returns, abnormal volume and stock price volatility. We started by creating lists of negative, positive, litigious, uncertainty-related and modal words, using an inductive method based on a handcrafted process of word classification. Then, we used sentiment analysis, a computational technique applied to the study of opinions, feelings and emotions in texts (Liu et al., 2005; Liu, 2010), to measure the tone of annual reports. We selected the vector space model (Chisholm and Kolda, 1999; Jurafsky et al., 2002), inspired by the studies of Cecchini et al. (2010), Li (2010a) and Van den Bogaerd and Aerts (2011), to capture the tone of annual reports using our word lists. Our sample comprised companies listed on the BM&FBovespa that published annual reports in the 1997-2009 time span.

This paper contributes to the literature by offering an assessment of the informational content of the narratives presented in the annual reports of Brazilian firms in the period before the mandatory adoption of the International Financial Reporting Standards (IFRS). We contend that our findings are relevant as a benchmark to compare against future studies that can be developed using annual reports published as of 2010.

We also developed five lists of words from the Portuguese language that are available to other researchers who want to assess the tone of a financial narrative. We built the lists by individually examining all words occurring in at least 5% of the annual reports, in order to evaluate their most likely usage in financial documents. To the best of our knowledge, there are no other studies using a dedicated dictionary for finance in the Portuguese language.

Our results indicate a very weak association between the tone of annual reports and stock market variables in Brazil. When we measured filing period returns using the Fama-French three-factor model (1993), we found a weak association between our tone measures and abnormal returns. When using CAPM, however, we found no association between the tone measures and filing period returns. We also found weak associations between the tone measures and both filing period trading volume and stock price volatility.

THEORETICAL BACKGROUND

There is a rapidly expanding stream of research that uses textual analysis algorithms in financial contexts to examine the information content of linguistic tone for contemporaneous and future stock returns, return volatilities, and future earnings or cash flows and their uncertainties (Chen et al., 2013), the so-called sentiment analysis.

Sentiment analysis, also known as opinion mining, is a multiple-stage computational method that allows for varied forms of capturing opinions in a text (Jurafsky et al., 2002; Devitt and Ahmad, 2007; Tang et al., 2009; Liu, 2010; Van den Bogaerd and Aerts, 2011).

There are two main types of classification in sentiment analysis. Subjective classification, commonly used when addressing internet forums and blog posts, deals with opinions expressed by a multiplicity of agents, where the problem consists of differentiating subjective from objective opinions in a text (Liu et al., 2005; Seki et al., 2008; Tang et al., 2009; Santos et al., 2009). Sentiment classification, which is in turn subdivided into binary and multi-level classification, allows classification of texts, paragraphs, sentences or words into one of two extremes: positive or negative (Liu et al., 2005; Seki et al., 2008; Tang et al., 2009; Wilson et al., 2009; Ferguson et al., 2009; O'Leary, 2011).

The vector space model (Chisholm and Kolda, 1999; Jurafsky et al., 2002) uses the frequency of words, grouped according to their tone or classification, to specify whether a text bears a pessimistic or optimistic message (Liu et al., 2005; Cecchini et al., 2010; Van den Bogaerd and Aerts, 2011).

Sentiment analysis is not a trivial task, no matter what model is chosen, as many troublesome issues concerning a language’s grammatical aspects are bound to arise. In general, natural language processing techniques involve executing a number of stages to overcome such difficulties. Dale (2010) recommends five stages when processing raw text before the meaning of an oral or textual expression can be contextually determined.

The first stage, tokenization, aims at determining the boundaries of a word or the start and end positions of each word. In cases where more extensive treatment of a text has to be performed, in order to reach a better definition of words and sentences, this first stage can also be more comprehensive, taking the denomination of text preprocessing (Palmer, 2010; Liu et al., 2005).

In the following stage, known as lexical analysis, the morphological variants of a word are related to a single dictionary entry, called a lemma, which carries semantic and syntactic information. The process that identifies a word’s root (the reduced form of a word in all its variations) is called stemming (Hippisley, 2010).

The next stage, syntactic analysis, examines a series of words, usually as parts of a sentence, in order to define their structural description according to grammatical rules (Jurafsky et al., 2002; Ljunglöf and Wirén, 2010). Since the ultimate goal is the attribution of meaning to the sentence, a hierarchical syntactic structure, adequate to semantic interpretation, is the desired output for this purpose. Ambiguity and noise are two problems solved at this stage.

The fourth stage, semantic analysis, refers to the examination of the meanings of words, fixed expressions, whole phrases and contextual expressions. Many human expressions are open to multiple interpretations (Jurafsky et al., 2002; Goddard and Schalley, 2010), due to the complexity of natural language. Words can have more than one meaning (lexical ambiguity), and quantifiers, negatives or modal operators can appear in different passages of a text.

It is during the last stage, pragmatic analysis, that the meaning of an oral or textual expression is contextually determined. Pragmatic analysis delves more into spoken or written verbal expressions, while semantic and syntactic analyses are more concerned with the sentence. The actual segregation of the pragmatic, semantic and syntactic stages, though usually regarded as useful for didactic purposes, is not easy in practice (Dale, 2010).

In sentiment analysis, capturing opinions or feelings can be achieved using computerized methods or performed manually (Liu, 2010; Cecchini et al., 2010; Van den Bogaerd and Aerts, 2011). The major benefit in using a computerized approach is the capacity to analyze thousands of documents much faster than through manual analysis. Additionally, a challenging issue in finance is the development of models that provide a better understanding of stock market fluctuations. The connection between sentiment analysis and stock market variables has been recently explored by several authors.

Antweiler and Frank (2004) used about 1.5 million messages posted at the Yahoo! Finance website to identify whether internet messages about financial markets could help forecast market volatility. The opinions identified in the messages were correlated to returns, trading volume and volatility of 45 companies contained in the Dow Jones index. The relationship with returns turned out to be statistically significant, though economically small. A significant relationship between increased trading volume and differences of opinion expressed in messages was also observed.

Tetlock (2007) investigated the interaction between the media and stock market, using the Wall Street Journal’s “Abreast of the Market” column from 1984 to 1999. He developed a metric for media pessimism and compared it with market returns. The results suggested that high pessimism in the news was related to a decrease in stock price, and that both high and low pessimistic outlooks in published news stories were related to high stock trading volume. The author thus proposed that his metric can be used as a proxy for investor sentiment.

Other studies identified the information content and positive association with returns and volume of linguistic tone in the context of mandatory reports and filings, such as earnings releases (Davis et al., 2012), restatement announcements (Durnev and Mangen, 2011) and MD&A reports (Li, 2010b; Loughran and McDonald, 2011).

More recent studies have analyzed the linguistic tone of conference calls and found a positive association between tone metrics and both abnormal returns and trading volume (Price et al., 2012; Chen et al., 2013).

Loughran and McDonald (2011) assessed the effectiveness of a dictionary developed by researchers at Harvard University to capture the tone of a financial text. The effectiveness of measuring a text’s undertone rests mainly in the correct classification of a word, regarding its connotation. Usage of a dictionary with connotations not customarily found in a given research domain may lead to erroneous classification of the text. By showing that word lists used in psychology research (Harvard Dictionary) are not suitable to capture the tone of financial texts, Loughran and McDonald (2011) created a list of their own (which they called Fin-Neg) by applying an inductive method. Using their Fin-Neg list, Loughran and McDonald (2011) further analyzed a sample of 50,115 10-K reports, filed by 8,341 companies in the period between 1994 and 2008, aiming to investigate the correlation of the reports’ tone with stock market returns.

Loughran and McDonald (2011) found that nearly 73% of the words with a negative connotation in the Harvard Dictionary did not preserve that same connotation in the Fin-Neg list. Their findings confirmed a better relation with the Fin-Neg list. The authors also proposed an expansion of word classification categories, suggesting that five other word lists could be used to gauge tone: positive, uncertainty, litigious, strong modal words and weak modal words.

The rapidly expanding research on the market impact of language tone motivated us to develop word lists in Portuguese to analyze whether the tone of managers’ reports in Brazil helps explain filing period returns, filing period trading volume and post-event price volatility. We analyzed annual reports from Brazilian firms, which operate in an important emerging market; these reports are written in Portuguese, a language spoken by more than 250 million people. Because Portuguese and English have different linguistic roots, we developed a specific dictionary (detailed in section Building the dictionary) to implement our analysis.

Similarly to Loughran and McDonald (2011), we used the vector space model to measure the tone of annual reports. In this method, documents and queries are represented as vectors (Manning and Schütze, 1999; Jurafsky et al., 2002). The words in the documents constitute a set of terms, each one identified by the word and the weight associated with its occurrence in the texts. We also consider market-based variables, such as stock returns, trading volume and volatility, to investigate the relation between the tone of annual reports and investors’ reaction to texts within financial statements.

In the next section we describe the procedures used to achieve this aim.

METHODOLOGY

DATA GATHERING

We collected annual reports from the firms listed on the BM&FBovespa website for the period prior to IFRS adoption. All listed firms that published their reports from 1995 to 2009 were selected for analysis. We started with a sample of 4,627 reports. However, only two reports were found for the years 1995 and 1996, so we discarded them, leaving 4,625 reports in the sample.

A total of 940 reports, from companies that had not been traded in the spot market in the period, were also discarded, leaving a sample of 3,685 reports. Another 347 reports were discarded because they did not comply with the minimum 300 words per report requirement, a criterion adopted aiming to eliminate reports with low relevance (Clatworthy and Jones, 2001), leaving a sample of 3,338 reports.

To calculate abnormal returns and abnormal trading volumes, we needed data for returns and trading volumes concerning two periods: before report disclosure (a 100-day window [-101, -1] in the present study) and during report disclosure (a 5-day window [0, 4]). We consider the same window length that Tetlock (2007) and Loughran and McDonald (2011) used to perform their tests. Krivin et al. (2003) state that the selected window period is unlikely to greatly affect the results in any predictable way or even with too great a magnitude.

After our data collection, 1,189 reports from companies that had not had their stocks traded in the mentioned periods were discarded, leaving a sample of 2,209 reports. Table 1 presents detailed information about the sample.

Table 1.
Management reports in the sample (in units).

Lack of financial data in the Economatica database led us to eliminate four additional reports. We also dropped 282 reports due to unavailable data in Economatica on company size and book-to-market ratio, and 21 reports due to lack of data necessary to calculate volatility. Finally, we excluded 1,073 reports due to lack of data needed to estimate expected returns and trading volumes. We thus worked with a final sample of 829 reports, as shown in Table 1, which was the corpus in which we applied sentiment analysis techniques.

BUILDING THE WORD LISTS

ANNUAL REPORTS PREPROCESSING

We developed a conversion process to rid the reports of all non-textual elements, such as tables, images, links to websites and formatting marks. Natural language machine reading of texts depends on every text character being coded in a machine readable format. The preprocessing applied to the sample resulted in a total of 8,772,625 words (81,355 distinct words).

BUILDING THE DICTIONARY

In the next step, we extracted the vocabulary with all the words used in the annual reports (Jurafsky et al., 2002; Cecchini et al., 2010; Van den Bogaerd and Aerts, 2011). We defined a number of criteria before starting to build the vocabulary. First, we wanted to take into account only one occurrence of each word. Second, we also needed to remove all words that match our stop-words list (Cecchini et al., 2010). These include prepositions, conjunctions, pronouns and certain verbal forms that usually do not have explanatory relevance in texts (Savoy and Gaussier, 2010). Third, we did not want to include numbers expressed in numeric format. Since symbols and special characters are not words, they were also discarded (Liu et al., 2005). Finally, we eliminated from the vocabulary all words formed by one or two characters only. The final product of this process was a vocabulary containing 53,577 distinct words, as illustrated in Table 2.
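The filtering criteria described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `build_vocabulary`, the tiny `STOP_WORDS` set and the two sample sentences are hypothetical, and the actual stop-word list used in the study is not reproduced here.

```python
import re

# Hypothetical, tiny stop-word list; the study's actual list is not reproduced here.
STOP_WORDS = {"para", "com", "que", "dos", "das", "uma", "por"}


def build_vocabulary(reports, stop_words=STOP_WORDS, min_length=3):
    """Return the set of distinct words that survive the filtering criteria."""
    vocabulary = set()  # a set keeps only one occurrence of each word
    for text in reports:
        # Keep runs of letters only; digits, symbols and punctuation act as separators.
        for token in re.findall(r"[^\W\d_]+", text.lower()):
            if len(token) < min_length:
                continue  # drop words formed by one or two characters
            if token in stop_words:
                continue  # drop stop words
            vocabulary.add(token)
    return vocabulary


# Illustrative sentences, not taken from the sample.
reports = [
    "A crise de 2008 reduziu o lucro da empresa.",
    "O lucro cresceu 12% com a redução de custos.",
]
vocab = build_vocabulary(reports)
```

Using a set enforces the "one occurrence of each word" criterion, while the regular expression discards numbers and special characters before the length and stop-word checks are applied.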

Table 2.
Word dictionary built using annual reports as a source (in units).

Since a sizable portion of the words present in the vocabulary turned out to be irrelevant in annual reports, due to their low frequency or neutral quality, it was necessary to sort out the relevant ones in order to determine which words should be further classified in the categories (Cecchini et al., 2010; Van den Bogaerd and Aerts, 2011). We defined a threshold requiring each word in the vocabulary to occur at least once in at least 5% of the annual reports (Loughran and McDonald, 2011). We eliminated 3,283 words that did not meet this criterion.
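As a rough illustration of the 5% document-frequency threshold, the following Python sketch keeps only the words found in a minimum share of reports. The function name, the `min_share` parameter and the toy corpus are assumptions introduced for this example.

```python
def filter_by_document_frequency(tokenized_reports, vocabulary, min_share=0.05):
    """Keep words that occur at least once in at least `min_share` of the reports."""
    n_docs = len(tokenized_reports)
    # One set per report, so repeated occurrences within a report count once.
    report_sets = [set(tokens) for tokens in tokenized_reports]
    kept = set()
    for word in vocabulary:
        doc_freq = sum(1 for words in report_sets if word in words)
        if doc_freq / n_docs >= min_share:
            kept.add(word)
    return kept


# Toy corpus of 40 tokenized reports (illustrative data only).
toy_reports = [["lucro", "crise"]] * 4 + [["lucro", "raro"]] + [["lucro"]] * 35
kept_words = filter_by_document_frequency(toy_reports, {"lucro", "crise", "raro"})
```

In this toy corpus "crise" appears in 10% of the reports and is kept, whereas "raro" appears in 2.5% and is dropped.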

With the final list containing 22,879 distinct words, we proceeded with their classification as positive, negative, litigious, uncertainty-related and modal. Some words can be classified in two or more categories (Loughran and McDonald, 2011). Thus, the uncertainty-related word list might contain words that also occur in the list of negative words. Another point mentioned by the authors is that when including a word in the list of negative words, for example, consideration should also be given to the inclusion of its variants.

We considered these issues in examining the words contained in the dictionary before closing the lists. The list of negative words contained 1,080 words, such as “crise”, “endividar”, “impacto”, “risco”, “limitado”, “perder”, “reduzir” and “prejuízo” (in English, “crisis”, “debt”, “impact”, “risk”, “limited”, “lose”, “reduce” and “loss”).

In addition to the negative word list, we also classified words into four other categories: positive, litigious, uncertainty and modal. The list of positive words included 701 words. Positive words are usually expected to have little impact when evaluating a text’s tone (Loughran and McDonald, 2011). Many apparently positive words have their classification jeopardized by ambiguity, since they frequently occur in a context of negation (“did not improve”), whereas it is more difficult to convey positive news by negating negative words (“did not worsen”).

The list of uncertainty-related words included 170 words, such as “assumir”, “variações”, “especulação”, “eventualidade”, “imaginava”, “instabilidade” and “volatilidade” (in English, “to assume”, “variations”, “speculation”, “eventuality”, “imagined”, “instability” and “volatility”). Words sought in this case are those usually employed in scenarios of uncertainty and risk. As in Loughran and McDonald’s study (2011), some words from the uncertainty-related words list, such as “volatilidade”, “instabilidade” and “risco” (in English, “volatility”, “instability” and “risk”), are also present in the list of negative words.

The litigious word list contained 492 words, such as “anulação”, “contestação”, “investigação”, “legalidade”, “legitimar”, “processual”, “recorrer” and “suborno” (in English, “annulment”, “defense”, “investigation”, “legality”, “to legitimize”, “procedural”, “appeal” and “bribery”). Finally, the modal word list was built from words that express degrees of certainty or obligation. Examples of modal words are “possível”, “provável”, “improvável”, “necessário”, “talvez”, “deve”, “claramente” and “compulsório” (in English, “possible”, “likely”, “unlikely”, “necessary”, “maybe”, “ought”, “clearly” and “compulsory”). The modal list contained 81 words. We prepared the lists from a corpus that includes in excess of 8 million words occurring in texts directed primarily to the stakeholders of the Brazilian capital market.

SENTIMENT ANALYSIS OF ANNUAL REPORTS

Chisholm and Kolda (1999) emphasized that application of the vector space method requires the use of a term weighting schema capable of determining the weight of the term in a corpus. We adopted the same scheme used by Loughran and McDonald (2011), since the results presented by these authors showed that the model mitigated noise caused by word misclassification. The model for term weighting is (Loughran and McDonald, 2011):

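$$w_{i,j} = \begin{cases} \dfrac{1 + \log(tf_{i,j})}{1 + \log(a_j)} \, \log\dfrac{N}{df_i} & \text{if } tf_{i,j} \geq 1 \\ 0 & \text{otherwise} \end{cases} \tag{1}$$

Where:

$tf_{i,j}$ = Total number of occurrences of word $i$ in document $j$;

$a_j$ = Average word count in document $j$;

$N$ = Total number of documents in the sample;

$df_i$ = Total number of documents with at least one occurrence of the word $i$.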
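The term weighting scheme can be sketched in Python as follows. This is a minimal illustration that assumes equation 1 takes the form published by Loughran and McDonald (2011), that is, a log-dampened term frequency normalized by the log of the average word count and multiplied by a log inverse document frequency; the function and argument names are hypothetical.

```python
import math


def term_weight(tf, avg_word_count, n_docs, doc_freq):
    """Weight of one word in one document (assumed Loughran-McDonald form).

    tf             -- occurrences of the word in the document
    avg_word_count -- average word count in the document
    n_docs         -- number of documents in the sample
    doc_freq       -- documents containing the word at least once
    """
    if tf < 1:
        return 0.0  # a word absent from the document carries no weight
    dampened_tf = (1 + math.log(tf)) / (1 + math.log(avg_word_count))
    inverse_doc_freq = math.log(n_docs / doc_freq)
    return dampened_tf * inverse_doc_freq


# A word found in few reports outweighs one found in many, for equal counts.
rare = term_weight(tf=5, avg_word_count=20, n_docs=829, doc_freq=50)
common = term_weight(tf=5, avg_word_count=20, n_docs=829, doc_freq=700)
```

Two properties follow directly from this form: the weight grows with the logarithm of the raw count rather than linearly, and a word that appears in every document receives a weight of zero.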

When counting positive words, Loughran and McDonald (2011) take into consideration the existence of a negation within the three preceding words. However, they argued that negative words are not expected to be negated. In the Brazilian context, several studies (Cunha, 2001; Namiuti, 2009; Fonseca, 2009; and Goldnadel and Lima, 2011) have analyzed the phenomenon of sentence-wise negation. However, these studies do not present any analysis that includes instances of a word commonly used in negative contexts being negated. What these authors point out is that negation is always accompanied by a verb in Portuguese. They also emphasize the presence of the order negation + verb or negation + clitic pronoun + verb.

Since the verb can be a word commonly used in positive contexts as well as in negative contexts, we considered the counting of negation in both situations. We also took into account the negative operators^[4] “no” and “not” and the negative quantifiers^[5] “no”, “nobody”, “in a”, “nothing”, “none” and “never”, as they are commonly used to deny propositional content expressed in a sentence (Souza and Pante, 2006; Fonseca, 2009; Goldnadel and Lima, 2011).
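A minimal Python sketch of negation-aware counting is given below. It assumes the three-word look-back window of Loughran and McDonald (2011) applies to both positive and negative words; the Portuguese terms in `NEGATIONS` are a hypothetical list chosen for illustration, and how negated occurrences enter the final tone measure is left to the caller because the text does not specify it.

```python
# Hypothetical list of Portuguese negation terms, for illustration only.
NEGATIONS = {"não", "nem", "nenhum", "nenhuma", "ninguém", "nada", "nunca"}


def count_with_negation(tokens, word_list, window=3, negations=NEGATIONS):
    """Count tone words, separating those preceded by a negation term.

    Returns a tuple (plain, negated), where `negated` counts tone words with
    a negation term among the `window` tokens immediately before them.
    """
    plain = 0
    negated = 0
    for i, token in enumerate(tokens):
        if token not in word_list:
            continue
        preceding = tokens[max(0, i - window):i]
        if any(p in negations for p in preceding):
            negated += 1
        else:
            plain += 1
    return plain, negated


# Illustrative sentence: "bom" is negated, "melhora" is not.
tokens = "o resultado não foi bom mas a receita teve melhora".split()
plain_count, negated_count = count_with_negation(tokens, {"bom", "melhora"})
```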

DATA ANALYSIS

The data source comprised reports with information on a company’s administration. There is a consensus that reports bearing pessimistic news are expected to be associated with abnormal negative returns, inordinately large trading volumes and high volatility in stock prices (Tetlock, 2007; Antweiler and Frank, 2004; Engle and Ng, 1993).

It was thus necessary to calculate the abnormal return for each company around the date its annual report was released. Abnormal returns are the difference between an asset’s actual return and its expected return in the market (Mackinlay, 1997), as shown in equation 2.

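$$AR_{i,t} = R_{i,t} - E(R_{i,t}) \tag{2}$$

Where:

$AR_{i,t}$ = Abnormal return of asset $i$ in period $t$;

$R_{i,t}$ = Observed return of asset $i$ in period $t$;

$E(R_{i,t})$ = Expected return of asset $i$ in period $t$.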

It was also necessary to define the window for calculation of abnormal returns. We used the same 5-day window [0, 4] adopted by Loughran and McDonald (2011), with t₀ being the release date of a company’s report and t₁, t₂, t₃ and t₄ the days following this release. A period of 100 days preceding the window [-101, -1] was selected for the calculation of expected returns (Tetlock, 2007; Loughran and McDonald, 2011). We used the SELIC interest rate (the benchmark rate) as the risk-free rate, obtained from the Brazilian Central Bank website (Brasil, 2012). For market returns, we used the closing prices of the BM&FBovespa stock index (Ibovespa), collected in the Economatica database.

We also consider two standard models to calculate expected returns in the finance literature: the CAPM model (equation 3) and the Fama-French (1993) three-factor model (equation 4). The CAPM and Fama-French 3 factor model have been widely used in finance literature to calculate expected returns (Sharpe, 1964; Fama and French, 1992, 1993, 1996; Chordia and Shivakumar, 2006).

We estimated the beta (β) and intercept (α) parameters by calculating linear regressions for each firm in the period of the annual report disclosure, with asset return as the dependent variable and market return as the independent variable.

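$$E(R_{i,t}) = R_f + \alpha_i + \beta_i \left[ E(R_{m,t}) - R_f \right] \tag{3}$$

Where:

$E(R_{i,t})$ = Expected return of asset $i$ in period $t$;

$R_f$ = Risk-free rate (SELIC rate);

$\alpha_i$ = Intercept for asset $i$;

$\beta_i$ = Measure of exposure of asset $i$ to market risk;

$E(R_{m,t})$ = Expected market return in period $t$.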
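The estimation step can be sketched in Python under stated assumptions: the intercept and beta come from a single-regressor OLS fit on excess returns over the estimation window, and the expected return is then the risk-free rate plus the intercept plus beta times the market excess return. The function names and the synthetic returns are hypothetical, and the use of excess rather than raw returns is an assumption of this sketch.

```python
def ols_alpha_beta(asset_excess, market_excess):
    """Intercept and slope of a one-regressor OLS fit of asset on market excess returns."""
    n = len(asset_excess)
    mean_x = sum(market_excess) / n
    mean_y = sum(asset_excess) / n
    s_xx = sum((x - mean_x) ** 2 for x in market_excess)
    s_xy = sum((x - mean_x) * (y - mean_y)
               for x, y in zip(market_excess, asset_excess))
    beta = s_xy / s_xx
    alpha = mean_y - beta * mean_x
    return alpha, beta


def expected_return(risk_free, alpha, beta, market_return):
    """Expected return under the intercept-augmented CAPM assumed in this sketch."""
    return risk_free + alpha + beta * (market_return - risk_free)


# Synthetic estimation-window data generated with alpha = 0.001 and beta = 1.5.
market_excess = [0.01, -0.02, 0.005, 0.03, -0.01]
asset_excess = [0.001 + 1.5 * x for x in market_excess]
alpha_hat, beta_hat = ols_alpha_beta(asset_excess, market_excess)
```

In practice the two input series would hold the 100 daily observations of the estimation window for each firm.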

$$E(R_{i,t}) = R_f + \alpha_i + \beta_i \left[ E(R_{m,t}) - R_f \right] + s_i \, SMB_t + h_i \, HML_t \tag{4}$$

Where:

$E(R_{i,t})$ = Expected return of asset $i$ in period $t$;

$R_f$ = Risk-free rate (SELIC rate);

$\alpha_i$ = Intercept for asset $i$;

$\beta_i$ = Measure of exposure of asset $i$ to market risk;

$E(R_{m,t})$ = Expected market return in period $t$;

$s_i$ = Size loading factor, a measure of exposure of asset $i$ to size risk;

$SMB$ = Small minus Big, the size premium, is the average return on the three small portfolios minus the average return on the three big portfolios, 1/3 (Small Value + Small Neutral + Small Growth) - 1/3 (Big Value + Big Neutral + Big Growth);

$h_i$ = Value loading factor, a measure of exposure of asset $i$ to value risk;

$HML$ = High minus Low, the value premium, is the average return on the two value portfolios minus the average return on the two growth portfolios, 1/2 (Small Value + Big Value) - 1/2 (Small Growth + Big Growth).

Once the expected return for each asset was calculated, we computed the abnormal returns for each day in the window, and finally estimated the cumulative abnormal returns (CAR) from the capitalization of the daily abnormal returns, according to equation 5:

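$$CAR_{i,t} = \prod_{\tau=0}^{t} \left( 1 + AR_{i,\tau} \right) - 1 \tag{5}$$

Where:

$CAR_{i,t}$ = Cumulative abnormal return of asset $i$ in period $t$;

$AR_{i,\tau}$ = Abnormal return of asset $i$ in period $\tau$;

$t$ = Days after annual report disclosure date.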
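The accumulation step can be illustrated with a short Python sketch. It reads "capitalization of the daily abnormal returns" as compounding, which is an assumption; a simple sum of daily abnormal returns would be the alternative reading. The function name and the sample figures are hypothetical.

```python
def cumulative_abnormal_return(observed, expected):
    """Compound daily abnormal returns (observed minus expected) over the window."""
    growth = 1.0
    for actual, forecast in zip(observed, expected):
        abnormal = actual - forecast  # daily abnormal return
        growth *= 1.0 + abnormal
    return growth - 1.0


# Illustrative two-day example: abnormal returns of +1% and -1%.
car = cumulative_abnormal_return([0.02, -0.01], [0.01, 0.0])
```

For the study's window, the two input sequences would each hold the five daily values for days 0 to 4.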

Loughran and McDonald (2011) calculated abnormal trading volumes using the average and standard deviation of trading volumes for the 60 days prior to the event. In the present work, we considered a 100-day span [-101, -1]. Subsequently, we standardized the trading volumes for each day in the event window. As a final step, we added the standardized volumes and calculated the average, according to equation 6.

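$$AV_{i,[t_0,t_4]} = \frac{1}{5} \sum_{t=t_0}^{t_4} \frac{V_{i,t} - \bar{V}_{i,[t_{-101},t_{-1}]}}{\sigma_{i,[t_{-101},t_{-1}]}} \tag{6}$$

Where:

$AV_{i,[t_0,t_4]}$ = Abnormal trading volume of asset $i$ in the period between $t_0$ and $t_4$;

$V_{i,t}$ = Trading volume of asset $i$ in period $t$;

$\bar{V}_{i,[t_{-101},t_{-1}]}$ = Average trading volume of asset $i$ in the period between $t_{-101}$ and $t_{-1}$;

$\sigma_{i,[t_{-101},t_{-1}]}$ = Trading volume standard deviation of asset $i$ between $t_{-101}$ and $t_{-1}$.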
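The standardization and averaging of trading volumes can be sketched in Python as follows. The function name and the sample volumes are hypothetical, and the use of the sample standard deviation (`statistics.stdev`) rather than the population version is an assumption, since the text does not say which was used.

```python
import statistics


def abnormal_volume(event_volumes, estimation_volumes):
    """Average standardized trading volume over the event window."""
    mean_volume = statistics.mean(estimation_volumes)
    # Sample standard deviation over the pre-event window (assumption).
    sd_volume = statistics.stdev(estimation_volumes)
    standardized = [(v - mean_volume) / sd_volume for v in event_volumes]
    return statistics.mean(standardized)


# Illustrative data: four pre-event days and a five-day event window.
estimation = [100, 120, 80, 100]
no_surprise = abnormal_volume([100] * 5, estimation)
```

A value near zero indicates event-window trading in line with the pre-event average, while a value of one corresponds to volumes one standard deviation above it.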

Volatility was calculated using the 100-day period [+1, +101] subsequent to the event. The volatility between the closing price of a day and the closing price of the previous day was observed for each day in the time window, and then the standard deviation of these observed volatilities was calculated, according to equation 7.

(7)

Where:

Volatility for an n-day period;

Closing price of asset in period .
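One way to read the volatility measure is sketched below in Python. The sketch assumes that the day-over-day change is the natural log of the ratio of consecutive closing prices and that the sample standard deviation is taken over the window; both choices, as well as the function name, are assumptions rather than details confirmed by the text.

```python
import math
import statistics


def price_volatility(closing_prices):
    """Standard deviation of daily log price changes over the given window."""
    log_changes = [
        math.log(today / yesterday)
        for yesterday, today in zip(closing_prices, closing_prices[1:])
    ]
    return statistics.stdev(log_changes)


# Illustrative closing prices, not taken from the sample.
steady = price_volatility([100.0, 110.0, 121.0, 133.1])   # constant 10% growth
choppy = price_volatility([100.0, 110.0, 99.0, 108.9])    # alternating moves
```

A series that grows at a constant rate has almost no dispersion in its daily changes, so its volatility under this definition is close to zero, whereas alternating gains and losses produce a clearly positive value.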

The independent variables were formulated using the results of the report tone analysis that generated the data of negative (Neg), positive (Pos), litigious (Lit) and uncertainty-related (Unc) words and modal verbs (Mod) from the term weighting model, according to equation 1. The variables firm size and book-to-market were taken from the Economatica database. Control variables related to Fama and French’s (1993) three factors were obtained from a proprietary database belonging to Fucape Business School.

ECONOMETRIC MODELS

We used three OLS econometric models to analyze the relationship between the tone of annual reports and stock market performance. Abnormal return is the dependent variable in the first model. The independent variables in this model are negative (Neg), positive (Pos), litigious (Lit), uncertainty-related (Unc) and modal (Mod) weighted word counts in the corpus. We estimated abnormal returns using the Fama-French three-factor model (Fama and French, 1993).

The second and third models were devised with abnormal trading volume (AV) and volatility (Vol) as dependent variables, respectively. We included firm size (lnAT) and book-to-market (BM) as controls (Tetlock et al., 2008). The independent variables presented in the first model were also included in these other two models, and all models included year control variables.

In line with previous studies, one should expect these models to show that negative news disclosed through reports is associated with negative abnormal returns, abnormal trading volumes and high stock price volatility (Engle and Ng, 1993; Antweiler and Frank, 2004; Tetlock, 2007).

RESULTS

Using the vector space model and the term weighting process, the word “crise” (“crisis”) received the highest weight among the negative words. It occurred 2,009 times in the entire corpus and received a weight of 818.23. Raw frequency alone does not determine the weight, because other factors, such as sample size, the number of documents in which the word appears, the number of occurrences in each document and the average count of other words in each document, all influence the term weighting formula. The negative word “retração” (“retraction”) occurred 685 times in the corpus and received a weight of 570. In other words, “retraction”, with roughly one third of the frequency of “crisis”, received a weight that corresponds to 70% of that for “crisis”.

Moreover, the comparison between the words “crise” (“crisis”) and “queda” (“fall”) further illustrates the role of term weighting. The word “fall” appears 3,363 times in the corpus, but it received a lower weight (570.52) than “crisis” (818.23).

We conclude that the term weighting procedure fulfilled its intended role of mitigating the influence of high-frequency words, an important step to ensure that there are no outliers in the sample (Manning and Schütze, 1999; Loughran and McDonald, 2011).
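The dampening effect can be illustrated with a small Python sketch. It assumes that the weighting follows the scheme published by Loughran and McDonald (2011), in which a word's weight in a document grows with the log of its count, is normalized by the log of the document's average word count, and is scaled by the log of the inverse document frequency. Whether the study's own formula matches this form exactly is an assumption, and all inputs below are hypothetical.

```python
from math import log


def term_weight(tf, avg_count, n_docs, doc_freq):
    """Weight of one word in one document (Loughran-McDonald style scheme).

    tf        -- raw count of the word in the document
    avg_count -- average word count in the document
    n_docs    -- number of documents in the sample
    doc_freq  -- number of documents containing the word at least once
    """
    if tf < 1:
        return 0.0
    return (1.0 + log(tf)) / (1.0 + log(avg_count)) * log(n_docs / doc_freq)


# A word that is frequent but present in almost every report ...
w_common = term_weight(tf=10, avg_count=5.0, n_docs=829, doc_freq=800)
# ... versus a less frequent word that appears in few reports.
w_rare = term_weight(tf=4, avg_count=5.0, n_docs=829, doc_freq=100)

print(f"common word: {w_common:.4f}, rarer word: {w_rare:.4f}")
```

Under this scheme a word spread across nearly all documents receives a small weight regardless of its raw count, which is consistent with “fall” being weighted below “crisis” despite its higher frequency.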

Table 3 shows the results obtained in the regression analysis based on our tone measures for the pre-IFRS period. In Table 4 we present regression results for the full sample.

Table 3
Linear regression results (1997 to 2007, year dummies omitted).

Robust standard errors shown in parentheses. *p<0.10 **p<0.05 ***p<0.01.

All models are significant, as reported by the F-statistics, but we observed low adjusted R-squared values for the models with returns and abnormal volume as dependent variables. Contrary to Loughran and McDonald’s (2011) observations, we find that our negative word list is not associated with any dependent variable. Moreover, our positive word list is associated with significantly higher returns, and the litigious word list is negatively associated with returns, which also contrasts with Loughran and McDonald’s (2011) results. When abnormal volume is the dependent variable, only the litigious word list is significant. Since the coefficient is negative, the more litigious words appear in the annual report, the lower the abnormal trading volume during the event window.

Finally, when volatility is the dependent variable, the positive word list is the only significant tone variable. Its coefficient is also negative, meaning that the more positive words appear in the annual report, the lower the stock return volatility in the 100-day period subsequent to the release of the annual report.
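The significance markers discussed above can be cross-checked from the coefficients and robust standard errors reported in Table 3. The Python sketch below uses a normal approximation to the t distribution, which is an assumption that seems reasonable for a sample of 526 observations. The function name is hypothetical, and the thresholds are those stated in the table note.

```python
from statistics import NormalDist


def significance_stars(coef, std_err):
    """Two-sided p-value under a normal approximation, mapped to stars."""
    z = abs(coef / std_err)
    p = 2.0 * (1.0 - NormalDist().cdf(z))
    if p < 0.01:
        return "***"
    if p < 0.05:
        return "**"
    if p < 0.10:
        return "*"
    return ""


# (coefficient, robust standard error) pairs taken from Table 3.
table3 = {
    "Positive -> abnormal returns": (0.5335, 0.2444),
    "Litigious -> abnormal returns": (-0.6297, 0.2663),
    "Litigious -> abnormal volume": (-0.0869, 0.0294),
    "Positive -> volatility": (-0.0072, 0.0021),
    "Negative -> abnormal returns": (-0.0052, 0.222),
}

for label, (coef, se) in table3.items():
    print(f"{label}: z = {coef / se:.2f} {significance_stars(coef, se)}")
```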

Our results are rather distinct from those reported by Loughran and McDonald (2011). A plausible explanation is that those authors did not include all their word lists in one regression due to their high degree of collinearity. However, collinearity did not arise as a problem in our regressions, and we were able to include all word lists in each regression simultaneously.

Table 4
Linear regression results (full sample, 1997-2009, year dummies omitted).

Robust standard errors shown in parentheses. *p<0.10 **p<0.05 ***p<0.01.

In Table 4 one can observe that our results hold when the years of transition toward the International Financial Reporting Standards (IFRS), 2008 and 2009, are included in the sample. However, we observe that both the positive and litigious word lists exhibited a drop in their significance.

Overall, we infer that abnormal returns are weakly associated with our tone measures, which is evidence that the tone of annual reports only marginally influences the behavior of stock market investors in Brazil. Regarding the 1997-2007 period, our results indicate that the tone of annual reports issued by listed companies in Brazil neither offers a context within which to interpret the financial performance of an entity nor reflects achievement of objectives, as expected by the International Accounting Standards Board.

The existing literature on financial texts does not allow one to determine the causal link between tone and returns (Loughran and McDonald, 2011). Nonetheless, our results indicate that in Brazil other concurrent information sources, such as the accounting numbers, prevail over tone in driving returns, trading volumes and volatility.

CONCLUSION

Previous studies have documented associations between textual information disclosed by firms and stock market trading. In this study, we sought to develop a procedure based on natural language processing techniques to investigate the association between selected market variables (i.e. abnormal returns, volume and volatility) and negative news disclosed by listed Brazilian firms through their annual reports. The procedures were based on the vector space model and a term weighting scheme proposed by Loughran and McDonald (2011).

Such an endeavor requires the use of a Portuguese vocabulary, which led us to develop a set of word lists designed to measure the tone of financial texts. The negative, positive, litigious, uncertainty-related and modal word lists, which were built from a corpus in excess of eight million words, can arguably be applied to the analysis of other finance texts written in Brazilian Portuguese and, perhaps with slight modifications, in any other variety of Portuguese.

Overall, we conclude that the tone of annual reports issued by listed companies in the 1997-2007 period does not convey additional information to investors in the Brazilian capital market. However, we find that our positive word list is positively associated with abnormal returns and negatively associated with stock return volatility, and these results are at odds with the literature. In addition, we observed that our litigious word list is negatively associated with both returns and abnormal trading volume, which also contrasts with results presented in the literature.

We argue that this finding is relevant as a benchmark to compare against future studies that could be developed using annual reports published from 2010 on. In that year, the use of IFRS-compliant financial statements became mandatory for all listed Brazilian companies. The adoption of IFRS introduced a much more encompassing section, titled Formulário de Referência, which contains the “Management Commentary”, the narrative reporting that must accompany financial statements prepared in accordance with IFRS. IASB expects that the Management Commentary “provides a context within which to interpret the financial position, financial performance and cash flows of an entity. It also provides management with an opportunity to explain its objectives and strategies for achieving those objectives” (International Accounting Standards Board, 2010, p. 5). Thus, we suggest that assessing whether the tone of the Formulário de Referência is associated with relevant market variables constitutes an interesting research opportunity that could extend this work.

The time lag in the release of annual reports is a limitation of our study. These reports can take up to six months after the end of the fiscal year to become public. During this period firms use other, more spontaneous and timely channels of communication (e.g. conference calls, market information and management interviews), which can reduce the information content of annual reports. Future research might compare the information in annual reports with more timely sources of communication to identify the marginal effects of annual report content at the time it becomes publicly available. We believe that annual reports may have a confirmatory role (compared to previously available information) in this type of research design.

Further studies should also aim to refine the word lists. One such study could analyze the publication of “material facts”, which listed companies must disclose shortly after the occurrence or confirmation of an event that can have a relevant influence on stock prices or investor decisions, thus possibly increasing the probability of observing a market reaction to the news.

Future research can also benefit from the negative, positive, litigious and uncertainty-related Portuguese word lists and the modal verb list developed here. Such lists, unavailable until the present study, are fundamental to the development of sentiment analysis applications using natural language processing techniques, like the vector space model. Sentiment analysis can be used in other scenarios beyond examining the tone of statutory financial filings, such as the analysis of financial news reports and firms' press releases. We plan to explore these issues in future research. Also, an interesting extension of our study would involve an approach that assigns weights to each word based on market reactions to documents containing those words.

REFERENCES

ANTWEILER, W.; FRANK, M.Z. 2004. Is all that talk just noise? The information content of internet stock message boards. The Journal of Finance, 59(3):1259-1294. http://dx.doi.org/10.1111/j.1540-6261.2004.00662.x

BALATA, P.; BRETON, G. 2005. Narratives vs numbers in the annual report: are they giving the same message to the investors? Review of Accounting and Finance, 4(2):5-14. http://dx.doi.org/10.1108/eb043421

BEATTIE, V. 2014. Accounting narratives and the narrative turn in accounting research: Issues, theory, methodology, methods and a research framework. The British Accounting Review, 46(2):111-134. http://dx.doi.org/10.1016/j.bar.2014.05.001

BRASIL. 2012. Banco Central. Available at: http://www.bcb.gov.br/?SELICDIA. Accessed on: 13/09/2012.

CECCHINI, M.; AYTUG, H.; KOEHLER, G.J.; PATHAK, P. 2010. Making words work: Using financial text as a predictor of financial events. Decision Support Systems, 50(1):164-175. http://dx.doi.org/10.1016/j.dss.2010.07.012

CHEN, J.; DEMERS, E.A.; LEV, B. 2013. Oh What a Beautiful Morning! The Time of Day Effect on the Tone and Market Impact of Conference Calls. Darden Business School Working Paper No. 2186862. Available at: http://ssrn.com/abstract=2186862. Accessed on: 26/11/2014. http://dx.doi.org/10.2139/ssrn.2186862

CHISHOLM, E.; KOLDA, T.G. 1999. New term weighting formulas for the vector space method in information retrieval. Computer Science and Mathematics Division, Technical Report Number ORNL-TM-13756, Oak Ridge National Laboratory, Oak Ridge, TN.

CHORDIA, T.; SHIVAKUMAR, L. 2006. Earnings and price momentum. Journal of Financial Economics, 80(3):627-656. http://dx.doi.org/10.1016/j.jfineco.2005.05.005

CLATWORTHY, M.; JONES, M.J. 2001. The effect of thematic structure on the variability of annual report readability. Accounting, Auditing and Accountability Journal, 14(3):311-326. http://dx.doi.org/10.1108/09513570110399890

CUNHA, M.A.F. 2001. O Modelo das Motivações Competidoras no Domínio Funcional da Negação. DELTA: Documentação de Estudos em Linguística Teórica e Aplicada, 17(1):1-30.

DALE, R. 2010. Classical Approaches to Natural Language Processing. In: N. INDURKHYA; F.J. DAMERAU (eds.), Handbook of natural language processing. Boca Raton, Chapman and Hall/CRC, p. 3-7.

DAVIS, A.K.; PIGER, J.M.; SEDOR, L.M. 2012. Beyond the Numbers: Measuring the Information Content of Earnings Press Release Language. Contemporary Accounting Research, 29(3):845-868. http://dx.doi.org/10.1111/j.1911-3846.2011.01130.x

DEVITT, A.; AHMAD, K. 2007. Sentiment polarity identification in financial news: A cohesion-based approach. Annual Meeting-Association for Computational Linguistics, 45(1):984.

DURNEV, A.; MANGEN, C. 2011. The real effects of disclosure tone: Evidence from restatements. Working Paper, 1-55. Available at: http://ssrn.com/abstract=1650003. Accessed on: 27/11/2014.

ENGLE, R.F.; NG, V.K. 1993. Measuring and testing the impact of news on volatility. The Journal of Finance, 48(5):1749-1778. http://dx.doi.org/10.1111/j.1540-6261.1993.tb05127.x

FAMA, E.F.; FRENCH, K.R. 1992. The Cross-Section of Expected Stock Returns. Journal of Finance, 47(2):427-465. http://dx.doi.org/10.1111/j.1540-6261.1992.tb04398.x

FAMA, E.F.; FRENCH, K.R. 1993. Common risk factors in the returns on stocks and bonds. Journal of financial economics, 33(1):3-56. http://dx.doi.org/10.1016/0304-405X(93)90023-5

FAMA, E.F.; FRENCH, K.R. 1996. Multifactor Explanations of Asset Pricing Anomalies. Journal of Finance, 51(1):55-84. http://dx.doi.org/10.1111/j.1540-6261.1996.tb05202.x

FERGUSON, P.; O'HARE, N.; DAVY, M.; BERMINGHAM, A.; SHERIDAN, P.; GURRIN, C.; SMEATON, A.F. 2009. Exploring the use of paragraph-level annotations for sentiment analysis of financial blogs. In: Workshop on Opinion Mining and Sentiment Analysis, 1, Seville, Spain, 2009. Proceedings... Seville, p. 1-10.

FONSECA, H.D.C. 2009. A noção default e a sintaxe da negação. Revista de Estudos da Língua(Gem), 7(2):109-132.

GODDARD, C.; SCHALLEY, A.C. 2010. Semantic analysis. In: N. INDURKHYA; F.J. DAMERAU (eds.), Handbook of natural language processing. Boca Raton, Chapman and Hall/CRC, p. 92-120.

GOLDNADEL, M.; LIMA, L.S. 2011. Aspectos Pragmáticos da Negação Sentencial. Cadernos do IL, 42:236-259.

HIPPISLEY, A.R. 2010. Lexical Analysis. In: N. INDURKHYA; F.J. DAMERAU (eds.), Handbook of natural language processing. Boca Raton, Chapman and Hall/CRC, p. 31-58.

JEGADEESH, N.; WU, D. 2013. Word power: A new approach for content analysis. Journal of Financial Economics, 110(3):712-729. http://dx.doi.org/10.1016/j.jfineco.2013.08.018

JURAFSKY, D.; MARTIN, J.H.; KEHLER, A. 2002. Speech and language processing: an introduction to natural language processing, computational linguistics and speech recognition. Englewood Cliffs, New Jersey, Prentice Hall, vol. 2, 923 p.

KEARNEY, C.; LIU, S. 2014. Textual sentiment in finance: A survey of methods and models. International Review of Financial Analysis, 33:171-185. http://dx.doi.org/10.1016/j.irfa.2014.02.006

KRIVIN, D.; PATTON, R.; ROSE, E.; TABAK, D. 2003. Determination of the Appropriate Event Window Length in Individual Stock Event Studies. Nera Economic Consulting working paper. Available at: http://ssrn.com/abstract=466161. Accessed on: 25/10/2014.

LI, F. 2008. Annual report readability, current earnings, and earnings persistence. Journal of Accounting and economics, 45(2):221-247. http://dx.doi.org/10.1016/j.jacceco.2008.02.003

LI, F. 2010a. Survey of the Literature. Journal of Accounting Literature, 29:143-165.

LI, F. 2010b. The Information Content of Forward-Looking Statements in Corporate Filings. A Naïve Bayesian Machine Learning Approach. Journal of Accounting Research, 48(5):1049-1102. http://dx.doi.org/10.1111/j.1475-679X.2010.00382.x

LIU, B. 2010. Sentiment analysis and subjectivity. In: N. INDURKHYA; F.J. DAMERAU (eds.), Handbook of natural language processing. Boca Raton, Chapman and Hall/CRC, p. 627-666.

LIU, B.; HU, M.; CHENG, J. 2005. Opinion observer: analyzing and comparing opinions on the Web. In: International conference on World Wide Web, 14, Chiba, Japan, 2005. Proceedings... ACM, p. 342-351. http://dx.doi.org/10.1145/1060745.1060797

LJUNGLÖF, P.; WIRÉN, M. 2010. Syntactic parsing. In: N. INDURKHYA; F.J. DAMERAU (eds.), Handbook of natural language processing. Boca Raton, Chapman and Hall/CRC, p. 59-91.

LOUGHRAN, T.; MCDONALD, B. 2011. When is a liability not a liability? Textual analysis, dictionaries, and 10‐Ks. The Journal of Finance, 66(1):35-65. http://dx.doi.org/10.1111/j.1540-6261.2010.01625.x

MACKINLAY, A.C. 1997. Event studies in economics and finance. Journal of Economic Literature, 35(1):13-39.

MANNING, C.D.; SCHÜTZE, H. 1999. Foundations of statistical natural language processing. London, MIT Press, 657 p.

NAMIUTI, C. 2009. Negação sentencial na diacronia do português: variação com estabilidade. Revista de Estudos da Língua(Gem), 16(2):193-240.

O’LEARY, D.E. 2011. Blog mining-review and extensions: “From each according to his opinion”. Decision Support Systems, 51(4):821-830. http://dx.doi.org/10.1016/j.dss.2011.01.016

PALMER, D. 2010. Text Pre-processing. In: N. INDURKHYA; F.J. DAMERAU (eds.), Handbook of natural language processing. Boca Raton, Chapman and Hall/CRC, p. 9-30.

PRICE, S.M.; DORAN, J.S.; PETERSON, D.R.; BLISS, B.A. 2012. Earnings conference calls and stock returns: The incremental informativeness of textual tone. Journal of Banking and Finance, 36(4):992-1011. http://dx.doi.org/10.1016/j.jbankfin.2011.10.013

SANTOS, R.L.T.; HE, B.; MACDONALD, C.; OUNIS, I. 2009. Integrating proximity to subjective sentences for blog opinion retrieval. In: European Conference on Information Retrieval (ECIR 2009), 31st Toulouse, France. Proceedings... Toulouse, p. 325-336. http://dx.doi.org/10.1007/978-3-642-00958-7_30

SAVOY, J.; GAUSSIER, E. 2010. Information Retrieval. In: N. INDURKHYA; F.J. DAMERAU (eds.), Handbook of natural language processing. Boca Raton, Chapman and Hall/CRC, p. 455-484.

SEKI, Y.; EVANS, D.K.; KU, L.W.; SUN, L.; CHEN, H.H.; KANDO, N.; LIN, C.Y. 2008. Overview of multilingual opinion analysis task at NTCIR-7. In: NTCIR Workshop Meeting on Evaluation of Information Access Technologies: Information Retrieval, Question Answering, and Cross-Lingual Information Access, 7, Tokyo, Japan, 2008. Proceedings... Tokyo, p. 185-203.

SHARPE, W.F. 1964. Capital Asset Prices: A Theory of Market Equilibrium under Conditions of Risk. Journal of Finance, 19(3):425-442. http://dx.doi.org/10.1111/j.1540-6261.1964.tb02865.x

SOUZA, A.S.; PANTE, M.R. 2006. O pronome nenhum e a dupla negação portuguesa uma trajetória de gramaticalização? Revista Soletras, 6(12):105-114.

TANG, H.; TAN, S.; CHENG, X. 2009. A survey on sentiment detection of reviews. Expert Systems with Applications, 36(7):10760-10773. http://dx.doi.org/10.1016/j.eswa.2009.02.063

TESSAROLO, I.F.; PAGLIARUSSI, M.S.; LUZ, A.T.M.D. 2010. The justification of organizational performance in annual report narratives. BAR-Brazilian Administration Review, 7(2):198-212. http://dx.doi.org/10.1590/S1807-76922010000200006

TETLOCK, P.C. 2007. Giving content to investor sentiment: The role of media in the stock market. The Journal of Finance, 62(3):1139-1168. http://dx.doi.org/10.1111/j.1540-6261.2007.01232.x

TETLOCK, P.C.; SAAR‐TSECHANSKY, M.; MACSKASSY, S. 2008. More than words: Quantifying language to measure firms’ fundamentals. The Journal of Finance, 63(3):1437-1467. http://dx.doi.org/10.1111/j.1540-6261.2008.01362.x

U.S. SECURITIES AND EXCHANGE COMMISSION. 2002. Commission Statement about Management's Discussion and Analysis of Financial Condition and Results of Operations. Available at: http://www.sec.gov/rules/other/33-8056.htm. Accessed on: 20/10/2014.

VAN DEN BOGAERD, M.; AERTS, W. 2011. Applying machine learning in accounting research. Expert Systems with Applications, 38(10):13414-13424. http://dx.doi.org/10.1016/j.eswa.2011.04.172

WILSON, T.; WIEBE, J.; HOFFMANN, P. 2009. Recognizing contextual polarity: An exploration of features for phrase-level sentiment analysis. Computational linguistics, 35(3):399-433. http://dx.doi.org/10.1162/coli.08-012-R1-06-90

Notes

[4] In Portuguese: “não”.

[5] In Portuguese: “não”, “ninguém”, “num”, “nada”, “nenhum”, “nunca” and “jamais”.

Author notes

^[1] Universidade de São Paulo. Faculdade de Economia, Administração e Contabilidade de Ribeirão Preto. Av. Bandeirantes 3900, Monte Alegre, Ribeirão Preto, SP, 14040-905, Brasil.

^[2] Universidade Federal do Espírito Santo. Campus Alegre. Alto Universitário, s/n, 29500-000, Alegre, ES, Brasil.

^[3] Fucape Business School. Av. Fernando Ferrari, 1358, 29075-505, Vitória, ES, Brasil.

Filter	Sample size	Discarded observations
Full sample (annual reports from 1995 to 2009)	4,627
Period with a relevant number of observations (1997 to 2009)	4,625	2
Reports of companies traded in the spot market	3,685	940
Report word count >= 300	3,338	347
Data available in Economatica database	2,205	1,133
Book-to-market and firm size data available in Economatica database	1,923	282
Data to calculate volatility available	1,902	21
Data on returns and trading volumes for days [-101, -1] available	829	1,073
Data on returns and trading volumes for days [0, 4] available	829	0
Number of company reports in the period	829
Number of companies	204
Average number of reports per company	4

Filter	Dictionary size in words	Words eliminated
Full list of words extracted from annual reports	8,772,625
List of distinct words extracted from the full list	81,355	8,691,270
List after removal of stopwords	80,986	369
List after removal of numerically formatted numbers	71,709	9,277
List after removal of symbols and special characters	71,083	626
List after removal of words with only 1 or 2 characters	53,577	17,506
Final dictionary obtained	53,577
List of words with at least 5% frequency in annual reports	3,283
List of words with at least 5% frequency in annual reports, plus variations present in the dictionary	22,879

Independent Variable	Abnormal Returns (Fama-French)		Abnormal Volume		Volatility
Negative	- 0.0052 (0.222)		0.0270 (0.0331)		0.0008 (0.0016)
Positive	0.5335 (0.2444)	**	0.0318 (0.0394)		- 0.0072 (0.0021)	***
Litigious	- 0.6297 (0.2663)	**	- 0.0869 (0.0294)	***	0.0010 (0.0020)
Uncertainty-Related	0.0932 (0.3484)		-0.0008 (0.0454)		0.0025 (0.0020)
Modal	- 0.0715 (0.5025)		- 0.0278 (0.0682)		0.0062 (0.0045)
Book to Market	-		0.0026 (0.0169)		0.0012 (0.0029)
LnTotalAssets	-		0.0410 (0.0275)		- 0.0032 (0.0015)	**
Constant	2.332 (2.975)		0.2868 (0.7021)		0.2172 (0.0252)	***
Sample size	526		526		526
F-Statistic	1.94		1.96		6.07
Model Significance	0.0179		0.0119		0.000
R2	0.0462		0.0549		0.1604

Independent Variable	Abnormal Returns (Fama-French)		Abnormal Volume		Volatility
Negative	- 0.0084 (0.175)		0.0289 (0.0254)		- 0.0020 (0.0019)
Positive	0.3875 (0.2085)	*	-0.0099 (0.0339)		- 0.0088 (0.0020)	***
Litigious	- 0.3398 (0.2057)	*	- 0.0640 (0.0224)	***	0.0004 (0.0018)
Uncertainty-Related	- 0.087 (0.2609)		0.0164 (0.0350)		0.0041 (0.0024)
Modal	0.2136 (0.4102)		0.0315 (0.0565)		0.0143 (0.0146)
Book to Market	-		0.0052 (0.0140)		0.0025 (0.0027)
LnTotalAssets	-		0.0237 (0.0200)		- 0.0038 (0.0016)	**
Constant	1.705 (2.843)		0.5308 (0.6435)		0.2326 (0.0262)	***
Sample size	829		829		829
F-Statistic	1.78		1.85		9.43
Model Significance	0.0262		0.0147		0.000
R2	0.0330		0.0371		0.2409