Tech or Treat: The Sentimental Stock Saga
Proposal
High-level goal
Sentiment analysis of diverse data sources to predict Apple Inc. stock price movement.
Abstract
Our aim is to conduct sentiment analysis on Apple Inc. stock by analysing diverse sources, including media coverage, industry reports, social media reviews, and investor opinions from news headlines. By leveraging sentiment analysis techniques, our project intends to uncover patterns in sentiment that may correlate with stock price movements. The insights gained could provide valuable perspectives on how public perception influences Apple’s stock performance, offering potential benefits for investors and analysts.
Dataset
# Import the module used to download data
import yfinance
# Use the Ticker class in yfinance to get data for the desired stock
apple = yfinance.Ticker("AAPL")
# Use the history method to get the most recent trading day's data at 1-hour intervals, from market open to close
data = apple.history(period='1d', interval='1h')
The data is retrieved from Yahoo! Finance in real time. It describes how a particular stock performed over a given period, including its opening and closing prices and other related features. The retrieved data has the following attributes (features):
Column Name | Data Type | Description |
---|---|---|
Datetime | Datetime (index) | The date and time of each data point, in the format “YYYY-MM-DD HH:MM:SS-TZ.” It is the timestamp at which the stock price information was recorded. The timezone (TZ) is the UTC offset of the recorded data. |
Open | Float | The opening price of the stock at the specific timestamp, that is, the price at which the stock started trading at the beginning of the given time interval (e.g., each hour). |
High | Float | The highest price reached by the stock during the time interval. It represents the peak value of the stock’s price within that hour. |
Low | Float | The lowest price reached by the stock during the same time interval. It represents the minimum value of the stock’s price within that hour. |
Close | Float | The closing price of the stock at the end of the specified time interval. It is the last price recorded before the end of that hour. |
Volume | Integer | The trading volume of the stock during the given time interval, that is, the total number of shares traded during that hour. |
Dividends | Float | Any dividend payments made during the specified time interval. In the retrieved data, no dividends were recorded (all values are zero). |
Stock Splits | Float | Any stock splits that occurred during the specified time interval. In the retrieved data, no stock splits were recorded (all values are zero). |
data
Datetime | Open | High | Low | Close | Volume | Dividends | Stock Splits |
---|---|---|---|---|---|---|---|
2023-11-07 09:30:00-05:00 | 179.179993 | 180.820007 | 179.009995 | 180.710007 | 16268674 | 0.0 | 0.0 |
2023-11-07 10:30:00-05:00 | 180.695007 | 181.595001 | 180.529999 | 181.455002 | 8190077 | 0.0 | 0.0 |
2023-11-07 11:30:00-05:00 | 181.455002 | 181.639999 | 180.520004 | 181.175003 | 7493420 | 0.0 | 0.0 |
2023-11-07 12:30:00-05:00 | 181.175003 | 182.160004 | 180.934998 | 182.014999 | 6208182 | 0.0 | 0.0 |
2023-11-07 13:30:00-05:00 | 182.018494 | 182.440002 | 181.929993 | 182.340897 | 5219872 | 0.0 | 0.0 |
2023-11-07 14:30:00-05:00 | 182.350006 | 182.429993 | 181.899994 | 182.198303 | 5513486 | 0.0 | 0.0 |
2023-11-07 15:30:00-05:00 | 182.195007 | 182.339996 | 181.550003 | 181.820007 | 8142789 | 0.0 | 0.0 |
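As a minimal sketch of how these hourly bars might be turned into a price-movement signal, the snippet below computes the hour-over-hour percentage change of the Close column. The close values are copied from the table above; the helper name `hourly_returns` is an assumption introduced for illustration and is not part of yfinance.

```python
# Close prices copied from the hourly table above (2023-11-07, 09:30 to 15:30).
closes = [180.710007, 181.455002, 181.175003, 182.014999,
          182.340897, 182.198303, 181.820007]


def hourly_returns(prices):
    """Return the percentage change between consecutive prices."""
    return [(curr - prev) / prev * 100 for prev, curr in zip(prices, prices[1:])]


returns = hourly_returns(closes)
# Label each interval as an upward or downward move (hypothetical labelling rule).
directions = ["up" if r > 0 else "down" for r in returns]

for r, d in zip(returns, directions):
    print(f"{r:+.3f}% ({d})")
```

Seven hourly bars give six movements; these per-interval directions are the kind of target that a sentiment signal could later be compared against.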
Headlines Dataset
import requests
import pandas as pd
from dateutil import parser
from datetime import datetime, timedelta

# NewsAPI key (in practice, load this from an environment variable rather than hard-coding it)
api_key = '9dcab5d0d86940459623ec7dea5c8d36'
stock_symbol = "AAPL"
# Query parameters, kept for reference; the request below builds the URL directly
query_params = {
    'q': f'{stock_symbol}',
    'apiKey': api_key,
    'language': 'en',  # English language
    'country': 'us',  # USA sources
}

# Search window: the last 7 days
end_date = datetime.now()
start_date = end_date - timedelta(days=7)
from_date = start_date.strftime("%Y-%m-%d")
to_date = end_date.strftime("%Y-%m-%d")

news_url = f"https://newsapi.org/v2/everything?q={stock_symbol}&apiKey={api_key}&from={from_date}&to={to_date}&language=en"
response = requests.get(news_url)

if response.status_code == 200:
    news_data = response.json()
    articles = news_data['articles']
    headlines = [(article['title'], article['publishedAt']) for article in articles]
else:
    print("Failed to retrieve news data.")
    headlines = []  # keep the loop below from failing when the request does not succeed

# Keep only headlines that mention Apple, with parsed publication timestamps
apple_related_headlines = []
for headline, published_at in headlines:
    try:
        date = parser.parse(published_at)
        if 'Apple' in headline:
            apple_related_headlines.append((headline, date))
    except ValueError:
        pass

# Sort chronologically and load into a DataFrame
apple_related_headlines.sort(key=lambda x: x[1])
news_dataframe = pd.DataFrame(apple_related_headlines, columns=['Headline', 'Date'])
news_dataframe.head()
Index | Headline | Date |
---|---|---|
0 | SwitchBot for iOSがApple Watchのコンプリケーションに対応。 | 2023-11-06 05:24:22+00:00 |
1 | Apple's New Feature Detects Water In USB-C Por... | 2023-11-06 08:20:06+00:00 |
2 | Charlie Munger's Apple Confidence, Raskin Take... | 2023-11-06 10:26:19+00:00 |
3 | Apple iPhone-maker Foxconn sees solid holiday ... | 2023-11-06 12:16:18+00:00 |
4 | Apple MacBook Pro 14-inch review: Huge amounts... | 2023-11-06 17:14:11+00:00 |
Column Name | Data Type | Description |
---|---|---|
Headline | String | A text field containing headlines related to Apple, ranging from news updates to reviews and opinions. |
Date | Datetime | A datetime field representing the date and time when the headline was published or reported. The timestamps are in Coordinated Universal Time (UTC). |
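Because the headline timestamps are in UTC while the price bars carry a -05:00 offset and start at half past each hour, the two datasets need a common time key before they can be compared. The sketch below is one possible way to do this with the standard library; the helper name `to_hourly_bar`, the fixed UTC-05:00 offset, and the 09:30 to 16:00 trading window are assumptions for illustration (daylight saving, weekends, and holidays are ignored).

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC-05:00 offset, matching the "-05:00" suffix in the price table.
# Assumption: a fixed offset is enough for this sample; daylight saving is ignored.
MARKET_TZ = timezone(timedelta(hours=-5))


def to_hourly_bar(published_at_utc):
    """Map a UTC publication time to the start of its hourly bar (HH:30) in
    market time, or return None when it falls outside 09:30-16:00."""
    local = published_at_utc.astimezone(MARKET_TZ)
    market_open = local.replace(hour=9, minute=30, second=0, microsecond=0)
    market_close = local.replace(hour=16, minute=0, second=0, microsecond=0)
    if not (market_open <= local < market_close):
        return None
    # Bars start at :30, so shift back 30 minutes, floor to the hour, shift forward.
    shifted = local - timedelta(minutes=30)
    floored = shifted.replace(minute=0, second=0, microsecond=0)
    return floored + timedelta(minutes=30)


# Timestamps taken from the headline sample above.
during_hours = datetime(2023, 11, 6, 17, 14, 11, tzinfo=timezone.utc)  # 12:14 market time
before_open = datetime(2023, 11, 6, 5, 24, 22, tzinfo=timezone.utc)    # 00:24 market time

print(to_hourly_bar(during_hours))
print(to_hourly_bar(before_open))
```

Headlines published outside trading hours map to `None` here; whether to drop them or attach them to the next session's first bar is a design choice the analysis would still need to make.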
Why did we choose this data?
This dataset is suitable for analyzing the stock price movement due to its time series format with open, high, low, and close prices, as well as volume, dividends, and stock splits. The high-frequency data, recorded at hourly intervals, is valuable for short-term trading. These features provide essential data for analysis and modeling of the stock price movement.
Variables Involved:
News Headlines: Collecting a dataset of news headlines relevant to the stock market. These headlines will serve as a variable to assess the impact on stock prices.
Stock Prices: Gathering historical stock price data for a selected set of stocks within a particular sector. These will be the stocks to study for correlations.
Sector Information: Identifying and categorizing stocks into their respective sectors for sector specific analysis.
Variables to be Created:
Correlation Coefficients: Calculating the correlation coefficients to quantify the relationships between Apple’s stock price movements and news headlines, and between stocks within the same sector.
Market Sentiment Index: Creating a sentiment index based on the tone of the news headlines (positive, negative, or neutral sentiment) to gauge how news sentiment affects Apple’s stock prices.
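The two derived variables above could be sketched roughly as follows. This is only an illustration under stated assumptions: the word lists `POSITIVE` and `NEGATIVE` are hypothetical and deliberately tiny, the function names are invented for this example, and the numeric inputs at the end are made up rather than project results.

```python
import math

# Hypothetical, deliberately tiny word lists; a real project would use an
# established sentiment lexicon or model instead.
POSITIVE = {"solid", "confidence", "gain", "strong", "beats"}
NEGATIVE = {"drop", "lawsuit", "weak", "fall", "misses"}


def headline_score(headline):
    """Score one headline: +1 per positive word, -1 per negative word."""
    words = [w.strip(".,:;!?'\"").lower() for w in headline.split()]
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def sentiment_index(headlines):
    """Average headline score for one time window (0.0 when there are no headlines)."""
    if not headlines:
        return 0.0
    return sum(headline_score(h) for h in headlines) / len(headlines)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences.
    Assumes neither sequence is constant (otherwise the denominator is zero)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Illustrative inputs only: made-up index values and returns, not project results.
index_values = [0.5, -0.2, 0.1, 0.4, -0.3]
price_returns = [0.4, -0.1, 0.0, 0.3, -0.2]
print(round(pearson(index_values, price_returns), 3))
```

A coefficient near +1 or -1 would indicate a strong linear relationship between the index and returns, while a value near 0 would indicate little linear relationship.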
Question
How can sentiment analysis of news headlines contribute to understanding the stock price movements of Apple Inc., and what insights can be gained regarding the interconnectedness of external factors and their impact on Apple’s stock performance in the broader market context?
Motivation:
The question sparks curiosity about how news headlines might affect Apple’s stock prices. By delving into sentiment analysis, we aim to uncover patterns and insights in the relationship between public sentiment, specific news events, and Apple’s stock performance. This exploration could provide valuable knowledge for investors, analysts, and anyone keen on understanding the dynamics of stock price movements and market influences on Apple.
Analysis plan
Approach for Question
To address the question in a general sense, we would employ a combination of data analysis and predictive modeling. By collecting historical data on stock prices and news headlines, we can examine the historical relationship between news sentiment and market movements. This analysis might involve quantifying sentiment in news headlines, identifying patterns, and exploring correlations. Furthermore, predictive models can be developed to forecast potential future stock price changes based on these insights. The goal is to gain a better understanding of how news impacts stock prices and, if possible, use this knowledge to make informed predictions about future market behavior.
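To make the predictive-modeling idea concrete, here is a minimal sketch, assuming the simplest possible model: an ordinary least-squares line relating sentiment in one hour to the price return in the next hour. The proposal does not commit to a specific model, so the one-hour lag, the function names, and the numbers below are all assumptions for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Made-up illustrative series: sentiment in hour t and price return (%) in hour t.
sentiment = [0.2, -0.1, 0.4, 0.0, -0.3, 0.1]
returns = [0.1, 0.3, -0.2, 0.5, 0.1, -0.4]

# Pair sentiment at hour t with the return at hour t+1 (a one-step lag).
lagged_x = sentiment[:-1]
lagged_y = returns[1:]
slope, intercept = fit_line(lagged_x, lagged_y)


def predict_next_return(latest_sentiment):
    """Predict the next hour's return from the latest sentiment value."""
    return slope * latest_sentiment + intercept


print(predict_next_return(sentiment[-1]))
```

A linear fit is only a baseline; its main use here would be to check whether lagged sentiment carries any signal at all before investing in a more elaborate model.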
Plan Summary:
Data Collection:
Gather a diverse dataset of Apple-related news headlines, industry reports, and social media reviews during the specified timeframe.
Collect historical stock price data for Apple Inc. corresponding to the same period.
Data Preprocessing:
Clean and preprocess the text data, including handling any language-specific characters, removing duplicates, and ensuring consistency.
Preprocess the stock price data, ensuring alignment with the timeframes of the news dataset.
Sentiment Analysis:
Utilize natural language processing techniques to perform sentiment analysis on the collected news headlines.
Categorize sentiments as positive, negative, or neutral to quantify the overall sentiment trends.
Correlation Analysis:
Identify key news events and occurrences during the specified timeframe. Analyze how the sentiment trends derived from news headlines correlate with Apple’s stock price movements.
Explore potential correlations between news events related to other stocks and subsequent impacts on Apple’s stock.
Visualization and Interpretation:
Create visualizations to illustrate the sentiment trends and stock price movements over time.
Interpret the findings, identifying patterns, anomalies, and potential cause-effect relationships.
Statistical Analysis:
Conduct statistical tests to validate the significance of observed correlations and trends.
Evaluate the strength and direction of correlations between sentiment and stock price movements.
Discussion and Conclusions:
Summarize the key findings and their implications for understanding the influence of news sentiment on Apple’s stock prices.
Discuss any broader market influences on Apple’s stock performance. Consider limitations, potential biases, and areas for future research.
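For the statistical analysis step, one option that makes few distributional assumptions is a permutation test on the correlation coefficient. The sketch below is an illustration only: the plan does not name a specific test, and the function names, the number of permutations, and the example data are assumptions.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient (assumes neither sequence is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def permutation_p_value(xs, ys, n_permutations=2000, seed=0):
    """Two-sided permutation p-value for the Pearson correlation of xs and ys."""
    rng = random.Random(seed)  # seeded so the result is reproducible
    observed = abs(pearson(xs, ys))
    shuffled = list(ys)
    extreme = 0
    for _ in range(n_permutations):
        # Shuffling ys breaks any real pairing between sentiment and returns.
        rng.shuffle(shuffled)
        if abs(pearson(xs, shuffled)) >= observed:
            extreme += 1
    # Add-one correction keeps the estimate strictly positive.
    return (extreme + 1) / (n_permutations + 1)


# Made-up example data: hourly sentiment values and price returns (%).
sentiment = [0.5, -0.2, 0.1, 0.4, -0.3, 0.0, 0.2, -0.1]
returns = [0.4, -0.1, 0.0, 0.3, -0.2, 0.1, 0.1, -0.3]
print(permutation_p_value(sentiment, returns))
```

A small p-value suggests the observed correlation would be unlikely if sentiment and returns were unrelated; it does not by itself establish a cause-effect relationship.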
Weekly Plan of Attack
Week | Weekly Tasks | Persons in Charge | Backup |
---|---|---|---|
until November 8th | Explore and finalize the dataset and the problem statements | Everyone | Everyone |
- | Complete the proposal and assign some high-level tasks | Everyone | Everyone |
November 9th to 15th | Getting to know the yfinance library and news headlines | Everyone | Everyone |
- | Data cleaning and Data pre-processing | Eshaan | Aravind |
- | Question specific exploration and data categorization | Likith | Sanjay |
November 16th to 22nd | Performing sentiment analysis for Q1 | Sanjay | Likith |
- | Performing sentiment analysis and finding correlations for Q1 | Vamsi | Aravind |
- | Exploring how to integrate our analysis with real-time prices | Aravind | Likith |
November 23rd to 29th | Generating remaining parts of the plots for Q1 | Eshaan | Sanjay |
- | Improving the generated sentiment analysis model | Sanjay | Vamsi |
- | Start integrating Quarto and our model | Likith | Eshaan |
November 30th to December 6th | Refining the code for code review with comments | Everyone | Everyone |
- | Making a few changes to the model to test it on historical data | Vamsi | Aravind |
- | Continue with the integration of Quarto and our models | Aravind | Eshaan |
December 7th to 13th | Complete the Quarto website with presentable data | Everyone | Everyone |
- | Review and debug the model | Everyone | Everyone |
- | Write-up and presentation for the project | Everyone | Everyone |
Repo Organization
The following are the folders involved in the Project repository.
‘data/’: Used for storing any necessary data files for the project, such as input files.
‘images/’: Used for storing image files used in the project.
‘_extra/’: Used for brainstorming and exploratory analysis that does not affect the project workflow.
‘_freeze/’: Used to store the files generated during the build process. These files represent the frozen state of the website at a specific point in time.
‘.github/’: Used for storing GitHub templates and workflows.
These are our planned approaches for exploring and answering the question posed above. Parts of the approach might change in the final implementation.