Brain's Combined Sentiment dataset aggregates sentiment signals from three sources (news articles, earnings call transcripts, and 10-K filing language) into a single combined sentiment score for ~5,000 US equities. Each daily file contains sentiment metrics per ticker, with lookback windows configurable per source; in this sample the news window is fixed at 30 days. This is a panel dataset suited to cross-sectional factor construction and event-driven signal research.
| Column | Type | Non-Null | Null % | Unique | Description |
|---|---|---|---|---|---|
| DATE | string | 7,116,723 | 0% | 2,136 | Trading date (YYYY-MM-DD) |
| COMPOSITE_FIGI | string | 7,116,723 | 0% | 4,961 | Bloomberg composite FIGI (Financial Instrument Global Identifier) |
| NAME | string | 7,116,723 | 0% | 5,022 | Company name |
| TICKER | string | 7,116,723 | 0% | 5,076 | Stock ticker symbol |
| NEWS_N_PAST_DAYS_AGGR | int32 | 7,116,723 | 0% | 1 | News lookback window (constant=30) |
| NEWS_VOLUME | float64 | 6,073,565 | 14.7% | 3,479 | Article count in lookback period |
| NEWS_SENTIMENT | float64 | 6,073,565 | 14.7% | 864,063 | News-based sentiment score |
| EC_LAST_CALL_N_PAST_DAYS | float64 | 7,116,723 | 0% | 365 | Days since last earnings call |
| EC_LAST_CALL_SENTIMENT | float64 | 7,116,723 | 0% | 811,478 | Earnings call sentiment score |
| CF_LAST_10K_N_PAST_DAYS | float64 | 7,116,723 | 0% | 730 | Days since last 10-K filing |
| CF_LAST_10K_SENTIMENT | float64 | 7,116,723 | 0% | 561,920 | 10-K filing sentiment score |
| COMBINED_SENTIMENT | float64 | 7,116,723 | 0% | 700,120 | Weighted combined sentiment |
Interpretation: NEWS_VOLUME and NEWS_SENTIMENT are null for the same 1.04M rows (14.7% of the panel). These rows are ticker-days with zero news coverage in the 30-day lookback window, which is structurally expected because many small-caps receive little news. COMBINED_SENTIMENT is still populated for these rows because it falls back on the earnings call (EC) and 10-K signals.
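The null pattern described above can be verified with a short check. The following is a minimal sketch using pandas on a toy frame; the rows and values are invented for illustration, and the `HAS_NEWS` flag is a hypothetical helper column, not part of the dataset.

```python
import numpy as np
import pandas as pd

# Toy rows that reuse the dataset's column names; values are made up.
df = pd.DataFrame({
    "TICKER": ["AAA", "BBB", "CCC", "DDD"],
    "NEWS_VOLUME": [12.0, np.nan, 3.0, np.nan],
    "NEWS_SENTIMENT": [0.10, np.nan, -0.05, np.nan],
    "COMBINED_SENTIMENT": [0.08, 0.02, -0.01, -0.11],
})

vol_null = df["NEWS_VOLUME"].isna()
sent_null = df["NEWS_SENTIMENT"].isna()

# True when both news columns are missing on exactly the same rows.
nulls_coincide = bool((vol_null == sent_null).all())

# Share of rows without news data.
no_news_share = float(vol_null.mean())

# True when the combined score is populated on every no-news row.
combined_present = bool(df.loc[vol_null, "COMBINED_SENTIMENT"].notna().all())

# Hypothetical flag separating "no coverage" from "neutral news".
df["HAS_NEWS"] = ~vol_null

print(nulls_coincide, no_news_share, combined_present)
```

Keeping an explicit coverage flag, rather than filling the nulls with zero, avoids treating "no news" as "neutral news" in downstream models.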
Assessment: Coverage is strongest for large caps (97.4% of the S&P 500) and remains high for the broad market (89.4% of the Russell 3000). The 84.7% Russell 2000 coverage is high for a sentiment dataset, since many small-caps have limited news and filing coverage. The dataset is therefore representative enough for factor research across the major US equity benchmarks.
| Metric | News Sentiment | Earnings Call | 10-K Filing | Combined |
|---|---|---|---|---|
| Range | [-1.082, 0.817] | [-1.713, 0.340] | [-0.340, 0.575] | [-1.022, 0.431] |
| Mean | 0.0005 | 0.0137 | -0.0123 | -0.0005 |
| Median | 0.0132 | 0.0915 | -0.0212 | 0.0234 |
| Std Dev | 0.185 | 0.286 | 0.106 | 0.133 |
| Skewness | -0.769 | -2.525 | 0.507 | -1.688 |
| Kurtosis | 2.736 | 9.918 | 0.447 | 5.894 |
| % Positive | 53.8% | 65.4% | 42.0% | 58.6% |
| % Negative | 46.2% | 34.6% | 58.0% | 41.4% |
| Outlier Rate (IQR) | 4.5% | 5.6% | 1.1% | 3.8% |
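Statistics of the kind shown in the table above can be reproduced per column with pandas. This is a sketch under assumptions: `describe_sentiment` is a hypothetical helper, the toy values are invented, and `Series.kurt()` returns excess kurtosis, which may or may not be the convention used in the table.

```python
import pandas as pd

def describe_sentiment(s: pd.Series) -> pd.Series:
    """Summary statistics for one sentiment column, ignoring nulls."""
    s = s.dropna()
    return pd.Series({
        "min": s.min(),
        "max": s.max(),
        "mean": s.mean(),
        "median": s.median(),
        "std": s.std(),
        "skew": s.skew(),
        "kurtosis": s.kurt(),  # excess kurtosis (normal distribution = 0)
        "pct_positive": (s > 0).mean() * 100,
        "pct_negative": (s < 0).mean() * 100,
    })

# Toy values with one extreme negative observation and one null.
toy = pd.Series([-0.9, -0.1, 0.05, 0.1, 0.2, None],
                name="EC_LAST_CALL_SENTIMENT")
stats = describe_sentiment(toy)
print(stats.round(3))
```

A single extreme negative value is enough to pull the mean below the median and make the skewness negative, which is the same shape the earnings call column shows in the table.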
Interpretation: The three sentiment sources (news, earnings calls, 10-K filings) have low pairwise correlations with each other, which suggests that each contributes largely independent information and that the combined score is a diversified signal. News volume has minimal correlation with any sentiment measure, so there is little evidence of volume bias in the scores.
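A pairwise correlation check of this kind can be sketched as follows. The data here are synthetic and independent by construction, and the use of Spearman rank correlation is an assumption chosen for the heavy-tailed distributions; the report does not state which correlation method was used.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 2000

# Synthetic, independent columns standing in for the real panel.
df = pd.DataFrame({
    "NEWS_SENTIMENT": rng.normal(0, 0.185, n),
    "EC_LAST_CALL_SENTIMENT": rng.normal(0, 0.286, n),
    "CF_LAST_10K_SENTIMENT": rng.normal(0, 0.106, n),
    "NEWS_VOLUME": rng.integers(1, 100, n).astype(float),
})

# Mimic rows without news coverage (both news columns null together).
df.loc[df.index[::7], ["NEWS_SENTIMENT", "NEWS_VOLUME"]] = np.nan

cols = ["NEWS_SENTIMENT", "EC_LAST_CALL_SENTIMENT",
        "CF_LAST_10K_SENTIMENT", "NEWS_VOLUME"]

# Rank correlation; null rows are dropped pairwise for each column pair.
corr = df[cols].corr(method="spearman")

# Largest absolute correlation between two different columns.
off_diag = corr.where(~np.eye(len(cols), dtype=bool))
max_abs_corr = float(off_diag.abs().max().max())

print(corr.round(3))
print(max_abs_corr)
```

On the real panel, pooling all dates mixes time-series and cross-sectional effects, so computing the matrix per DATE and averaging is a reasonable alternative.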
| Column | Outliers | Rate | Lower Bound | Upper Bound | Min Value | Max Value | Status |
|---|---|---|---|---|---|---|---|
| NEWS_VOLUME | 537,141 | 8.84% | -47.50 | 100.50 | 1.00 | 3,968.00 | ELEVATED |
| NEWS_SENTIMENT | 272,345 | 4.48% | -0.389 | 0.410 | -1.082 | 0.817 | MODERATE |
| EC_LAST_CALL_SENTIMENT | 397,446 | 5.58% | -0.498 | 0.618 | -1.713 | 0.340 | MODERATE |
| CF_LAST_10K_SENTIMENT | 79,952 | 1.12% | -0.296 | 0.263 | -0.340 | 0.575 | LOW |
| COMBINED_SENTIMENT | 270,449 | 3.80% | -0.274 | 0.301 | -1.022 | 0.431 | MODERATE |
News volume outliers are driven by mega-cap stocks (AAPL, TSLA, NVDA), which naturally attract orders of magnitude more coverage. Earnings call sentiment outliers come from extreme negative calls (profit warnings, restatements); they all sit below the lower bound, since the maximum value (0.340) is inside the upper bound (0.618). These are real signals, not data errors, so consider robust scaling (winsorization or rank-transform) rather than removal.
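Both robust-scaling options can be sketched in a few lines of pandas. The example assumes the bounds in the table are Tukey fences (Q1 - 1.5 x IQR, Q3 + 1.5 x IQR); the helper names `iqr_fences`, `winsorize_iqr`, and `cross_sectional_rank` and the toy values are hypothetical.

```python
import pandas as pd

def iqr_fences(s: pd.Series, k: float = 1.5) -> tuple[float, float]:
    """Lower and upper Tukey fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def winsorize_iqr(s: pd.Series, k: float = 1.5) -> pd.Series:
    """Clip values to the fences instead of dropping them."""
    lo, hi = iqr_fences(s, k)
    return s.clip(lower=lo, upper=hi)

def cross_sectional_rank(df: pd.DataFrame, col: str,
                         date_col: str = "DATE") -> pd.Series:
    """Percentile rank of `col` within each date, in (0, 1]."""
    return df.groupby(date_col)[col].rank(pct=True)

# Toy panel: two dates, five tickers each, one extreme negative call per date.
df = pd.DataFrame({
    "DATE": ["2020-01-02"] * 5 + ["2020-01-03"] * 5,
    "TICKER": list("ABCDE") * 2,
    "EC_LAST_CALL_SENTIMENT": [0.10, 0.12, 0.08, 0.11, -1.50,
                               0.05, 0.07, 0.06, 0.04, -1.20],
})

df["EC_WINSOR"] = winsorize_iqr(df["EC_LAST_CALL_SENTIMENT"])
df["EC_RANK"] = cross_sectional_rank(df, "EC_LAST_CALL_SENTIMENT")
print(df)
```

Winsorization keeps the magnitude of ordinary observations while capping extreme ones; the rank transform discards magnitude entirely but keeps the ordering, so the extreme negative calls still land at the bottom of each date's cross-section.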