What This Exercise Is About
You are technology students, not finance students. You do not need to become a trader. You need to become a disciplined thinker who can gather evidence, reason under uncertainty, and explain a conclusion clearly. The market is the practice arena. The real skill is the process.
Monday: Set the Prediction
Your team studies what happened last week, queries four AI models with a shared prompt, and commits a prediction (% up or down) for each tracked asset — with a stated confidence level and evidence.
Friday: Record Actuals
After markets close on Friday, your team records the actual % change for every asset. Do not modify your Monday prediction. The GitHub commit timestamp is the evidence.
Following Monday: Score & Learn
You score your calibration (not just accuracy), present your reasoning to the class, and identify what the AI models got right, wrong, and why. Then you update the workflow before the next prediction.
⚠️ Important framing: This is not a competition to predict correctly. It is an exercise in calibration — stating your confidence accurately. A cautious, well-reasoned "uncertain" is worth more than an overconfident wrong prediction. You are marked on reasoning quality, not market accuracy.
The Nine Assets You Track
These are the instruments your team makes weekly predictions about. Each one measures a different part of the global economy. Together, they tell a story about where money is moving and why.
S&P 500
A basket of roughly 500 of the largest US-listed companies. Think Apple, Microsoft, Amazon, Google, Tesla. It is the most widely watched barometer of the US stock market. When people say "the market went up today," they almost always mean the S&P 500.
Why it matters: It is the benchmark everything else is measured against. If the S&P 500 is rising, the broad economy is usually healthy. If it falls, investors are worried.
Nasdaq 100
The 100 largest non-financial companies listed on the Nasdaq exchange. This is heavily weighted toward technology: semiconductors, cloud software, AI, biotech. It moves faster and more dramatically than the S&P 500.
Why it matters: Compare Nasdaq vs S&P 500 performance each week. If Nasdaq leads, the market is favouring growth and tech. If Nasdaq lags, investors may be rotating into safer, older industries.
Russell 2000
2,000 small US companies. Unlike the S&P 500 giants that earn globally, most Russell 2000 companies sell almost entirely within the US and borrow heavily. This makes them very sensitive to US interest rates and domestic economic conditions.
Why it matters: When Russell 2000 underperforms the S&P 500, it often signals that investors fear higher rates or a slowing US economy. It is one of the best early warning signals.
Gold
A physical metal that has been used as money and a store of value for thousands of years. Gold does not pay dividends or interest. People buy it when they are afraid: afraid of inflation, afraid of economic collapse, afraid of currency debasement.
Why it matters: Gold rising while stocks fall = fear trade. Gold rising while stocks also rise = inflation concern. Gold falling while stocks rise = confidence, risk-on mood. It is the market's fear gauge alongside VIX.
Crude Oil (WTI)
West Texas Intermediate crude oil is the North American benchmark price for a barrel of oil. Oil touches almost every product and service in the economy: transport, manufacturing, heating, plastics. A major oil price move creates ripples everywhere.
Why it matters: Rising oil = higher inflation expectations, pressure on consumer spending, boost for energy stocks. Falling oil = potential deflation, relief for consumers, pressure on energy sector. Geopolitical events (wars, sanctions) can move oil dramatically in a single day.
10-Year Treasury Yield
The interest rate the US government pays to borrow money for 10 years. It is often described as the most important single number in global finance. Mortgages, corporate loans, and stock valuations are all linked to this rate, because it feeds into the discount rates used to value future cash flows.
Why it matters: Rising yield → borrowing costs go up → companies worth less → stock valuations fall → pressure on Russell 2000 and growth stocks especially. Falling yield → the opposite. Think of yield as gravity: the higher it is, the harder it is for asset prices to stay elevated.
US Treasury Bonds (TLT)
The TLT ETF tracks long-dated US Treasury bonds (20+ years). Bond prices and yields always move in opposite directions: when yields rise, bond prices fall, and vice versa. When investors are scared, they often flee into bonds, the so-called "flight to safety."
Why it matters: Watch whether stocks and bonds are moving together or apart. Stocks up + bonds up = unusual, often a "Goldilocks" moment. Stocks down + bonds up = classic fear trade. Stocks down + bonds also down = something unusual (inflation shock, fiscal worry).
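The inverse link between yields and bond prices can be checked with a few lines of arithmetic. The sketch below prices a hypothetical 20-year zero-coupon bond, which is a simplification: TLT holds coupon-paying Treasuries, so its real sensitivity differs. The function name and the 4% and 5% yields are illustrative assumptions, not market data.

```python
def zero_coupon_price(annual_yield: float, years: int, face: float = 100.0) -> float:
    """Present value of a single payment of `face` made `years` from now."""
    return face / (1.0 + annual_yield) ** years


# Illustrative yields only, not market data.
price_at_4 = zero_coupon_price(0.04, 20)
price_at_5 = zero_coupon_price(0.05, 20)
change_pct = (price_at_5 - price_at_4) / price_at_4 * 100

print(f"Price at 4% yield: {price_at_4:.2f}")
print(f"Price at 5% yield: {price_at_5:.2f}")
print(f"Price change when the yield rises one point: {change_pct:.1f}%")
```

The price at the higher yield is lower, which is the relationship to keep in mind when you compare the 10-Year Yield row with the TLT row each week.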
VIX — The Fear Index
The CBOE Volatility Index measures how much the options market expects the S&P 500 to move over the next 30 days. It is calculated from the prices of options contracts — essentially, how much investors are paying for insurance against a market drop.
Why it matters: VIX below 15 = calm. VIX 15–25 = moderate concern. VIX above 30 = fear. VIX above 40 = panic (happened during COVID crash, 2008 crisis). A rising VIX usually means falling stocks. Watch for VIX spikes as early warnings.
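The bands above can be written as a small lookup so every team labels a reading the same way. This is a sketch only: the function name `vix_band` is hypothetical, and the handout does not say how to label readings between 25 and 30, so the code reports that range as unspecified instead of guessing.

```python
def vix_band(vix: float) -> str:
    """Map a VIX reading to the mood bands listed in the handout."""
    if vix < 0:
        raise ValueError("VIX cannot be negative")
    if vix < 15:
        return "calm"
    if vix <= 25:
        return "moderate concern"
    if vix <= 30:
        # The handout leaves 25-30 undefined, so no label is assigned here.
        return "unspecified"
    if vix <= 40:
        return "fear"
    return "panic"


# Illustrative readings, not market data.
for reading in (12.4, 18.0, 27.5, 33.1, 45.0):
    print(reading, "->", vix_band(reading))
```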
Bitcoin
The largest cryptocurrency by market cap. Bitcoin trades 24/7 — unlike stocks — and is highly sensitive to risk appetite, liquidity conditions, and regulatory news. In recent years it has increasingly correlated with Nasdaq during risk-off events.
Why it matters: Bitcoin often moves first. When risk appetite improves, Bitcoin can surge before stocks do. When fear hits, Bitcoin can crash faster than any stock index. It is a useful leading indicator of risk appetite, though very noisy.
How These Assets Talk to Each Other
The key cross-asset relationships to watch every week:
🔗 Yields ↑ → Stocks (especially Nasdaq & Russell) ↓ — higher borrowing costs hurt growth companies most
🔗 Gold ↑ + Stocks ↓ — fear trade, money fleeing risk
🔗 Oil ↑ → Energy sector ↑, Consumer Discretionary ↓ — oil is both a threat and a sector opportunity
🔗 VIX ↑ → S&P ↓ — almost always true in the short term
🔗 Bitcoin ↑ with Nasdaq ↑ — risk-on mood, growth appetite
🔗 Russell 2000 lagging S&P 500 — warning sign: broad rally may be narrowing
The 11 S&P 500 Sectors
The S&P 500 is divided into 11 sectors. Each week, watch which sectors are leading (buying) and which are lagging (selling). The pattern tells you what kind of week it was: risk-on or risk-off, growth or defensive.
Rotation pattern to memorise: When money moves from Technology → Utilities/Staples, that is defensive rotation (investors getting scared). When money moves from Staples/Utilities → Technology/Financials, that is risk-on rotation (investors getting confident). Sector leadership is one of the most reliable signals of market mood.
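One way to turn the rotation pattern into a repeatable check is to compare the average weekly move of the two groups named above. The sketch below is an assumption-heavy illustration: the sector labels may not match the exact names on Yahoo Finance, the 0.5 percentage point threshold is arbitrary, and the sample numbers are invented.

```python
from statistics import mean

# Groups taken from the rotation pattern in the text.
DEFENSIVE = ("Utilities", "Consumer Staples")
RISK_ON = ("Technology", "Financials")


def rotation_signal(sector_pct: dict[str, float], threshold: float = 0.5) -> str:
    """Label the week by the gap between risk-on and defensive sector moves."""
    defensive = mean(sector_pct[name] for name in DEFENSIVE)
    risk_on = mean(sector_pct[name] for name in RISK_ON)
    spread = risk_on - defensive
    if spread > threshold:
        return "risk-on rotation"
    if spread < -threshold:
        return "defensive rotation"
    return "no clear rotation"


# Invented weekly % changes for illustration.
week = {
    "Utilities": 1.2,
    "Consumer Staples": 0.8,
    "Technology": -1.5,
    "Financials": -0.3,
}
print(rotation_signal(week))
```

With these invented numbers the defensive group outperforms, so the function reports a defensive rotation. Treat the label as a prompt for discussion, not as a verdict.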
Your Three Bookmarks
You do not need to visit ten websites. Bookmark these three. Together they give you every number on the weekly scorecard in under 5 minutes every Monday morning.
Finviz Futures Performance
One page. All macro assets (S&P, Nasdaq, Russell, Gold, Oil, 10-Year, Bonds, Bitcoin, VIX). Switch from "1D" to "1W" with one click to see the full weekly % change. No login. No paywall.
★ Use this first every Monday
Yahoo Finance Sectors
All 11 S&P sectors with weekly % change as a colour-coded heatmap. Change the time period to "5D" to see the prior week's performance. Green = up, Red = down.
Free · No login needed
TradingEconomics Calendar
The week-ahead macro calendar. Shows CPI, jobs data, Fed speeches, earnings dates, and every major economic event that could move markets. Feed this into your Macro Agent and your prediction reasoning.
Free · No login needed
Monday morning workflow: Open Tab 1 (Finviz, set to 1W) → screenshot → open Tab 2 (Yahoo Sectors, set to 5D) → screenshot → open Tab 3 (TradingEconomics, look at the coming week) → note key events. Total time: 5 minutes. Commit both screenshots to your GitHub evidence folder labelled with the date.
The Weekly Rhythm
Monday: Pull last week's data & submit prediction
- Open Finviz (1W view) and Yahoo Sectors (5D view) — record all numbers
- Run your three agents: Almanac, Macro/News, Technical
- Feed all three agent outputs into the shared LLM prompt
- Query all four AI models (Claude, ChatGPT, Gemini, DeepSeek)
- Complete the Multi-LLM Comparison Table
- Write team consensus prediction: state direction, % estimate, and confidence level
- Commit prediction to GitHub before markets open. The commit timestamp is your evidence.
Midweek: Track & note deviations
- Check if any major event is moving assets dramatically away from your prediction
- Do not change your prediction — just note the deviation in your learning log
- Record any major news that was not in your Monday evidence base
Friday: Record the actuals
- Open Finviz (1W view) and Yahoo Sectors (5D view) — the week's final numbers are now set
- Record actual % change for all 9 assets and 11 sectors
- Do not modify your Monday prediction
- Commit actuals to GitHub evidence folder
Following Monday: Score, present, and learn
- Calculate your calibration score for each asset (see Scoring section)
- Identify: which LLM was closest? Which was most overconfident?
- Prepare a 3-minute presentation: prediction vs actual, best reasoning, biggest miss, and what you will change in the workflow next week
- Update your learning log in GitHub
Multi-LLM Synthesis
Every Monday, all four AI models receive the same structured prompt built from your three agent outputs. You compare their predictions side-by-side before writing your team's consensus.
ALMANAC EVIDENCE:
[Paste your Almanac Agent output here — seasonal bias, confidence, caveats]
MACRO / NEWS EVIDENCE:
[Paste your Macro/News Agent output here — FedWatch, rates, dollar, oil, calendar events]
TECHNICAL EVIDENCE:
[Paste your Technical Agent output here — EMA signals, trendlines, key levels]
REQUIRED OUTPUT — respond in exactly this structure:
1. Weekly Regime: [Bullish / Bearish / Neutral / Uncertain]
2. Confidence Score: [Low / Medium / High] + brief justification
3. Key Supporting Evidence: (3 points max)
4. Key Contradictions: (2 points max)
5. Invalidation Conditions: what would change this view
6. Predicted % move for S&P 500 this week: [+X.X% to +X.X%] or [-X.X% to -X.X%]
7. Plain-English brief: 2–3 sentences a non-expert can understand
8. Disclaimer: remind the reader this is not financial advice
Rule: Do not change the prompt between models. Do not add extra context for one model that others do not get. Fair comparison requires identical inputs. Store all four raw responses in GitHub named: synthesis_claude_YYYY-WXX.txt, synthesis_chatgpt_YYYY-WXX.txt, etc.
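Generating the four file names from one date helps keep the naming consistent across the team. The sketch below assumes that `WXX` means the ISO week number, which is consistent with the handout's `2026-W22` example but is not stated outright. The helper names are hypothetical.

```python
from datetime import date

# Lowercase model tags as used in the file-naming rule.
MODELS = ("claude", "chatgpt", "gemini", "deepseek")


def iso_week_tag(day: date) -> str:
    """Return a tag like 2026-W22 (assumes WXX is the ISO week number)."""
    iso = day.isocalendar()
    return f"{iso[0]}-W{iso[1]:02d}"


def synthesis_filenames(day: date) -> dict[str, str]:
    """Build one raw-response file name per model for the given week."""
    tag = iso_week_tag(day)
    return {model: f"synthesis_{model}_{tag}.txt" for model in MODELS}


for name in synthesis_filenames(date(2026, 5, 26)).values():
    print(name)
```

Because every name comes from the same tag, a typo in the week number cannot affect only one model's file.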
Multi-LLM Comparison Table — Fill This Every Sprint
After querying all four models, fill this table before writing your consensus:
| Dimension | Claude | ChatGPT | Gemini | DeepSeek |
|---|---|---|---|---|
| Weekly Regime | fill in | fill in | fill in | fill in |
| Confidence Score | Low/Med/High | fill in | fill in | fill in |
| S&P 500 % estimate | e.g. +0.5% to +1.2% | fill in | fill in | fill in |
| Top supporting reason | key phrase | fill in | fill in | fill in |
| Top contradiction cited | key phrase | fill in | fill in | fill in |
| Invalidation condition | what would change it | fill in | fill in | fill in |
| Tone / caveat language | cautious/assertive | fill in | fill in | fill in |
Consensus protocol: Where ≥3 models agree → high-confidence core of your brief. Where models diverge → document as a contradiction and include in your watchlist. Your final prediction must state which view was chosen and why you weighted certain models more or less for that specific week.
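The agreement rule in the consensus protocol can be sketched as a simple vote count over the Weekly Regime row of the comparison table. The function name and the returned keys are hypothetical, and the sample regimes are invented. The code only detects agreement; deciding how to weight the models is still the team's job.

```python
from collections import Counter


def consensus(regimes: dict[str, str], min_agree: int = 3) -> dict:
    """Report whether at least `min_agree` models share the same regime."""
    normalised = {model: regime.strip().lower() for model, regime in regimes.items()}
    label, votes = Counter(normalised.values()).most_common(1)[0]
    if votes >= min_agree:
        dissenters = [m for m, r in normalised.items() if r != label]
        return {"status": "core", "regime": label, "dissenters": dissenters}
    # No regime reached the threshold: document the split as a contradiction.
    return {"status": "divergent", "regime": None, "dissenters": list(regimes)}


# Invented responses for illustration.
sample = {
    "Claude": "Bullish",
    "ChatGPT": "Neutral",
    "Gemini": "Bullish",
    "DeepSeek": "Bullish",
}
print(consensus(sample))
```

Here three of four models agree, so the result marks "bullish" as the core view and lists ChatGPT as the dissenter to document in the watchlist.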
Calibration Scoring
This is the most important concept in the exercise. You are not scored only on whether you predicted correctly. You are scored on how well your stated confidence matched the outcome. This is called calibration. It is the core skill of all evidence-based reasoning, whether in markets, medicine, engineering, or AI.
| Stated Confidence | Direction | Outcome | Score | Reason |
|---|---|---|---|---|
| High | Up or Down | ✅ Correct | +3 pts | Well-evidenced, committed, and right |
| Medium | Up or Down | ✅ Correct | +2 pts | Good reasoning, measured confidence |
| Low / Uncertain | Up or Down | ✅ Correct | +1 pt | Honest about limits, got lucky — acceptable |
| High | Up or Down | ❌ Wrong | −2 pts | Overconfidence penalty — worst outcome |
| Medium | Up or Down | ❌ Wrong | 0 pts | Tried, wrong, not overconfident — neutral |
| Low / Uncertain | Any | ❌ Wrong | +1 pt | Honest uncertainty — always rewarded |
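The scoring table translates directly into a lookup. The sketch below takes "correct" as a yes/no input, because the handout does not define how a "Flat" call is judged; the function name is hypothetical and the sample week is invented.

```python
# Points copied from the scoring table: (confidence, correct) -> score.
SCORES = {
    ("high", True): 3,
    ("medium", True): 2,
    ("low", True): 1,
    ("high", False): -2,
    ("medium", False): 0,
    ("low", False): 1,
}


def calibration_score(confidence: str, correct: bool) -> int:
    """Return the points for one asset given stated confidence and outcome."""
    key = confidence.strip().lower()
    if key in ("uncertain", "low / uncertain"):
        key = "low"
    if (key, correct) not in SCORES:
        raise ValueError(f"unknown confidence level: {confidence!r}")
    return SCORES[(key, correct)]


# Invented week: four assets with stated confidence and whether the call was right.
week = [("High", True), ("Medium", False), ("Uncertain", False), ("High", False)]
total = sum(calibration_score(conf, ok) for conf, ok in week)
print(total)  # 3 + 0 + 1 - 2 = 2
```

Note how the single overconfident miss costs more than the honest uncertain call earns, which is the behaviour the table is designed to produce.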
LLM Horse Race — Tracked Across the Trimester
Each week you record which AI model was closest to the actual S&P 500 % move. Over 10 weeks, a leaderboard emerges. By Week 10, you will have real data to answer: which AI model is most calibrated for weekly market regime prediction? This is valuable AI literacy — based on evidence, not opinion.
Upset of the week recognition: When the actual move defies all four AI models, the richest lesson of that week is why every model was wrong. The team that best explains the miss scores highest for that week, regardless of its prediction accuracy.
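The weekly leaderboard entry and the upset check can both be computed from the four predicted ranges. The handout does not define "closest" or "defied", so this sketch makes two assumptions: closeness is measured from the midpoint of each model's range, and an upset means the actual move fell outside every range. The function names are hypothetical and the numbers are invented, not real model output.

```python
def closest_model(ranges: dict[str, tuple[float, float]], actual: float) -> str:
    """Model whose range midpoint is nearest to the actual % move."""
    return min(ranges, key=lambda m: abs(sum(ranges[m]) / 2 - actual))


def is_upset(ranges: dict[str, tuple[float, float]], actual: float) -> bool:
    """True when the actual % move lies outside every model's range."""
    for bounds in ranges.values():
        low, high = sorted(bounds)
        if low <= actual <= high:
            return False
    return True


# Invented S&P 500 ranges (in %) for illustration only.
predicted = {
    "Claude": (0.5, 1.2),
    "ChatGPT": (-0.3, 0.4),
    "Gemini": (0.0, 0.8),
    "DeepSeek": (-1.0, -0.2),
}
actual_move = -1.6

print(closest_model(predicted, actual_move))
print(is_upset(predicted, actual_move))
```

In this invented week DeepSeek is closest, yet the move still falls outside all four ranges, so the week would count as an upset worth explaining.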
This Week's Setup
Week 2 · 26 May 2026
⚡ This section is updated by Prof. Dr. Tan every Monday morning. Check here before your team begins the weekly workflow. Any changes to the prediction format, scoring rules, or special instructions for the week will appear here.
Week 1 Actuals (19–23 May 2026) — Use This as Your Evidence Base
Pull from Finviz (1W) and Yahoo Sectors (5D) on Monday morning. The numbers below are placeholders — replace with the actual Friday closes you read from the sites.
| Asset | Ticker | Friday Close | Weekly % Change | Your reading (fill in) |
|---|---|---|---|---|
| S&P 500 | ^GSPC / ES | — | — | ________ |
| Nasdaq 100 | ^NDX / NQ | — | — | ________ |
| Russell 2000 | ^RUT / RTY | — | — | ________ |
| Gold | GC=F / GC | — | — | ________ |
| Crude Oil (WTI) | CL=F / CL | — | — | ________ |
| 10-Year Yield | ^TNX / ZN | — | — | ________ |
| US Bonds (TLT) | TLT / ZB | — | — | ________ |
| VIX | ^VIX / VX | — | — | ________ |
| Bitcoin | BTC-USD | — | — | ________ |
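If you want to sanity-check a Weekly % Change figure against closing prices, the arithmetic is a one-liner. A minimal sketch, assuming you have the previous Friday's close and this Friday's close; the function name and the sample prices are made up for illustration.

```python
def weekly_pct_change(prev_close: float, friday_close: float) -> float:
    """Percentage change from the previous Friday's close to this Friday's close."""
    if prev_close == 0:
        raise ValueError("previous close must be non-zero")
    return round((friday_close - prev_close) / prev_close * 100, 2)


# Made-up closes, not market data.
print(weekly_pct_change(5000.0, 5075.0))  # 1.5
print(weekly_pct_change(200.0, 195.0))    # -2.5
```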
Week 2 Instructions — Prediction Window: 26–30 May 2026
- Read the Almanac section of your URD. What does seasonality suggest for late May historically?
- Check CME FedWatch (cmegroup.com/fedwatch). Did rate expectations shift last week? What direction?
- Note the week-ahead calendar on TradingEconomics. Are there any CPI, jobs, or Fed speaker events this week that could move markets?
- Run the shared prompt through all four AI models. Record every response in your GitHub evidence folder.
- Complete the comparison table and submit your consensus prediction by Monday before class.
- Special instruction this week: Include your prediction for both the S&P 500 sector that you expect to lead AND the sector you expect to lag. Justify both calls.
Prof. Dr. Tan's note: This is your first live prediction. Start simple. Use the three-agent evidence. Do not overthink. A clear, evidence-linked "neutral, medium confidence, because X" is excellent Week 2 work. Do not aim for a heroic call. Aim for a calibrated one.
Weekly Prediction Submission Format (commit this to GitHub every Monday)
Save as prediction_YYYY-WXX_teamname.md in your evidence folder. Every field is required.
| Field | Your entry | Example |
|---|---|---|
| Team name | ________ | Team Sigma |
| Prediction week | ________ | 2026-W22 |
| S&P 500 direction | Up / Down / Flat | Up |
| S&P 500 % range | e.g. +0.5% to +1.5% | +0.4% to +1.0% |
| Confidence | Low / Medium / High | Medium |
| Nasdaq direction | Up / Down / Flat | Up |
| Russell 2000 direction | Up / Down / Flat | Flat |
| Gold direction | Up / Down / Flat | Down |
| Oil direction | Up / Down / Flat | Up |
| 10-Year Yield direction | Up / Down / Flat | Up |
| US Bonds (TLT) direction | Up / Down / Flat | Down |
| VIX direction | Up / Down / Flat | Down |
| Bitcoin direction | Up / Down / Flat | Up |
| Leading sector | sector name | Energy |
| Lagging sector | sector name | Real Estate |
| Key evidence (3 points) | brief bullets | Fed tone hawkish; Russell lagged last week; Oil rising |
| Key contradiction | 1 sentence | Almanac shows late-May bullish seasonality, contra our bearish macro read |
| Invalidation condition | 1 sentence | If CPI comes in below 3%, yields fall, which would turn our view bullish |
| LLM consensus | Agree / Disagree / Mixed | Mixed — Claude Bullish, others Neutral |
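Since every field is required, a quick pre-commit check can catch malformed entries before the timestamp is locked in. The sketch below is one possible validator, not an official tool: it assumes the submission has been read into a dictionary keyed by the field names in the table, and the function names are hypothetical.

```python
import re

DIRECTIONS = {"Up", "Down", "Flat"}
CONFIDENCE = {"Low", "Medium", "High"}
# Matches text such as "+0.4% to +1.0%" or "-1.2% to -0.3%".
RANGE_RE = re.compile(r"^([+-]?\d+(?:\.\d+)?)%\s+to\s+([+-]?\d+(?:\.\d+)?)%$")


def parse_range(text: str) -> tuple[float, float]:
    """Parse a % range string into (low, high)."""
    cleaned = text.strip().replace("\u2212", "-")  # accept a Unicode minus sign
    match = RANGE_RE.match(cleaned)
    if match is None:
        raise ValueError(f"not a % range: {text!r}")
    low, high = sorted(float(group) for group in match.groups())
    return low, high


def validate(entry: dict[str, str]) -> list[str]:
    """Return a list of problems found in a submission; empty means it passed."""
    problems = []
    for field, value in entry.items():
        if field.endswith("direction") and value not in DIRECTIONS:
            problems.append(f"{field}: {value!r} is not Up / Down / Flat")
    if entry.get("Confidence") not in CONFIDENCE:
        problems.append("Confidence must be Low, Medium, or High")
    try:
        parse_range(entry.get("S&P 500 % range", ""))
    except ValueError as exc:
        problems.append(str(exc))
    return problems


# Partial example entry with one deliberate mistake.
draft = {
    "S&P 500 direction": "Up",
    "S&P 500 % range": "+0.4% to +1.0%",
    "Confidence": "Medium",
    "Gold direction": "Sideways",
}
print(validate(draft))
```

The check only covers format. Whether the evidence supports the call is still judged by your reasoning.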