Why This Project Will Change How You See Yourself
Read This First
Wherever You Are Headed: This Project Gives You a Story
Going for an Internship?
Internship interviewers — at tech companies, banks, consultancies, startups — are looking for one thing above all else: evidence that you can think under uncertainty and communicate clearly. Most candidates talk about their GPA and their club memberships.
You walk in and say: "Every week for ten weeks I made a real prediction about the S&P 500, the Nasdaq, and the Russell 2000 using a structured evidence framework that combined seasonal data, macro indicators, chart analysis, and four AI models. I then scored my own calibration — not just whether I was right, but whether my confidence was appropriate. Here is what I learned about the difference between a confident AI answer and a carefully reasoned human judgment."
That is not a student project. That is a demonstration of analytical maturity. It is rare at any age. At twenty years old it is exceptional.
Targeting a Master's Programme?
Top master's programmes — in data science, finance, business analytics, AI, or management — ask you to write a statement of purpose. Most applicants describe their coursework and grades. A few describe a project that genuinely challenged them.
Your statement of purpose says: "As a capstone project during my undergraduate studies, I led the Technical Agent role in a ten-week Agile sprint team that built a real-time market intelligence system. Each sprint cycle began with data collection at market close, moved through three structured analytical agents, was synthesised by four AI models, and culminated in a human-reviewed prediction with a documented confidence level. We tracked calibration — not just accuracy — and used the retrospective to improve the reasoning process each week."
Programmes in finance, data science, or AI will read that and see someone who already understands human-AI collaboration, evidence-based reasoning, and iterative improvement. That is exactly what they want in their cohort.
Heading Straight into a Career?
The working world is full of people who can follow instructions. It is genuinely short of people who can look at complex, contradictory information and say clearly: "Here is what I think, here is why, here is what would change my mind, and here is my confidence level." That skill is valuable in any industry.
In a job interview you say: "I have experience working with real financial data, using multiple AI tools, and synthesising conflicting information into a structured recommendation. More importantly, I have ten weeks of documented evidence showing how I improved at it — where I was overconfident, where I was right to be uncertain, and how I changed my process based on what the data showed me."
Financial literacy is not just for finance jobs. Understanding how markets, interest rates, and economic signals interact is relevant to product management, operations, strategy, marketing, and technology. The person who reads the room — and reads the market — has a different kind of intelligence.
The Compounding Effect — Why Starting at Twenty Changes Everything
You learn to read a chart, understand why interest rates move markets, and think in probabilities rather than certainties. You develop a vocabulary — EMA (Exponential Moving Average — a weighted average that reacts faster to recent price changes), yield, regime, calibration — that most of your peers do not have.
You are making better financial decisions in your own life. You understand what happens to your savings when the Fed raises rates. You read economic news and understand what it actually means. This is the beginning of genuine financial autonomy.
The habits you built — evidence before confidence, uncertainty when signals conflict, continuous improvement through retrospectives — apply to every major decision you will make. Career moves. Business decisions. Investment choices. The discipline is the asset.
Someone who learned to think this way at twenty and kept practising has a twenty-year head start on clear reasoning under uncertainty. That is the real compound interest — not money, but the quality of judgment that money follows.
Warren Buffett started investing at eleven years old and has said his only regret was not starting earlier. He did not say that because early investing guarantees wealth. He said it because starting early means your reasoning compounds. Every mistake you analyse and improve on adds to a mental model that becomes more valuable with each decade. You are starting this now. At twenty.
Financial Literacy Is a Superpower Most People Never Acquire
Most people — including many highly educated professionals — go through their entire lives without understanding how financial markets work. They do not know what the yield curve means, why the Fed's words move stock prices, or how oil connects to inflation expectations. This is not their fault. Nobody taught them. They were never asked to engage with real data in a structured, accountable way.
A person without financial literacy:
- Hears "the Fed raised rates" and has no idea what it means for their savings or investments
- Makes big financial decisions based on tips from friends or confident-sounding strangers
- Confuses price going up with value being created
- Is overconfident when things are going well and panics when they are not
A person with financial literacy:
- Understands why the 10-year yield affects every asset class and can explain it simply to anyone
- Has a framework for evaluating any claim: what is the evidence, what contradicts it, what would change the conclusion
- Knows the difference between a high-confidence call and a coin flip, and can articulate why
- Has a documented track record of being wrong, learning from it, and improving
"I do not care whether you predict the market correctly. I care whether you learn to think."
Markets are one of the most honest feedback systems that exist. You make a call. You record it. The outcome is public, objective, and undeniable. There is no partial credit for almost being right. There is also no penalty for being wrong with good reasoning — only for being overconfident without evidence.
That feedback loop — predict, observe, score, improve — is exactly what separates professionals who grow from those who stagnate. You are practising it at twenty. The students who take this seriously and put real energy into it will look back at this trimester and recognise it as one of the formative experiences of their intellectual development.
The question is not whether this is a valuable project. It is. The question is whether you are going to treat it as one.
— Professor Dr. Tan, CP3405 Design Thinking 3
🎯 The Challenge — Specifically for You
Treat every sprint presentation as a rehearsal for explaining your work to a hiring manager. Practise saying: "I predicted X because of Y and Z. I was wrong because I underestimated W. Here is what I changed." That sentence, said confidently and clearly, is worth more than any certification.
Keep a personal reflection journal alongside your GitHub work. Write two sentences after every sprint: what surprised you this week, and what it changed in how you think. Ten weeks of genuine reflection is a personal statement that writes itself.
Make your GitHub repository public and link it in your resume. A real, documented, ten-week project with consistent commits, evidence, and retrospectives tells any employer that you are someone who finishes things, works in a team, and improves over time. That is exactly what they are trying to assess in every interview.
That is fine. The skill you are building — structured reasoning under uncertainty — is useful everywhere. The worst outcome of this project is that you spend ten weeks thinking rigorously about complex real-world problems and learn to communicate your thinking clearly. There is no profession where that is a disadvantage.
Your Week 2 Mission — What Happens This Week
26 May – 2 June 2026
- Read this entire website before class
- Form your group and agree on roles (R1–R10)
- Create your GitHub repository — README + evidence folder
- Create free accounts: Claude, ChatGPT, Gemini, DeepSeek
- Bookmark the three data source URLs
- R3: Open the Almanac PDF — read May vital stats and the Memorial Day week day notes
- R4: Check CME FedWatch, TradingEconomics calendar, 10-year yield
- R5: Open ProRealTime or TradingView — study SPX, NDX, IWM charts
- R9: GitHub repo clean, README written, evidence folder structure set up
- Saturday: Pull Finviz (1W) + Yahoo Sectors (5D) — screenshot both
- Write three agent outputs — Almanac, Macro, Technical — before opening any AI
- Query all 4 LLMs with the shared prompt — save every raw response
- Complete the Multi-LLM comparison table
- R7 leads the Human Score discussion — write the override paragraph
- Commit prediction file to GitHub before Sunday night
- Submit predictions to the Prediction Board (link at bottom of this site)
- Everything is already prepared. Walk in ready to present.
- 20–25 minute presentation — every role holder speaks
- Show GitHub repo live — commits, evidence folder, release tag
- R7 presents the Human Score — the thing no AI raised
- R10 updates calibration after Friday close is recorded
The single most important thing to do before Saturday: Read the exemplary solution page at exemplary-solution.html — it shows you exactly what a complete, high-quality submission looks like for this exact week. Study every section. Your submission does not need to reach the same conclusions, but it must show the same structure.
What This Exercise Is About
You are technology students, not finance students. You do not need to become a trader. You need to become a disciplined thinker who can gather evidence, reason under uncertainty, and explain a conclusion clearly. The market is the practice arena. The real skill is the process.
Saturday (over the weekend): Set the Prediction
US markets close on Friday at 4 PM ET, which is already Saturday, over the weekend, in local time. The moment that happens, your sprint begins. Pull the data, run the agents, query all four AI models, and commit your SPX / NDX / IWM predictions to GitHub (with % range, direction, and confidence level) before Saturday ends.
Following Friday: Record Actuals
After US markets close the following Friday (Saturday, over the weekend, in local time), record the actual % change for SPX, NDX, and IWM. Do not change your Saturday prediction. The GitHub commit timestamp is the immutable evidence of what you said before the week played out.
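One way to picture the prediction file is as a small structured record. The sketch below is an assumption about layout, not the course's required format: the `Prediction` class, its field names, and the example numbers are all hypothetical.

```python
from dataclasses import dataclass, asdict
import json


@dataclass(frozen=True)  # frozen: the record cannot be edited after creation
class Prediction:
    """One index prediction; field names are illustrative, not a course requirement."""
    index: str        # "SPX", "NDX" or "IWM"
    direction: str    # "up", "down" or "flat"
    low_pct: float    # lower bound of the predicted weekly % change
    high_pct: float   # upper bound of the predicted weekly % change
    confidence: str   # "Low", "Medium" or "High"

    def contains(self, actual_pct: float) -> bool:
        """True if the recorded actual % change fell inside the predicted range."""
        return self.low_pct <= actual_pct <= self.high_pct


# Hypothetical example values, not real market data.
spx = Prediction("SPX", "down", -1.5, 0.0, "Low")
print(json.dumps(asdict(spx), indent=2))
print(spx.contains(-0.8))
```

Keeping the record frozen mirrors the rule that the Saturday prediction is never edited; the actual is compared against it afterwards.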
Monday Class: Present & Score
You arrive at Monday's class with your prediction already filed and your actuals already recorded. You present: what you predicted for SPX/NDX/IWM, what actually happened, your calibration score, and what you are changing. No last-minute preparation on Monday morning.
⚠️ Important framing: This is not a competition to predict correctly. It is an exercise in calibration — stating your confidence accurately. A cautious, well-reasoned "uncertain" is worth more than an overconfident wrong prediction. You are marked on reasoning quality, not market accuracy.
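The scoring formula used in this course is not reproduced here. As one common way to measure calibration, the Brier score takes the mean squared gap between stated probabilities and what actually happened (lower is better). The sketch assumes confidence is written as a probability that the index closes the week higher, and the three-week record is invented for illustration.

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    if len(forecasts) != len(outcomes) or not forecasts:
        raise ValueError("need equal-length, non-empty sequences")
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)


# Hypothetical record: probability of an up week, then what happened (1 = up).
cautious = brier_score([0.55, 0.50, 0.60], [1, 0, 1])
overconfident = brier_score([0.95, 0.95, 0.95], [1, 0, 1])
print(round(cautious, 4), round(overconfident, 4))
```

Both forecasters got the same two weeks right, yet the overconfident one scores worse because the single miss was made at 95% confidence.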
The Nine Assets You Track
Reference
These are the instruments your team makes weekly predictions about. Each one measures a different part of the global economy. Together, they tell a story about where money is moving and why.
S&P 500
A basket of the 500 largest US companies. Think Apple, Microsoft, Amazon, Google, Tesla. It is the most widely watched barometer of the US stock market. When people say "the market went up today," they almost always mean the S&P 500.
Why it matters: It is the benchmark everything else is measured against. If the S&P 500 is rising, the broad economy is usually healthy. If it falls, investors are worried.
Nasdaq 100
The 100 largest non-financial companies listed on the Nasdaq exchange. This is heavily weighted toward technology: semiconductors, cloud software, AI, biotech. It moves faster and more dramatically than the S&P 500.
Why it matters: Compare Nasdaq vs S&P 500 performance each week. If Nasdaq leads, the market is favouring growth and tech. If Nasdaq lags, investors may be rotating into safer, older industries.
Russell 2000
2,000 small US companies. Unlike the S&P 500 giants that earn globally, most Russell 2000 companies sell almost entirely within the US and borrow heavily. This makes them very sensitive to US interest rates and domestic economic conditions.
Why it matters: When Russell 2000 underperforms the S&P 500, it often signals that investors fear higher rates or a slowing US economy. It is one of the best early warning signals.
Gold
A physical metal that has been used as money and a store of value for thousands of years. Gold does not pay dividends or interest. People buy it when they are afraid: afraid of inflation, afraid of economic collapse, afraid of currency debasement.
Why it matters: Gold rising while stocks fall = fear trade. Gold rising while stocks also rise = inflation concern. Gold falling while stocks rise = confidence, risk-on mood. It is the market's fear gauge alongside VIX (CBOE Volatility Index — measures expected 30-day market volatility, known as the Fear Index).
Crude Oil (WTI)
West Texas Intermediate crude oil is the North American benchmark price for a barrel of oil. Oil touches almost every product and service in the economy: transport, manufacturing, heating, plastics. A major oil price move creates ripples everywhere.
Why it matters: Rising oil = higher inflation expectations, pressure on consumer spending, boost for energy stocks. Falling oil = potential deflation, relief for consumers, pressure on energy sector. Geopolitical events (wars, sanctions) can move oil dramatically in a single day.
10-Year Treasury Yield
The interest rate the US government pays to borrow money for 10 years. This is the most important single number in global finance. Mortgages, corporate loans, and valuations of all stocks are mathematically linked to this rate.
Why it matters: Rising yield → borrowing costs go up → companies worth less → stock valuations fall → pressure on Russell 2000 and growth stocks especially. Falling yield → the opposite. Think of yield as gravity: the higher it is, the harder it is for asset prices to stay elevated.
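The "gravity" effect can be illustrated with basic discounting: a cash flow received in n years is worth CF / (1 + r)^n today. The sketch below uses the yield itself as the discount rate and made-up cash flows, both simplifying assumptions; real valuations add a risk premium and many cash flows.

```python
def present_value(cash_flow: float, rate: float, years: int) -> float:
    """Value today of a single cash flow received `years` from now."""
    return cash_flow / (1 + rate) ** years


# Hypothetical: 100 received in 2 years (near-term) vs 15 years (growth-style).
for years in (2, 15):
    low = present_value(100, 0.04, years)   # discounted at 4%
    high = present_value(100, 0.05, years)  # discounted at 5%
    change_pct = (high - low) / low * 100
    print(f"{years:>2} years: {low:.2f} -> {high:.2f} ({change_pct:.1f}%)")
```

The distant cash flow loses a larger share of its value for the same one-point rise, which is the mechanism behind growth stocks and small caps being hit hardest.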
US Treasury Bonds (TLT)
When yields rise, bond prices fall, and vice versa. Bond prices and yields always move in opposite directions. The TLT ETF (Exchange-Traded Fund: a basket of securities traded like a single stock on an exchange) tracks long-dated US Treasury bonds (20+ years). When investors are scared, they often flee into bonds, a move known as "flight to safety."
Why it matters: Watch whether stocks and bonds are moving together or apart. Stocks up + bonds up = unusual, often a "Goldilocks" moment. Stocks down + bonds up = classic fear trade. Stocks down + bonds also down = something unusual (inflation shock, fiscal worry).
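The three combinations above can be written as a small lookup. This is only a sketch: the function name is hypothetical, and the fourth combination (stocks up, bonds down) is not described in the text, so the code says so rather than guessing a label.

```python
def stock_bond_regime(stocks_pct: float, bonds_pct: float) -> str:
    """Label a week using the three combinations described in the text."""
    if stocks_pct > 0 and bonds_pct > 0:
        return "Goldilocks (unusual)"
    if stocks_pct < 0 and bonds_pct > 0:
        return "classic fear trade"
    if stocks_pct < 0 and bonds_pct < 0:
        return "unusual: possible inflation shock or fiscal worry"
    return "not covered by the text"


# Hypothetical weekly % changes for SPX and TLT.
print(stock_bond_regime(-1.4, 0.9))
```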
VIX — The Fear Index
The CBOE Volatility Index measures how much the options market expects the S&P 500 to move over the next 30 days. It is calculated from the prices of options contracts — essentially, how much investors are paying for insurance against a market drop.
Why it matters: VIX below 15 = calm. VIX 15–25 = moderate concern. VIX above 30 = fear. VIX above 40 = panic (happened during COVID crash, 2008 crisis). A rising VIX usually means falling stocks. Watch for VIX spikes as early warnings.
Bitcoin
The largest cryptocurrency by market cap. Bitcoin trades 24/7 — unlike stocks — and is highly sensitive to risk appetite, liquidity conditions, and regulatory news. In recent years it has increasingly correlated with Nasdaq during risk-off events.
Why it matters: Bitcoin often moves first. When risk appetite improves, Bitcoin can surge before stocks do. When fear hits, Bitcoin can crash faster than any stock index. It is a useful leading indicator of risk appetite, though very noisy.
How These Assets Talk to Each Other
The key cross-asset relationships to watch every week:
🔗 Yields ↑ → Stocks (especially Nasdaq & Russell) ↓ — higher borrowing costs hurt growth companies most
🔗 Gold ↑ + Stocks ↓ — fear trade, money fleeing risk
🔗 Oil ↑ → Energy sector ↑, Consumer Discretionary ↓ — oil is both a threat and a sector opportunity
🔗 VIX ↑ → S&P ↓ — almost always true in the short term
🔗 Bitcoin ↑ with Nasdaq ↑ — risk-on mood, growth appetite
🔗 Russell 2000 lagging S&P 500 — warning sign: broad rally may be narrowing
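The relationships above can be checked mechanically against one week of data. The sketch below is illustrative: the dictionary keys and the sample numbers are hypothetical, `US10Y` is taken to be the weekly % change in the yield, and the oil rule is left out because it needs sector returns rather than the macro assets alone.

```python
def relationships_observed(week: dict) -> list:
    """Return the listed cross-asset patterns that show up in one week's % changes."""
    checks = {
        "yields up, Nasdaq and Russell down":
            week["US10Y"] > 0 and week["NDX"] < 0 and week["RUT"] < 0,
        "gold up, stocks down (fear trade)":
            week["GOLD"] > 0 and week["SPX"] < 0,
        "VIX up, S&P down":
            week["VIX"] > 0 and week["SPX"] < 0,
        "Bitcoin up with Nasdaq up (risk-on)":
            week["BTC"] > 0 and week["NDX"] > 0,
        "Russell 2000 lagging S&P 500":
            week["RUT"] < week["SPX"],
    }
    return [name for name, held in checks.items() if held]


# Hypothetical weekly % changes, not real market data.
sample = {"SPX": -1.2, "NDX": -2.0, "RUT": -2.5, "GOLD": 1.1,
          "US10Y": 2.4, "VIX": 12.0, "BTC": -3.0}
for line in relationships_observed(sample):
    print(line)
```

A week in which several of these fire together is easier to describe in one sentence; a week in which none fire is a signal to lower confidence.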
The 11 S&P 500 Sectors
Reference
The S&P 500 is divided into 11 sectors. Each week, watch which sectors are leading (buying) and which are lagging (selling). The pattern tells you what kind of week it was: risk-on or risk-off, growth or defensive.
Rotation pattern to memorise: When money moves from Technology → Utilities/Staples, that is defensive rotation (investors getting scared). When money moves from Staples/Utilities → Technology/Financials, that is risk-on rotation (investors getting confident). Sector leadership is one of the most reliable signals of market mood.
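A rough way to turn the rotation pattern into a check is to compare the average weekly return of the defensive group with that of the risk-on group. The groupings follow the paragraph above; the simple-average comparison, the sector key names, and the sample numbers are assumptions made for illustration.

```python
def rotation_signal(sector_pct: dict) -> str:
    """Compare defensive and risk-on sector averages for one week."""
    defensive = (sector_pct["Utilities"] + sector_pct["Consumer Staples"]) / 2
    risk_on = (sector_pct["Technology"] + sector_pct["Financials"]) / 2
    if defensive > risk_on:
        return "defensive rotation"
    if risk_on > defensive:
        return "risk-on rotation"
    return "no clear rotation"


# Hypothetical weekly % changes read off the sector heatmap.
week = {"Utilities": 1.0, "Consumer Staples": 0.5,
        "Technology": -2.0, "Financials": -1.0}
print(rotation_signal(week))
```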
Your Three Bookmarks
Tools
You do not need to visit ten websites. Bookmark these three. Together they give you every number on the weekly scorecard in under 5 minutes each weekend.
Finviz Futures Performance
One page. All macro assets (S&P, Nasdaq, Russell, Gold, Oil, 10-Year, Bonds, Bitcoin, VIX). Switch from "1D" to "1W" with one click to see the full weekly % change. No login. No paywall.
★ Use this first over the weekend
Yahoo Finance Sectors
All 11 S&P sectors with weekly % change as a colour-coded heatmap. Change the time period to "5D" to see the prior week's performance. Green = up, Red = down.
Free · No login needed
TradingEconomics Calendar
The week-ahead macro calendar. Shows CPI (Consumer Price Index — measures the change in prices paid by consumers), jobs data, Fed speeches, earnings dates, and every major economic event that could move markets. Feed this into your Macro Agent and your prediction reasoning.
Free · No login needed
Weekend workflow: Open Tab 1 (Finviz, set to 1W) → screenshot → open Tab 2 (Yahoo Sectors, set to 5D) → screenshot → open Tab 3 (TradingEconomics, look at the coming week) → note key events. Total time: 5 minutes. Commit both screenshots to your GitHub evidence folder labelled with the date.
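Labelling screenshots with the date is easier to keep consistent if the file names are generated rather than typed. The sketch below shows one possible naming scheme; the folder layout, the file names, and the `evidence_paths` helper are hypothetical, not a course requirement.

```python
from datetime import date
from pathlib import Path


def evidence_paths(snapshot_date: date, root: str = "evidence") -> list:
    """Build dated screenshot paths; the naming scheme is illustrative only."""
    stamp = snapshot_date.isoformat()  # ISO format, e.g. 2026-05-30
    return [
        Path(root) / stamp / f"{stamp}_finviz_1W.png",
        Path(root) / stamp / f"{stamp}_yahoo_sectors_5D.png",
    ]


for p in evidence_paths(date(2026, 5, 30)):
    print(p.as_posix())
```

ISO dates sort correctly in a file listing, so ten weeks of evidence stay in chronological order without extra effort.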
Macro Agent — Your Seven Locked Tools
R4 Reference
Use only these seven sources. If every student uses different sources, your outputs are incomparable. These seven cover everything the Macro Agent needs. They are all free, all require no login, and all update in real time. Do not supplement with social media, Reddit, paid terminals, or opinion blogs.
CME FedWatch: Shows the market's implied probability of each Fed rate outcome at every upcoming FOMC meeting. The most important single macro signal in the world.
Key insight: A shift from 96% hold to 85% hold is a big deal even if the rate does not change. The market has repriced its expectations, and that moves assets.
TradingEconomics Calendar: The week-ahead economic calendar. Every data release (CPI, PCE, jobs, GDP, retail sales, Fed speeches) with date, time, previous value, and market consensus forecast.
Key insight: Markets move on surprises. If consensus expects CPI at 2.8% and it prints 3.2%, the shock is the 0.4-point gap, not the number itself.
Finviz Futures Performance: One page. All macro assets: indices, bonds, dollar, oil, gold, bitcoin. Switch to 1W for the weekly view. Your primary data source for recording last week's actual % changes.
Key insight: Dollar and oil moving in the same direction is unusual and often signals a geopolitical driver rather than a macroeconomic one.
US Treasury yields: Official US government source for the 2-year, 10-year, and 30-year Treasury yields. Updated daily. The 10-year is the most important single rate in global finance.
Key insight: 10-year at 4.60%+ = gravity on stock valuations. Every 0.25 percentage point rise in the 10-year is felt most by growth stocks and REITs.
Reuters and AP: Wire service news covering confirmed facts, official statements, central bank announcements, and geopolitical developments. These are primary sources. Use them for confirmed events only.
Key rule: If you cannot find it on Reuters or AP, do not include it as a fact in the Macro Agent output. Separate confirmed news from rumour.
Schwab weekly market commentary: Published every Friday, written by professional market strategists. Gives a structured summary of the week that was plus the week-ahead setup. Free, no paywall.
Key rule: This is a cross-check, not a substitute. Do not copy Schwab's view as your own analysis.
Earnings Whispers: The most complete free earnings calendar. Shows every company reporting that week, consensus EPS estimates, and the "whisper number" (what traders actually expect vs what analysts publish).
Key insight: One unexpected earnings miss from a major company can drag its entire sector down even if the broad market is flat.
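The "markets move on surprises" point from the calendar entry above comes down to one subtraction: the printed number minus the consensus forecast. A minimal sketch, with a hypothetical helper name:

```python
def surprise(actual: float, consensus: float) -> float:
    """Signed gap between the printed number and the consensus forecast."""
    return round(actual - consensus, 2)


# The CPI example from the text: consensus 2.8%, print 3.2%.
print(surprise(3.2, 2.8))   # 0.4 percentage points above consensus
print(surprise(2.8, 2.8))   # 0.0, an in-line print
```

Recording the sign matters: a positive inflation surprise and a negative one of the same size point in opposite directions for yields.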
Macro Agent Output Template — Fill This Every Week
Every field maps directly to one of the seven tools above. No field should take more than 2 minutes to complete. Total time for the Macro Agent: 15–20 minutes.
· Current Fed rate: [X.XX–X.XX%]
· Next FOMC date: [date]. Hold probability: [XX%]. Cut probability: [XX%]. Direction vs last week: [unchanged / shifted dovish / shifted hawkish]
· 2-year yield: [X.XX%] 10-year yield: [X.XX%] 30-year yield: [X.XX%]
· Yield curve: [normal / inverted / flat]. 10-year direction this week: [rising / falling / flat]
· [Implication: rising 10-year = headwind for growth stocks / falling 10-year = relief for REITs and tech]
COMMODITIES & DOLLAR (Finviz Futures 1W):
· WTI Crude Oil: [close price], weekly change: [+/−X.X%], direction: [rising / falling / flat]
· Gold: [close price], weekly change: [+/−X.X%], direction: [rising / falling]
· DXY (Dollar): [level], direction: [strengthening / weakening]
· [Cross-asset implication: oil rising + yields rising = stagflation risk. Oil falling = inflation relief.]
WEEK-AHEAD CALENDAR (TradingEconomics):
· [Day, Date]: [Event name] — Expected: [X.X%], Previous: [X.X%] — IMPORTANCE: [High/Med/Low]
· [Day, Date]: [Event name] — Expected: [...], Previous: [...] — IMPORTANCE: [...]
· [Repeat for each major event this week. Maximum 5 entries. Rank by market-moving potential.]
KEY EARNINGS THIS WEEK (Earnings Whispers):
· [Company (TICKER)] — Day — Sector: [XLX] — What to watch: [one sentence]
· [2–3 entries maximum. Only include earnings that could move an entire sector.]
CONFIRMED NEWS EVENTS (Reuters / AP):
· [Event description] — Source: Reuters/AP — Date — Market implication: [one sentence]
· [Confirmed facts only. No rumours. No social media. If you cannot cite Reuters or AP, leave it out.]
MACRO BIAS: [Hawkish / Dovish / Neutral / Binary-risk]
PRIMARY DRIVER THIS WEEK: [The single most important macro event — in one sentence]
CONFIDENCE: [Low / Medium / High] — [Brief justification]
INVALIDATION: [What macro event would reverse your bias completely]
Sources accessed: [date]. All data from the seven approved tools only.
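The "Yield curve: [normal / inverted / flat]" field in the template can be filled from the 2-year and 10-year yields. The sketch below uses the 10-year minus 2-year spread; the 0.10-point band for "flat" is an arbitrary assumption, since the template does not define a threshold, and the sample yields are hypothetical.

```python
def yield_curve_shape(two_year: float, ten_year: float, flat_band: float = 0.10) -> str:
    """Classify the curve from the 10-year minus 2-year spread (percentage points)."""
    spread = ten_year - two_year
    if abs(spread) <= flat_band:
        return "flat"  # flat_band is an illustrative threshold, not a standard
    return "normal" if spread > 0 else "inverted"


# Hypothetical yields in percent.
print(yield_curve_shape(3.90, 4.60))
print(yield_curve_shape(4.80, 4.20))
```

Whatever threshold the team picks, stating it in the README keeps the label comparable from week to week.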
The three-level distinction your Macro Agent must always make:
Level 1 — Confirmed fact: "The Fed held rates at 3.50–3.75% on April 29." (Reuters confirmed)
Level 2 — Market expectation: "FedWatch shows 96% probability of hold at June 17 meeting." (CME data)
Level 3 — Your interpretation: "Given Warsh's hawkish track record, we believe tone risks are skewed to the upside for yields." (your analysis)
Label all three differently in your output. Never present an interpretation as a fact.
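One way to make the three labels hard to forget is to attach them in code before the text reaches the synthesis prompt. The `Level` enum and `tag` helper below are hypothetical names used for illustration.

```python
from enum import Enum


class Level(Enum):
    FACT = "Confirmed fact"
    EXPECTATION = "Market expectation"
    INTERPRETATION = "Your interpretation"


def tag(level: Level, statement: str, source: str) -> str:
    """Prefix a statement with its evidence level and append its source."""
    return f"[{level.value}] {statement} (source: {source})"


print(tag(Level.EXPECTATION,
          "FedWatch shows 96% probability of hold at June 17 meeting.",
          "CME data"))
```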
Almanac Reference Card — Annual Seasonal Patterns
R3 Reference · 2026
Use the PDF for week-specific day notes. Use this card for monthly stats and sector seasonality. The Almanac PDF is your authoritative source; open it every weekend and look up the current month's daily calendar notes. This reference card gives you the key statistics without having to re-read the book every week. It does not replace the PDF. The PDF contains day-specific notes (e.g. "Memorial Day week Dow down 17 of last 29") that are not reproduced here.
Monthly Vital Statistics — S&P 500 (1950–2025)
Every sprint, check which month you are in and read the corresponding row. The Midterm Year column is especially important in 2026 — it overrides the normal average for the year.
Sector Seasonality Quick Reference (Active Signals for May–July 2026)
The Almanac PDF is still your authoritative source for day-specific notes. This reference card gives you the monthly stats and sector seasonality. For notes like "Day after Memorial Day Dow down 8 of last 10" or "First Trading Day in May S&P up 19 of last 28" — those are in the daily calendar pages of the PDF and are not reproduced here. Open the PDF every weekend and read the calendar entries for the coming week.
The Almanac Agent — How to Use It
Step-by-Step Guide
What is the Stock Trader's Almanac? Published every year since 1968, it is a data book of historical stock market patterns. It tells you what the market has done in each month, each week, and around specific dates over the past 50–75 years. It does not predict the future; it shows you what has historically happened and how often. Your job as the Almanac Agent is to read the relevant patterns for the coming week and turn them into a structured hypothesis, not a trading instruction.
Critical: 2026 is a Midterm Election Year — This Changes Everything
Before reading any monthly pattern, you must understand the 4-Year Presidential Cycle. It is the most important macro seasonal context in the entire Almanac.
The Almanac calls Q2–Q3 of every midterm year the "Weak Spot." Historical averages: Dow −2.0%, S&P −2.5%, Nasdaq −6.6% across these two quarters. 10 of the last 16 bear markets bottomed in a midterm year. We are now in this window.
In midterm election years specifically, May ranks #8 Dow, #9 S&P, #6 Nasdaq — below average performance. Midterm year May average: Dow −0.6%, S&P −0.7%, Nasdaq −0.8%. This is a meaningful headwind vs. the normal May average.
The good news: after the midterm weakness, Q4 2026 historically begins the "Sweet Spot" — the best period of the 4-year cycle. The Almanac's 2026 outlook: net year gain of 4–8% but front-loaded with pain in Q2–Q3.
How this affects your weekly Almanac output: When you read a bullish seasonal pattern for a specific week, you must weigh it against the broader midterm year headwind. A bullish Memorial Day week pattern is less convincing in a midterm year with elevated yields. Always state both the pattern and the cycle context.
May Vital Statistics — What the Almanac Actually Says
This is the raw data from the Almanac (page 65). Here is how to read every row.
| Metric | DJIA | S&P 500 | Nasdaq | Russell 1K | Russell 2K | What it means for your output |
|---|---|---|---|---|---|---|
| Rank (best month) | #9 | #8 | #5 | #6 | #4 | May is mid-tier to below average for Dow/S&P, better for Nasdaq and small caps. Not a strong seasonal month. |
| Up / Down (75 yrs) | 41 / 34 | 46 / 29 | 33 / 21 | 32 / 14 | 29 / 17 | S&P up 61% of Mays. Slight bullish lean historically but not dominant. Russell 2K up 63% — small caps tend to do better in May. |
| Avg % Change | −0.02% | +0.3% | +1.1% | +0.9% | +1.3% | Nasdaq and small caps historically do much better in May than the Dow. Tech/growth seasonal tilt in May. |
| Midterm Yr Avg % | −0.6% | −0.7% | −0.8% | +0.1% | −1.0% | ⚠️ This row matters most for 2026. In midterm years specifically, May is negative for all indices except Russell 1K. This is your key 2026 caveat. |
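The percentages quoted in the last column follow directly from the Up / Down row. A short sketch that recomputes them from the counts in the table:

```python
# Up/Down counts copied from the May Vital Statistics table above.
may_record = {
    "DJIA": (41, 34),
    "S&P 500": (46, 29),
    "Nasdaq": (33, 21),
    "Russell 1K": (32, 14),
    "Russell 2K": (29, 17),
}


def up_rate(up: int, down: int) -> float:
    """Share of years the index finished the month higher, in percent."""
    return 100 * up / (up + down)


for name, (up, down) in may_record.items():
    print(f"{name}: {up_rate(up, down):.0f}% up over {up + down} years")
```

The S&P 500 and Russell 2K rows reproduce the 61% and 63% figures quoted in the table. The year counts also show that only the DJIA and S&P 500 columns span 75 years; the Nasdaq and Russell records are shorter.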
Key Week-Level Patterns for Late May (Memorial Day Week)
Memorial Day week: Dow down 17 of the last 29 (about 59% of the time). This is a modest bearish lean rather than a strong statistical edge. Despite a bullish run of 12 straight years (1984–1995), the recent record is poor.
Day after Memorial Day: Dow up 23 of the last 39, marginally bullish, but down 8 of the last 10 years. The recent trend has turned bearish for the Tuesday re-open after the long weekend.
Dow mixed, up 13 down 13, average −0.05%. Light volume, traders leaving early. Not a directional signal; just expect thin, directionless trading.
Week after May options expiration: S&P up 30 of the last 45, avg +0.40%. This is actually one of the stronger week-level patterns in May. We are currently in this window after the May options expiration.
Almanac note directly relevant to Week 2: "Better to reposition in May than to sell in May and go away." The Almanac is nuanced — it does not say May is catastrophic, just below average. The real "Worst Six Months" warning applies to the full May–October period, not necessarily every single week.
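A quick sanity check on any "N of the last M" record is to ask how often a fair coin would produce a result at least that lopsided. The sketch below computes that tail probability; treating each year as an independent coin flip is a simplifying assumption, and it ignores the fact that the Almanac reports many patterns, some of which will look strong by chance.

```python
from math import comb


def tail_probability(hits: int, trials: int) -> float:
    """P(at least `hits` successes in `trials` fair coin flips)."""
    return sum(comb(trials, k) for k in range(hits, trials + 1)) / 2 ** trials


# Memorial Day week: Dow down 17 of the last 29.
print(f"17 of 29: {tail_probability(17, 29):.3f}")
# Day after Memorial Day: Dow down 8 of the last 10.
print(f" 8 of 10: {tail_probability(8, 10):.3f}")
```

A fair coin produces a record like 17 of 29 fairly often, which argues for rating that pattern no higher than moderate; the 8 of 10 record is less likely to be chance, but it rests on only ten observations.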
Sector Seasonality — What the Almanac Says About Sectors in May–June
The Almanac has a dedicated Sector Seasonality table (page 94). These are average returns when you enter/exit sectors at historically optimal times. Here are the patterns most relevant to late May / early June:
| Sector | Seasonal Type | Start | End | 25-yr Avg Return | What this means for your prediction |
|---|---|---|---|---|---|
| Banking (BKX) | SHORT (bearish) | May (early) | July (early) | −6.3% | Financials historically weak from early May through early July. Almanac says avoid/underweight banks in this window. |
| Gold & Silver (XAU) | SHORT (bearish) | May (mid) | June (late) | −6.8% | Gold seasonally weak mid-May through late June. Note: this is the seasonal pattern — actual gold is near all-time highs, which is a contradiction to document. |
| Materials (S5MATR) | SHORT (bearish) | May (mid) | Oct (mid) | −5.1% | Materials sector historically one of the weakest in the May–October worst six months period. |
| Oil (XOI) | SHORT (bearish) | June (early) | Aug (late) | −5.7% | Oil sector seasonal weakness begins in early June. Relevant for Week 3+ predictions. |
| Info Tech (S5INFT) | LONG (bullish) | March (mid) | July (mid) | +10.9% | Technology still in its seasonal long window through July. This is a bullish sector signal for tech/Nasdaq through June. |
| Utilities (UTY) | LONG (bullish) | March (mid) | Oct (early) | +9.3% | Utilities in a long seasonal window — but also being crushed by rising yields. This is a contradiction to document: seasonal says bullish, rates say bearish. |
| Healthcare (S5HLTH) | LONG (bullish) | Oct (early) | May (early) | +8.7% | Healthcare seasonal window technically ends in early May. You are at the tail end of a bullish period — note this in your output. |
Key sector insight from Almanac for late May / early June 2026: The pattern strongly supports avoiding Banking and Materials, and still holding Technology (seasonal long window through July). The Gold/Silver seasonal short is notable given gold is near all-time highs — this contradiction between the seasonal pattern and actual price action is exactly the kind of nuance that belongs in your Almanac Agent output.
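The table can also be read as a set of date windows. The sketch below lists which seasonal signals are active in a given month. It works at whole-month resolution only, so the "early", "mid", and "late" qualifiers in the table are ignored; that is a simplification, and the helper names are hypothetical.

```python
# (sector, type, start month, end month) from the Sector Seasonality table above.
WINDOWS = [
    ("Banking", "SHORT", 5, 7),
    ("Gold & Silver", "SHORT", 5, 6),
    ("Materials", "SHORT", 5, 10),
    ("Oil", "SHORT", 6, 8),
    ("Info Tech", "LONG", 3, 7),
    ("Utilities", "LONG", 3, 10),
    ("Healthcare", "LONG", 10, 5),  # wraps around the year end
]


def in_window(month: int, start: int, end: int) -> bool:
    """True if `month` lies in the window, including windows that wrap past December."""
    if start <= end:
        return start <= month <= end
    return month >= start or month <= end


def active_signals(month: int) -> list:
    """Sectors whose seasonal window covers the given month (1 = January)."""
    return [(s, kind) for s, kind, start, end in WINDOWS if in_window(month, start, end)]


print(active_signals(6))
```

Because of the month-level simplification, Healthcare shows as active for all of May even though the table ends its window in early May; the PDF remains the reference for exact dates.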
How to Write Your Almanac Agent Output — Step by Step
Step 1 — Identify the relevant pages
Over the weekend (after US Friday market close), look up three things in the Almanac: (1) the Monthly Almanac page for the current month, (2) the relevant day-specific notes for the coming week, (3) the Sector Seasonality table on page 94. You do not need to read the whole book — just these targeted lookups.
Step 2 — Record the four key facts
| Fact to record | Where to find it | Example for Week 2 |
|---|---|---|
| Monthly rank and average % change | May Vital Statistics table, first two rows | May ranks #8 S&P, avg +0.3% — below average month. Midterm year avg −0.7%. |
| Up/Down record for the month | May Vital Statistics, Up/Down row | S&P up 46, down 29 of 75 years (61% up). Slight bullish lean. |
| Specific week/day pattern | Calendar day notes in the Almanac daily pages | Memorial Day week: Dow down 17 of last 29. Day after: down 8 of last 10. |
| Relevant sector seasonalities | Sector Seasonality table, page 94 | Banking seasonal short began early May. Tech still in seasonal long window through July. |
Step 3 — Assess pattern strength
Not all patterns are equal. Use this simple framework:
Step 4 — Write the structured Almanac Agent output
This is what you paste into the synthesis prompt. Use this exact structure every week:
CYCLE CONTEXT: Midterm election year (2026). Q2–Q3 is the "Weak Spot" of the 4-Year Cycle. Almanac forecasts tougher trading through Q3 before a Q4 "Sweet Spot" rally.
MONTHLY STATS:
- S&P 500: ranks #8 of 12 months. Up 61% of the time. Avg +0.3% normally.
- Midterm year May avg: −0.7% for S&P. This is the active context for 2026.
- Nasdaq historically stronger in May: avg +1.1%, ranks #5.
- Russell 2000: avg +1.3%, ranks #4 — best of the indices in normal Mays.
SPECIFIC WEEK PATTERN (Memorial Day Week, 26–29 May):
- Memorial Day week: Dow down 17 of last 29. Bearish lean. Pattern strength: MODERATE.
- Day after Memorial Day (Tue 26 May): Dow down 8 of last 10. Recent trend bearish.
- Week after options expiration: S&P up 30 of 45, avg +0.40%. Bullish lean. Pattern strength: MODERATE.
- These two patterns contradict each other. Net: mixed / slight bearish lean.
SECTOR SIGNALS:
- Banking: seasonal SHORT window (May–July). Avoid Financials seasonally.
- Technology: still in seasonal LONG window (March–July). Supports tech/Nasdaq.
- Gold/Silver: seasonal SHORT begins mid-May. Contradicts current elevated gold price.
- Materials: seasonal SHORT (May–October). Consistent with last week's poor sector performance.
ALMANAC SEASONAL BIAS: Cautiously bearish-neutral. Midterm year cycle and Memorial Day week pattern are headwinds. Week-after-expiration pattern is a mild tailwind. Technology seasonal long is the one clear positive.
PATTERN CONFIDENCE: LOW–MEDIUM. Two contradicting week-level patterns. Midterm year context dominates.
ALMANAC THESIS: "Seasonality suggests cautious/bearish-neutral for late May in a midterm year. Technology is the seasonal bright spot. Banking and Materials face seasonal headwinds. The Memorial Day week bearish tendency adds to the macro pressure from elevated yields. However, the week-after-options-expiration bullish pattern provides some offset. Confidence is low given conflicting signals."
Source: Stock Trader's Almanac 2026, pp. 65–66 (May Vital Statistics), p. 94 (Sector Seasonality), pp. 10–11 (2026 Outlook). Accessed: 24 May 2026.
The most important rule for the Almanac Agent: Never write "the Almanac says the market will go up this week." Always write "the seasonal pattern suggests X, with Y confidence, because Z — and this conflicts/aligns with the macro/technical evidence as follows." The Almanac gives you a probability context, not a certainty. Your job is to translate a historical pattern into a structured hypothesis with an honest confidence level.
📋 Quick Reference — Where to Look in the Almanac Each Week
| What you need | Almanac page / section | What to read |
|---|---|---|
| Monthly overview and key bullet points | Monthly Almanac page (e.g. p.65 for May) | Read the bullet points at the top. These are the key historical footnotes. Copy the most relevant 2–3 into your output. |
| Monthly vital statistics table | Same page, bottom half | Read Rank, Up/Down count, Avg % Change, and Midterm Year Avg. Record all five indices. |
| Specific day/week notes | Daily calendar pages for that week | Look for any bold italicised notes in the calendar entries, e.g. "Memorial Day Week Dow Down 17 of Last 29." |
| Sector seasonality | p. 94 (Sector Seasonality table) | Look for any sector whose seasonal period starts or ends in the current month. Record Long/Short, start, end, and avg return. |
| 4-Year Cycle context | pp. 10–11 (2026 Outlook) | Read once at the start of the trimester. Reference it every week as standing context. Do not re-read every week — just paste the key sentence into your output. |
The Technical Agent — Reading Charts
Education + How-To
You are technology students, and charts are just data visualisation. Technical analysis is simply the practice of reading a price chart to understand whether buyers or sellers are currently in control, and where the price is likely to find support or resistance. You do not need to be a trader to do this. You need to observe, measure, and report. This section teaches you the four tools you will use every week: the 8 EMA, the 21 EMA, trendlines, and support/resistance levels.
What You Are Actually Looking At — Reading a Price Chart
Time. Left is older data, right is the most recent. Each bar or candle = one day (on a daily chart) or one week (on a weekly chart). You will use daily charts for weekly predictions.
Price. Higher = more expensive. The current price is at the far right edge of the chart. The chart history shows you whether price has been trending up, down, or sideways.
Green (or white) = price closed higher than it opened that day. Red (or black) = price closed lower. A series of green candles = buyers in control. A series of red candles = sellers in control.
The bars at the bottom of most charts show trading volume — how many shares traded that day. High volume on an up day = conviction. High volume on a down day = selling pressure. Low volume = indecision.
Where to get charts: Use ProRealTime (already in your toolkit) or TradingView.com (free, no login for basic charts). Search for SPX (S&P 500), NDX (Nasdaq 100), RUT (Russell 2000). Set the chart to Daily timeframe and look at 3–6 months of history.
The 8 EMA — Your Short-Term Momentum Signal
What is an EMA?
EMA stands for Exponential Moving Average. It calculates the average closing price over the last N days, but gives more weight to the most recent days. This makes it react faster to new price moves than a simple average.
The 8 EMA is the average of the last 8 trading days, weighted toward the most recent. On a chart it appears as a smooth curved line that follows price closely.
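To make the weighting concrete, here is a minimal Python sketch of an EMA calculation. It assumes the common smoothing factor 2 / (N + 1) and seeds the average with the first close. Charting platforms may seed differently, so early values can differ slightly from what ProRealTime or TradingView display. The closing prices are made-up illustration data.

```python
def ema(closes: list[float], period: int) -> list[float]:
    """Exponential moving average with smoothing factor 2 / (period + 1).

    Seeded with the first close (an assumption; platforms may seed with
    a simple average of the first `period` closes instead).
    """
    if period < 1:
        raise ValueError("period must be at least 1")
    if not closes:
        return []
    alpha = 2.0 / (period + 1)
    values = [closes[0]]
    for close in closes[1:]:
        # Blend today's close with yesterday's EMA; recent closes weigh more.
        values.append(alpha * close + (1 - alpha) * values[-1])
    return values


# Hypothetical daily closes, for illustration only.
closes = [7400.0, 7410.0, 7395.0, 7420.0, 7435.0, 7450.0, 7440.0, 7473.0]
print(f"8 EMA:  {ema(closes, 8)[-1]:.1f}")
print(f"21 EMA: {ema(closes, 21)[-1]:.1f}")
```

A shorter period gives a larger smoothing factor, which is why the 8 EMA hugs price more closely than the 21 EMA.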
How to add it in ProRealTime / TradingView
- Open your chart for SPX (S&P 500)
- Click "Indicators" or the + icon
- Search for "EMA" or "Exponential Moving Average"
- Set the period to 8. Set colour to orange.
- Repeat and add a second EMA with period 21. Set colour to blue.
- Price is above the 8 EMA → buyers in control
- Price pulls back to 8 EMA and bounces off it → healthy trend
- 8 EMA is sloping upward → momentum building
- Price breaks below the 8 EMA → short-term momentum failing
- 8 EMA is flattening → momentum stalling
- Price closes below 8 EMA for multiple days → watch the 21 EMA next
The 21 EMA — Your Medium-Term Trend Confirmation
The 21 EMA is the average of the last 21 trading days (approximately one month of trading). It moves more slowly than the 8 EMA. Think of it as the difference between your daily mood (8 EMA) and your general personality over the last month (21 EMA).
Zone 1 (Bullish): 8 EMA above 21 EMA, both sloping up, price above both. Full momentum. This is where you want to see S&P 500 and Nasdaq during a bull run.
Zone 2: The two EMAs are converging or crossing. This is a transition zone: the trend may be changing. Do not make a high-confidence call here.
Zone 3: 8 EMA below 21 EMA, both sloping down. Sellers in control. Price is likely to face resistance at both EMAs on any bounce.
Zone 4: 8 EMA crossing back above 21 EMA from below. Potential trend reversal. Wait for confirmation, because one cross does not confirm a new trend.
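The four EMA conditions described above can be sketched as a small classifier. This is an illustration under stated assumptions, not a trading rule. Numbering the conditions 1 to 4 in the order listed is an assumption based on the "Zone 1 (Bullish)" wording in the output template. Comparing today's EMA with the previous day's to judge slope is also an assumption, and the function name and inputs are hypothetical.

```python
def ema_zone(price: float, ema8: float, ema21: float,
             prev_ema8: float, prev_ema21: float) -> int:
    """Classify the 8/21 EMA condition into zones 1-4 (numbering assumed)."""
    crossed_up = prev_ema8 <= prev_ema21 and ema8 > ema21
    crossed_down = prev_ema8 >= prev_ema21 and ema8 < ema21
    if crossed_up:
        return 4  # 8 EMA crossing back above 21 EMA: possible reversal
    if crossed_down:
        return 2  # a fresh cross is a transition, not a confirmed trend
    rising = ema8 > prev_ema8 and ema21 > prev_ema21
    falling = ema8 < prev_ema8 and ema21 < prev_ema21
    if ema8 > ema21 and rising and price > ema8:
        return 1  # price above both EMAs, both sloping up
    if ema8 < ema21 and falling:
        return 3  # 8 EMA below 21 EMA, both sloping down
    return 2      # anything else is treated as the transition zone


# EMA estimates from the output template; previous-day values are hypothetical.
print(ema_zone(price=7473, ema8=7420, ema21=7380,
               prev_ema8=7410, prev_ema21=7375))
```

Anything that does not clearly match the bullish or bearish condition falls into the transition zone on purpose, which matches the advice not to make a high-confidence call when the picture is unclear.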
Trendlines — Drawing the Channel Price is Moving In
A trendline is a straight line you draw on the chart connecting a series of highs or lows. It shows you the direction and angle of the trend. Think of it as drawing the "floor" (uptrend line) or "ceiling" (downtrend line) of price movement.
How to draw an uptrend line
- Identify at least two low points (price bounced up from these)
- Connect them with a straight line, extending it to the right
- If a third low touches the line: the trendline is confirmed
- If price closes below this line: potential trend break — flag it
What to report in your Technical Agent output
- Is price in an uptrend, downtrend, or sideways channel?
- Where is the trendline currently? (e.g. "support at ~7,350")
- Is price approaching, bouncing off, or breaking the trendline?
- Has there been a confirmed break? (needs a closing price, not just an intraday dip)
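The trendline steps above amount to a straight line through two lows, extended to the right. A minimal Python sketch, assuming each low is recorded as a (trading-day index, price) pair; the day indices and function names are hypothetical.

```python
def trendline_value(low_a: tuple[int, float], low_b: tuple[int, float],
                    day: int) -> float:
    """Price of the straight line through two (day_index, low) points at `day`."""
    (x1, y1), (x2, y2) = low_a, low_b
    if x1 == x2:
        raise ValueError("the two lows must fall on different days")
    slope = (y2 - y1) / (x2 - x1)  # points gained per trading day
    return y1 + slope * (day - x1)


def confirmed_break(close: float, line_value: float) -> bool:
    """A break counts only on a closing price below the line, not an intraday dip."""
    return close < line_value


# Hypothetical example: lows of 7,050 on day 0 and 7,200 on day 30.
support_today = trendline_value((0, 7050.0), (30, 7200.0), day=60)
print(f"Trendline support on day 60: ~{support_today:,.0f}")
print("Confirmed break?", confirmed_break(close=7473.0, line_value=support_today))
```

Pass the daily close, not the intraday low, to `confirmed_break`, in line with the rule that a break needs a closing price.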
Support & Resistance — The Price Levels That Matter
Support is a price level where buying has historically been strong enough to stop price from falling further. Resistance is a price level where selling has historically been strong enough to stop price from rising further. These levels matter because many market participants are watching the same levels simultaneously.
- Look for price levels where the chart has bounced multiple times
- Round numbers (e.g. 7,000; 7,500) often act as psychological levels
- Recent highs and recent lows are the most important levels to track
When price breaks through a resistance level, that level often flips and becomes the new support. When price breaks below a support level, that level often flips and becomes the new resistance. This is one of the most reliable patterns in charts.
State 2–3 specific price levels for S&P 500 and Nasdaq. For each level say: is it support or resistance? How many times has price touched it? Is price currently approaching it, above it, or below it?
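Counting touches and labelling a level can be sketched in a few lines of Python. The tolerance (how close price must come to count as a touch) is an assumption you would choose yourself, and the swing points below are made-up illustration data.

```python
def count_touches(swing_points: list[float], level: float,
                  tolerance: float) -> int:
    """Count swing highs/lows that came within `tolerance` points of `level`."""
    return sum(1 for p in swing_points if abs(p - level) <= tolerance)


def classify_level(level: float, last_close: float) -> str:
    """A level below the last close is support; a level above it is resistance."""
    if level < last_close:
        return "support"
    if level > last_close:
        return "resistance"
    return "at price"


# Hypothetical swing highs and lows read off a daily chart.
swings = [7348.0, 7355.0, 7410.0, 7352.0, 7498.0]
for level in (7350.0, 7500.0):
    kind = classify_level(level, last_close=7473.0)
    touches = count_touches(swings, level, tolerance=10.0)
    print(f"{level:,.0f}: {kind}, touched {touches} time(s)")
```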
Technical Agent Output Template — What to Write Every Week
This is the structured output you paste into the LLM synthesis prompt. Use this format every single week — consistency makes comparison across weeks meaningful.
LAST CLOSE: 7,473 (Fri 22 May 2026)
8 EMA vs PRICE:
- Price is ABOVE the 8 EMA. Momentum intact short-term.
- 8 EMA estimated at ~7,420. Price is ~53 points above it — healthy gap.
8 EMA vs 21 EMA:
- 8 EMA is ABOVE 21 EMA. Trend structure bullish.
- 21 EMA estimated at ~7,380. Gap between 8 and 21 EMA = ~40 pts, not compressing.
- EMA condition: Zone 1 (Bullish) — both rising, price above both.
TRENDLINE:
- Uptrend line drawn from March 2026 lows, connecting lows at ~7,050, ~7,200, ~7,350.
- Current trendline support: approximately 7,330–7,360 on the coming week.
- Price is above the trendline. No break detected.
KEY LEVELS:
- Resistance 1: 7,500 (round number, prior intraweek high). Not yet broken.
- Resistance 2: 7,550 (prior weekly closing high from April). Would be new high.
- Support 1: 7,350 (confluence of 21 EMA + trendline). Key level to hold.
- Support 2: 7,200 (major prior breakout level, would flip back to support).
BREADTH NOTE:
- Only 57% of S&P 500 stocks above their 200-day MA. Narrow rally — caution.
- Russell 2000 outperformed this week (+2.7%). Broadening signal, but needs follow-through.
TECHNICAL BIAS: Neutral-Bullish, with caution on breadth.
CONFIDENCE: Medium. Structure intact but narrow breadth and proximity to resistance reduce conviction.
INVALIDATION: Close below 7,350 (trendline + 21 EMA confluence). That would shift bias to Bearish.
WATCH THIS WEEK: Can price break and hold above 7,500? Does Russell 2000 confirm above 2,900?
The one rule for Technical Agent: Never say "the chart looks good." State specific levels, specific EMA positions, and a specific invalidation condition. "Bullish while above 7,350" is a complete technical statement. "Looks bullish" tells the synthesis agent nothing.
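One way to enforce this rule before committing is a small completeness check on the draft output. The sketch below is a hypothetical helper, not part of the course tooling. It assumes the field labels used in the template above and treats an invalidation line as complete only if it contains a number.

```python
import re

# Field labels taken from the Technical Agent template above.
REQUIRED_FIELDS = (
    "LAST CLOSE:", "8 EMA vs PRICE:", "8 EMA vs 21 EMA:", "TRENDLINE:",
    "KEY LEVELS:", "TECHNICAL BIAS:", "CONFIDENCE:", "INVALIDATION:",
)


def missing_fields(output_text: str) -> list[str]:
    """Return the template labels that do not appear in the draft."""
    return [field for field in REQUIRED_FIELDS if field not in output_text]


def has_numeric_invalidation(output_text: str) -> bool:
    """True if the INVALIDATION line contains at least one digit (a price level)."""
    return re.search(r"INVALIDATION:.*\d", output_text) is not None


draft = "LAST CLOSE: 7,473\nTECHNICAL BIAS: Neutral-Bullish\nINVALIDATION: looks fine"
print("Missing:", missing_fields(draft))
print("Specific invalidation level?", has_numeric_invalidation(draft))
```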
The Human Score — Where Your Team's Thinking Matters Most
THE DIFFERENTIATOR
Why AI Cannot Replace Your Judgment
- Averages patterns from past data it was trained on
- Produces plausible-sounding text based on the prompt it received
- Has a knowledge cutoff — it cannot know what happened last week unless you tell it
- Tends toward the "safe" answer — moderate confidence, neutral to slight direction
- Weigh a specific geopolitical event (e.g. Iran deal) that happened this week
- Notice that the chart pattern is unusually clear or unusually ambiguous right now
- Decide that one piece of evidence is more important than others in this specific context
- Disagree with all four AIs and explain precisely why — and be right
- All four models may miss a regime change because it does not match past patterns
- All four models may anchor on the most recent data in your prompt
- All four models may be confidently wrong in the same direction
- The team that spotted the divergence and overrode the AI consensus — and was right — learns the most
The Human Score — A Five-Dimension Judgment Framework
After completing the multi-LLM comparison table, your team independently scores each dimension on a scale of −2 to +2. These scores reflect your human judgment — informed by the AI outputs but not determined by them. Then you write a one-sentence justification for any score where you differ significantly from the AI consensus.
| Dimension | −2 | −1 | 0 | +1 | +2 | Your Score | AI Consensus | Difference? |
|---|---|---|---|---|---|---|---|---|
| Macro / News Weight: How strongly does the macro environment drive direction this week? | Strongly bearish macro | Mildly bearish | Neutral / mixed | Mildly bullish | Strongly bullish macro | ___ | ___ | ___ |
| Technical Structure: What does the chart tell you independent of any AI? | Broken down, below both EMAs | Weak, compressing | Mixed signals | Above both EMAs, intact | Clear uptrend, all signals aligned | ___ | ___ | ___ |
| Almanac Seasonal Weight: How much should seasonality influence this week's call? | Strong bearish seasonal | Mild bearish seasonal | Weak or conflicting pattern | Mild bullish seasonal | Strong bullish seasonal | ___ | ___ | ___ |
| AI Model Agreement Quality: How much do you trust the AI consensus this week? | All models agree but their evidence seems wrong to you | Models agree but you are sceptical | Models split, you cannot break the tie | Models agree and you agree | Models agree, evidence very clear | ___ | ___ | ___ |
| Wild Card / Human Observation: Is there something you noticed that no AI picked up? | Strong bearish factor AI missed | Minor bearish factor AI missed | Nothing extra to add | Minor bullish factor AI missed | Strong bullish factor AI missed | ___ | ___ | ___ |
+2 to +5 → Team leans Neutral-Bullish
−1 to +1 → Team calls NEUTRAL / UNCERTAIN
−5 to −2 → Team leans Neutral-Bearish
−10 to −6 → Team calls BEARISH override
If your Human Score points in a different direction than the AI consensus, you must write one paragraph explaining: what you saw that the AI did not weight correctly, and why you are making a different call. This paragraph is what separates your team's analysis from every other team's.
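The scoring arithmetic can be sketched in Python as follows. The dimension keys and function names are hypothetical. The bands shown above stop at +5, so the sketch reports totals of +6 and higher as unlisted instead of guessing a label.

```python
DIMENSIONS = ("macro", "technical", "almanac", "ai_quality", "wild_card")


def human_score_total(scores: dict[str, int]) -> int:
    """Sum the five dimension scores, each of which must lie in -2..+2."""
    for name in DIMENSIONS:
        if name not in scores:
            raise KeyError(f"missing dimension: {name}")
        if not -2 <= scores[name] <= 2:
            raise ValueError(f"{name} must be between -2 and +2")
    return sum(scores[name] for name in DIMENSIONS)


def team_call(total: int) -> str:
    """Map a total to the bands listed above."""
    if not -10 <= total <= 10:
        raise ValueError("total must be between -10 and +10")
    if total >= 6:
        return "band not listed above"  # no label is given for +6 to +10
    if total >= 2:
        return "Neutral-Bullish"
    if total >= -1:
        return "NEUTRAL / UNCERTAIN"
    if total >= -5:
        return "Neutral-Bearish"
    return "BEARISH override"


# Hypothetical scores, for illustration only.
scores = {"macro": 1, "technical": 1, "almanac": 0, "ai_quality": 1, "wild_card": 0}
total = human_score_total(scores)
print(f"Total {total:+d}: {team_call(total)}")
```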
Worked Example — Human Score for Week 2 (26 May 2026)
The AI consensus (3 of 4 models) said Neutral-Bullish, S&P +0.3% to +0.9%. Here is how one team applied human judgment to arrive at a different final call.
| Dimension | AI Said | Team Score | Team Reasoning |
|---|---|---|---|
| Macro / News Weight | +1 (mildly bullish — oil drop) | −1 | The AI focused on the oil price drop as good news. Our team noted that the Iran deal is NOT confirmed — if it falls through, oil spikes $6+ in a single day. That binary risk means the macro environment is more fragile than the AI's +1 implies. We scored it −1 for unresolved binary risk. |
| Technical Structure | +1 (price above EMAs) | +1 | Agreed. EMAs intact, trendline holding. No disagreement here. But we noted the 57% breadth figure — we flagged this as a warning, consistent with DeepSeek's concern, which the other three models underweighted. |
| Almanac Seasonal Weight | 0 (mixed signals) | −1 | The AI called it mixed/neutral. But we specifically looked at the midterm year row in the May Vital Statistics: −0.7% average for S&P in midterm Mays. Memorial Day week down 17 of last 29. The AI did not give the midterm context enough weight. We scored it −1. |
| AI Model Agreement Quality | +1 (3 of 4 agree) | 0 | 3 of 4 models agreed but their % estimates ranged from −0.5% to +1.0% — a 1.5 percentage point spread. That is wide. Agreement on direction is not the same as agreement on magnitude. We reduced this to 0 rather than +1. |
| Wild Card / Human Observation | 0 (nothing flagged) | −1 | This is the key human observation no AI raised: The Friday PCE (Personal Consumption Expenditures — the Fed's preferred inflation measure) print releases into a short 4-day holiday week with lower volume. Low-volume markets can exaggerate moves in both directions. A hot PCE in a thin market could cause an outsized selloff. None of the four AIs mentioned the volume-thinness amplification effect. We scored it −1. |
HUMAN SCORE TOTAL: −2 (Macro −1 + Technical +1 + Almanac −1 + AI Quality 0 + Wild Card −1)
HUMAN CALL: Neutral-to-Cautious. We are slightly more cautious than the AI consensus of Neutral-Bullish.
OVERRIDE PARAGRAPH: The AI models collectively underweighted three factors our team considers significant this week: (1) the Iran deal remains unconfirmed, making the oil-driven macro relief fragile and binary; (2) the midterm year seasonal context in the Almanac specifically shows negative May averages in years like this one; (3) the PCE data releases on Friday into a shortened, lower-volume holiday week — a condition that historically amplifies price moves. For these reasons, our team adjusts the final regime call from Neutral-Bullish (AI consensus) to Neutral-Cautious, and reduces our S&P 500 % range to −0.2% to +0.6% vs. the AI consensus of +0.3% to +0.9%.
Note: This override may be wrong. If PCE comes in cool and Iran confirms a deal, the AI consensus will have been right. We document our reasoning so we can learn from the outcome either way.
Why this team scores higher than a team that just copies the AI output: They showed independent reasoning on three specific dimensions. They identified a factor (low-volume holiday week amplification of PCE) that none of the four AI models raised. They quantified their disagreement (+/−) and explained it. They adjusted the final prediction specifically — not just the words, but the actual % range. And they acknowledged they might be wrong while explaining why they made the call anyway. That is what thinking looks like.
What happens when your human override is wrong? You still score well — because the scoring system rewards calibration and reasoning quality, not accuracy. A team that said "we are cautious because of X, Y, Z" and was wrong about direction but right to be cautious (the week was flat) scores well. A team that confidently said "bullish" with no caveats and was wrong scores poorly. Being wrong with good reasoning is better than being right by accident.
🧠 Human Score Checklist: Complete Over the Weekend (After US Friday Market Close), Before Submitting
Scrum Roles — Ten Students, Ten Responsibilities
Agile Structure
Each group has approximately 10 students. Every student has a named role. No role is ownerless. Your role determines what you produce, what you present on Monday, and what evidence you commit to GitHub. Roles are fixed for the trimester but responsibilities are shared: everyone understands every agent.
The Ten Roles — Who Does What
| # | Role | Scrum Category | Weekly Deliverable | Presents On Monday | GitHub Evidence |
|---|---|---|---|---|---|
| R1 | Product Owner | Scrum Core | Sprint goal statement. Acceptance criteria for the week's prediction brief. Backlog priority decisions. | Opens the presentation. States the sprint goal and what "done" means this week. | sprint_goal_WXX.md acceptance_criteria.md |
| R2 | Scrum Master | Scrum Core | Facilitated stand-up notes (3× per week). Impediment log. Sprint retrospective summary. | Presents retrospective: what worked, what failed, what changes next sprint. | standup_WXX.md retrospective_WXX.md |
| R3 | Almanac Agent Lead | Agent | Completed Almanac Agent output using the template on this site. Monthly stats, week pattern, sector seasonality, cycle context, bias statement. | Presents Almanac output. Explains which seasonal pattern was most relevant and why. | almanac_agent_WXX.md |
| R4 | Macro / News Agent Lead | Agent | FedWatch direction, rates, dollar, oil, economic calendar summary, key news events ranked by market impact. | Presents macro output. States the single most important macro driver for the coming week. | macro_agent_WXX.md |
| R5 | Technical Agent Lead | Agent | Chart analysis for S&P 500, Nasdaq, Russell 2000. 8/21 EMA condition, trendline status, key support/resistance levels, technical bias. | Presents charts live. Shows EMA positions, trendline, and key levels with annotations. | technical_agent_WXX.md charts/ folder with screenshots |
| R6 | LLM Synthesis Operator | Synthesis | All four LLM responses queried with identical prompt. Comparison table completed. Raw responses stored in GitHub. | Presents the comparison table. Highlights where models agreed and where they diverged. | synthesis_claude_WXX.txt synthesis_chatgpt_WXX.txt synthesis_gemini_WXX.txt synthesis_deepseek_WXX.txt llm_comparison_WXX.md |
| R7 | Human Score Analyst | Synthesis | Five-dimension Human Score table. Override paragraph where team judgment differs from AI consensus. Final adjusted prediction with confidence band. | Presents the Human Score and the team's final call. Must explain any override of the AI consensus. This is the most important individual presentation slot. | human_score_WXX.md |
| R8 | Data & Evidence Lead | Quality | All data sourced and cited with access date. Screenshots of Finviz and Yahoo Finance taken Monday and Friday. Evidence folder organised and complete. | Presents last week's actuals vs. this week's predictions. Shows the data trail clearly. | actuals_WXX.md evidence/ folder finviz_1W_WXX.png yahoo_sectors_5D_WXX.png |
| R9 | GitHub & Integration Lead | Quality | All files committed. Branches used for draft work. Pull request merged by Saturday evening SGT — filed Saturday. README (a text file in a GitHub repository that explains the project, how to use it, and its current status) updated. Release tag created. | Shows GitHub commit history live. Confirms all evidence is in the repository and accessible. | README.md All commits + PRs Release tag vWXX |
| R10 | QA (Quality Assurance — the process of checking that outputs meet defined standards) & Learning Log Lead | Quality | Calibration score for last week calculated. LLM horse race updated. Learning log entry: what the team believed, what happened, what changes next week. | Closes the presentation. Announces calibration score, LLM horse race standing, and one thing the team will do differently next sprint. | calibration_log.md llm_horserace.md learning_log_WXX.md |
Role Deep-Dives — What Each Person Actually Does
Product Owner
You define what "done" looks like for each sprint. Before Monday class you write a one-paragraph sprint goal: what is the team trying to learn this week? What question is the prediction trying to answer? You also maintain the backlog — a list of things the team could improve — and you prioritise it.
Key question to answer every sprint: "What is the most important thing we can improve in our prediction workflow this week?"
Scrum Master
You run the stand-ups (done / doing / blocked — maximum 10 minutes each). You remove blockers — if someone cannot get data or cannot access a tool, you solve it. You do not do their work, you unblock it. You write the retrospective: what went well, what went badly, what changes next sprint.
Critical rule: Stand-up is about blockers, not status reports. If nobody is blocked, it should take 3 minutes.
Almanac Agent Lead
Over the weekend (after US Friday market close) you open the Almanac, look up the current month's vital statistics, read the day-specific notes for the coming week, and check the sector seasonality table. You write the structured Almanac Agent output using the template on this site. You do not just copy the Almanac; you interpret it in the context of the current macro environment.
Key output sentence: "Seasonality suggests _____, with _____ confidence, because _____, but this conflicts/aligns with current conditions because _____."
Macro / News Agent Lead
Over the weekend (after US Friday market close) you check CME FedWatch, the 10-year yield level, DXY (US Dollar Index — measures the dollar against a basket of major currencies), oil price, and the week-ahead economic calendar on TradingEconomics. You identify the single most market-moving event of the coming week and explain whether it is likely to be a bullish, bearish, or binary-risk catalyst. You rank your evidence by confidence.
Key discipline: Separate confirmed facts (rate is 3.75%) from expectations (market prices in 96% hold) from opinions (analysts think…). Label all three differently.
Technical Agent Lead
Over the weekend (after US Friday market close) you open ProRealTime or TradingView, pull up S&P 500, Nasdaq, and Russell 2000 daily charts, and read the 8/21 EMA condition, trendline status, and key support/resistance levels. You annotate screenshots and commit them to GitHub. You use the Technical Agent output template on this site.
Non-negotiable: State a specific invalidation level every week. "Bullish while above 7,350" is complete. "Looks bullish" is not acceptable.
LLM Synthesis Operator
You run the synthesis workflow. Once R3, R4, and R5 have their outputs ready, you paste them into the shared prompt template and query all four AI models (Claude, ChatGPT, Gemini, DeepSeek). You save every raw response as a text file in GitHub. You complete the comparison table. You hand the table to R7 before they write the Human Score.
Quality check: Were the prompts identical for all four models? If not, the comparison is invalid. Do not change a word between models.
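A quick way to confirm the prompts really were identical is to compare a hash of the exact text sent to each model. This is a minimal sketch using Python's standard `hashlib`. It assumes you keep a copy of each prompt as sent; the function names are hypothetical.

```python
import hashlib


def prompt_fingerprint(prompt: str) -> str:
    """SHA-256 hex digest of the prompt text, byte for byte."""
    return hashlib.sha256(prompt.encode("utf-8")).hexdigest()


def prompts_identical(prompts_by_model: dict[str, str]) -> bool:
    """True only if every model received exactly the same prompt text."""
    fingerprints = {prompt_fingerprint(p) for p in prompts_by_model.values()}
    return len(fingerprints) <= 1


# Placeholder prompt text, for illustration only.
base = "ALMANAC: ...\nMACRO: ...\nTECHNICAL: ...\nGive a regime call."
sent = {"Claude": base, "ChatGPT": base, "Gemini": base, "DeepSeek": base + " "}
print("Prompts identical?", prompts_identical(sent))
```

Even a trailing space changes the fingerprint, which is the point: the check catches edits that are easy to miss by eye. Recording the fingerprint next to the raw responses in GitHub gives a simple audit trail.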
Human Score Analyst
This is the most intellectually demanding role. After seeing the LLM comparison table, you lead the team discussion on the five Human Score dimensions. You write the override paragraph if your team's judgment differs from the AI consensus. You own the final prediction number. Your job is to think — not to summarise what the AI said.
The grade question: "What did our team see that none of the four AIs raised?" If you cannot answer this, your Human Score is incomplete.
Data & Evidence Lead
You take the Finviz (1W) and Yahoo Finance Sectors (5D) screenshots over the weekend, as soon as US markets have closed on Friday. You record the Friday close actuals. You maintain the evidence folder in GitHub with consistent file naming. You are the team's data quality officer: if a number cannot be traced to a source with a date, it does not go in the brief.
File naming discipline: Every evidence file includes the week number and date. Example: finviz_1W_2026-W22_Mon.png
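File names in this pattern can be generated instead of typed by hand. The sketch below assumes the WXX part is the ISO week number of the capture date, which matches the example above for Monday 25 May 2026. Note that a Saturday capture falls in the ISO week before the following Monday, so a team that numbers files by sprint week may prefer to pass the week explicitly. The function name is hypothetical.

```python
from datetime import date

DAY_ABBR = ("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")


def evidence_filename(prefix: str, day: date, ext: str = "png") -> str:
    """Build e.g. finviz_1W_2026-W22_Mon.png from a prefix and a capture date."""
    iso_year, iso_week, iso_weekday = day.isocalendar()
    return f"{prefix}_{iso_year}-W{iso_week:02d}_{DAY_ABBR[iso_weekday - 1]}.{ext}"


print(evidence_filename("finviz_1W", date(2026, 5, 25)))
print(evidence_filename("yahoo_sectors_5D", date(2026, 5, 25)))
```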
GitHub & Integration Lead
You own the repository. You create branches for draft work (e.g. sprint/W22-technical), review pull requests before merging, keep the README current, and create a release tag (vW22) over the weekend before class. If the repo is messy, your score suffers. If someone cannot find the evidence, that is your problem to fix.
Before Monday class: Is the README up to date? Are all files merged to main? Is the release tag created? If not, fix it before walking into class.
QA & Learning Log Lead
Over the weekend (after US Friday market close) you score last week's prediction using the calibration scoring table on this site. You update the LLM horse race tracker (which model was closest to the actual S&P move?). You write the learning log entry: what did the team predict, what actually happened, what was confusing, and what changes next sprint.
The most important line in the learning log: "We were wrong about _____ because we underestimated _____. Next sprint we will _____."
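The horse race update (which model was closest to the actual S&P move) reduces to picking the smallest absolute error. A minimal Python sketch with made-up numbers; taking each model's prediction as a single point estimate in percent is an assumption, as is the function name.

```python
def closest_model(predictions: dict[str, float], actual: float) -> str:
    """Return the model whose predicted % move was nearest the actual % move.

    Ties go to the model listed first in `predictions`.
    """
    if not predictions:
        raise ValueError("no predictions supplied")
    return min(predictions, key=lambda model: abs(predictions[model] - actual))


# Hypothetical weekly S&P 500 predictions (%), for illustration only.
week = {"Claude": 0.6, "ChatGPT": 0.9, "Gemini": 0.3, "DeepSeek": -0.5}
actual_move = 0.2
winner = closest_model(week, actual_move)
print(f"Closest this week: {winner} (error {abs(week[winner] - actual_move):.1f} pts)")
```

Keeping a running count of weekly winners in llm_horserace.md gives the standing to announce at the end of each presentation.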
Practical A (3 Groups) and Practical B (2 Groups) — Same Standard, Same Expectations
Practical A (3 groups): Three groups present back-to-back. Each group has a 20–25 minute slot. With three groups that is 60–75 minutes of presentations. The remaining 45–60 minutes is used for cross-group discussion, sprint planning, GitHub work, and agent preparation. Every role holder speaks; there are no silent passengers.
Practical B (2 groups): Two groups present. Each group has a 20–25 minute slot. With two groups that is 40–50 minutes of presentations. The remaining 70–80 minutes allows for deep cross-group discussion: groups interrogate each other's SPX/NDX/IWM predictions, Human Score reasoning, and what the AI models got right or wrong.
Cross-group challenge rule: After each presentation, any student from another group can ask one question. The question must be about the reasoning, not the outcome. "Why did you score Technical +1 when breadth was only 57%?" is a good question. "Were you right?" is not — you find that out on Friday.
The 20–25 Minute Sprint Presentation
This is not a slide deck presentation. You present live evidence: your GitHub repository, your charts, your comparison table, your data. You can use slides to organise your points but the primary artefact is what is in GitHub. If it is not committed, it does not exist.
The 22-Minute Presentation — Slot by Slot (+ 3 min buffer/questions)
R1 (Product Owner): State this sprint's goal in one sentence. Then immediately state last week's calibration score (R10 hands you the number). Example: "Our goal this week is to improve our macro agent's treatment of binary risk events. Last week we scored +4 on calibration."
R8 (Data & Evidence Lead): Show the actuals table from Finviz. What did the team predict last week? What actually happened? Which assets were you right on? Which were you wrong on? State the direction accuracy (not %, just up/down/flat) for all 9 assets.
R3 (Almanac Agent Lead): Read out your Almanac Agent output from GitHub. State: current month rank, the most relevant week pattern, and the midterm year cycle context. State your seasonal bias and confidence. Maximum 3 bullet points. No unnecessary detail.
R4 (Macro / News Agent Lead): State: current Fed rate, FedWatch probability, 10-year yield level, oil direction, and the single most important calendar event this week. Explain in one sentence why that event matters. State your macro bias (hawkish/dovish/neutral/binary).
R5 (Technical Agent Lead): Show the annotated S&P 500 chart live on screen. Point to: the 8 EMA, the 21 EMA, the trendline, and the two key levels. State the EMA zone (1–4). State the technical bias and the specific invalidation level. This is a live chart walk, not a slide. You must be able to point at the screen.
R6 (LLM Synthesis Operator): Show the comparison table. State each model's regime call in one word. Identify the point of maximum agreement (all four say X) and the point of maximum divergence (models split). Do not read out full AI responses, just the comparison table. State which model's reasoning your team found most credible, and why.
R7 (Human Score Analyst): Show the Human Score table. State your total score and what it means. Then, and this is the most important part, explain the Wild Card: the one thing your team observed that none of the four AIs raised. State whether your final call differs from the AI consensus. If it does, explain precisely why. If it does not, explain why you agreed. Your team's final prediction % range goes here.
This is where every team will be different. This is what earns your grade. If your Human Score section sounds like a summary of the AI outputs, you have not done this correctly.
R9 (GitHub & Integration Lead): Open GitHub live. Show: this week's commits, the evidence folder, the release tag, and the README. This takes 60 seconds. If the repo is clean and organised, it takes 30 seconds. If it is messy, that is visible to everyone in the room.
R10 (QA & Learning Log Lead): State last week's calibration score. Update the class on the LLM horse race (which model has been most accurate across all sprints so far). State one specific change the team commits to making in next week's workflow. Close with: "Our prediction for this week is filed in GitHub as [filename]."
R2 — Scrum Master note: After the presentation, during the open discussion period, the Scrum Master facilitates a 2-minute retrospective check with the class: "What question does our prediction leave unanswered?" This surfaces blind spots before Friday.
Presentation Rules — Non-Negotiable
You must not:
- Read an AI output verbatim as your own analysis
- Present something that is not committed to GitHub
- Say "the market might go up or down" without a stated confidence
- Have only one or two students speak while others stand silently
- Present without showing the actual chart (R5)
You must:
- Every role holder speaks during their designated slot
- State a specific prediction with a % range and confidence level
- State a specific invalidation condition
- Show live GitHub as evidence of the week's work
- Finish within the 22-minute window (3 minutes for questions from other groups)
What sets the strongest teams apart:
- R7 identifies a genuine insight the AI models did not raise
- The team's final prediction differs from the AI consensus with clear justification
- The retrospective shows the team actually changed something from last sprint
- Calibration improves week-on-week (team gets better at knowing when to be confident)
- GitHub history shows a consistent, disciplined weekly cadence
The Weekly Sprint Rhythm
Saturday → Friday → Monday
⚡ The sprint starts on Saturday, not Monday. US markets close at 4:00 PM Eastern Time (ET) on Friday, which falls at the start of the Singapore weekend. The moment that happens, last week's data is final and your sprint begins. You have the whole weekend to build a quality analysis. Do not arrive at Monday's class still building your prediction.
- US markets close → sprint begins (= weekend after US Friday close)
- Data pull, agents, LLMs: build your analysis
- Prediction committed to GitHub: timestamp = locked prediction
- Market trades, watch it play out: do not change your prediction
- Record actuals → sprint ends (= weekend after US Friday close)
- Present, score, learn: everything already prepared
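To see when the sprint clock starts in Singapore, the conversion from the US close can be sketched as below. This is a minimal illustration, not part of the course tooling: the function name `us_close_in_sgt` and the example date are hypothetical, and the fixed UTC-4 offset assumes US daylight time (it is UTC-5 in winter, which shifts the result by one hour).

```python
from datetime import datetime, timedelta, timezone

# Assumption: fixed offsets. US Eastern is UTC-4 during daylight time
# and UTC-5 in winter; Singapore is UTC+8 all year.
EDT = timezone(timedelta(hours=-4), "EDT")
SGT = timezone(timedelta(hours=8), "SGT")


def us_close_in_sgt(year: int, month: int, day: int) -> datetime:
    """Return the 4:00 PM US Eastern close on the given date, expressed in SGT."""
    close_et = datetime(year, month, day, 16, 0, tzinfo=EDT)
    return close_et.astimezone(SGT)


# Hypothetical example: a Friday in June 2026.
sprint_start = us_close_in_sgt(2026, 6, 5)
print(sprint_start.strftime("%Y-%m-%d %H:%M %Z"))
```

Under these assumptions the Friday close lands early on Saturday morning in Singapore, which is why the sprint is described as starting on Saturday.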
Step 1 — Pull the data (R8)
- Open Finviz (finviz.com/futures_performance.ashx) — set to 1W. Screenshot immediately.
- Open Yahoo Finance Sectors (finance.yahoo.com/sectors) — set to 5D. Screenshot immediately.
- Record closing values and weekly % change for: SPX, NDX, IWM, Gold, Oil, 10-Year Yield, Bonds, VIX, Bitcoin.
- Record all 11 sector % changes. Note the top 3 leaders and bottom 3 laggards.
- Commit both screenshots to GitHub: evidence/finviz_1W_YYYY-WXX_Sat.png and evidence/yahoo_sectors_5D_YYYY-WXX_Sat.png
Step 2 — Build the three agents in parallel
- R3 Almanac Agent: Look up this week's month page, day notes, and sector seasonality table. Write the structured Almanac output using the template on this site.
- R4 Macro/News Agent: Check CME FedWatch, 10-year yield, oil, and TradingEconomics calendar for the coming week. Write the macro output.
- R5 Technical Agent: Open ProRealTime or TradingView. Pull up SPX, NDX, and IWM daily charts. Read the 8/21 EMA condition, trendline, and key levels. Annotate screenshots. Write technical output.
- All three outputs committed to GitHub before running the LLM synthesis.
Step 3 — Query all four AI models
- Paste the three agent outputs into the shared prompt template. Do not change a word between models.
- Query Claude, ChatGPT, Gemini, and DeepSeek in sequence. Save each raw response.
- Complete the Multi-LLM Comparison Table. Identify agreement and divergence.
- Commit all four raw responses: synthesis_claude_YYYY-WXX.txt, etc.
Step 4 — Apply human judgment and file the prediction
- R7 leads the team discussion on the five Human Score dimensions.
- Identify any override — what does your team see that the AIs did not weight correctly?
- Write the final prediction for SPX, NDX, and IWM — each with direction (Up/Down/Flat), a % range, and a confidence level.
- Commit prediction_YYYY-WXX_teamname.md to GitHub before midnight SGT Saturday. This timestamp is your locked prediction. It cannot be changed after this point.
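The YYYY-WXX part of the file name can be generated rather than typed by hand. The sketch below assumes that WXX is the ISO week number of the trading week being predicted (the week that starts on the Monday after the filing Saturday); that reading is an assumption, and the helper names `sprint_label` and `prediction_filename` and the team name are hypothetical.

```python
from datetime import date, timedelta


def sprint_label(saturday: date) -> str:
    """Return 'YYYY-WXX' for the trading week that follows the given Saturday.

    Assumption: WXX is the ISO week number of the week being predicted.
    """
    if saturday.weekday() != 5:
        raise ValueError("expected a Saturday")
    monday = saturday + timedelta(days=2)
    iso = monday.isocalendar()  # (ISO year, ISO week, ISO weekday)
    return f"{iso[0]}-W{iso[1]:02d}"


def prediction_filename(saturday: date, teamname: str) -> str:
    """Build the prediction file name in the pattern used by this workflow."""
    return f"prediction_{sprint_label(saturday)}_{teamname}.md"


# Hypothetical example: filing on Saturday 6 June 2026.
example_name = prediction_filename(date(2026, 6, 6), "teamalpha")
print(example_name)
```

Generating the label from the date keeps every file in a sprint on the same week number, which makes the prediction and actuals files easy to pair up later.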
Step 5 — Watch and log, do not change
- Do not modify the prediction file. The commit timestamp is evidence. Changing it after the fact invalidates your calibration score.
- Mid-week (Wednesday): R8 does an optional check. If a major surprise event moved markets more than 2% in a day, log it in the learning log as a mid-sprint note. This is not a prediction change — it is an observation.
- R5 may note mid-week if a key technical level was broken — again, logged as observation only.
Step 6 — Lock the actuals and score
- Open Finviz (1W) and Yahoo Sectors (5D) — these now show the completed week.
- Record actual % change for SPX, NDX, and IWM — these are the three scored assets.
- Also record Gold, Oil, Yield, VIX, Bitcoin, and sector leaders/laggards for context.
- R10 calculates the calibration score for each of the three primary predictions (see Scoring section).
- R10 updates the LLM horse race: which model predicted SPX direction most accurately this week?
- Commit actuals: actuals_YYYY-WXX.md, alongside the original prediction file. Both files together are the evidence of prediction versus outcome.
Step 7 — Build the Monday presentation
- R10 writes the learning log entry: what the team predicted, what happened, what was surprising, what changes next sprint.
- R7 prepares the Human Score explanation — why the team agreed or disagreed with the AI consensus, and how that played out.
- R2 prepares the retrospective: what Scrum practice worked, what failed, what one thing changes next sprint.
- R9 creates the release tag vWXX and confirms the repo is clean before Monday class.
- Every role holder reviews their slot in the 20–25 minute presentation structure.
Step 8 — 20–25 minute sprint review presentation
- All preparation is complete before you walk in. There is no last-minute scramble.
- Every role holder presents their designated slot (see Presentation section for full timing).
- SPX, NDX, and IWM predictions vs actuals are stated clearly and scored publicly.
- The Human Score Analyst (R7) presents the team's reasoning and any override — this is the most important slot.
- After the presentation: next sprint's goal is agreed, backlog is updated, and the new prediction window begins the following Friday at 4PM ET (= Saturday over the weekend).
Multi-LLM Synthesis
Core Process
Over the weekend (after US Friday market close), all four AI models receive the same structured prompt built from your three agent outputs. You compare their predictions side-by-side before writing your team's consensus.
ALMANAC EVIDENCE:
[Paste your Almanac Agent output here — seasonal bias, confidence, caveats]
MACRO / NEWS EVIDENCE:
[Paste your Macro/News Agent output here — FedWatch, rates, dollar, oil, calendar events]
TECHNICAL EVIDENCE:
[Paste your Technical Agent output here — EMA signals, trendlines, key levels]
REQUIRED OUTPUT — respond in exactly this structure:
1. Weekly Regime: [Bullish / Bearish / Neutral / Uncertain]
2. Confidence Score: [Low / Medium / High] + brief justification
3. Key Supporting Evidence: (3 points max)
4. Key Contradictions: (2 points max)
5. Invalidation Conditions: what would change this view
6. Predicted % move — SPX (S&P 500): [+X.X% to +X.X%] — direction + range
Predicted % move — NDX (Nasdaq 100): [+X.X% to +X.X%] — direction + range
Predicted % move — IWM (Russell 2000): [+X.X% to +X.X%] — direction + range
7. Plain-English brief: 2–3 sentences a non-expert can understand
8. Disclaimer: remind the reader this is not financial advice
Rule: Do not change the prompt between models. Do not add extra context for one model that others do not get. Fair comparison requires identical inputs. Store all four raw responses in GitHub named: synthesis_claude_YYYY-WXX.txt, synthesis_chatgpt_YYYY-WXX.txt, etc.
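One way to make the "identical inputs" rule checkable is to build the prompt once and record a fingerprint of the exact text sent to each model. The sketch below is an illustration only: `build_prompt` and `prompt_fingerprint` are hypothetical helper names, the agent texts are placeholders, and only the three evidence headings from the template are reproduced (the REQUIRED OUTPUT section would be appended unchanged).

```python
import hashlib

MODELS = ["claude", "chatgpt", "gemini", "deepseek"]


def build_prompt(almanac: str, macro: str, technical: str) -> str:
    """Assemble the evidence part of the shared prompt from the three agent outputs."""
    return (
        f"ALMANAC EVIDENCE:\n{almanac.strip()}\n\n"
        f"MACRO / NEWS EVIDENCE:\n{macro.strip()}\n\n"
        f"TECHNICAL EVIDENCE:\n{technical.strip()}\n"
    )


def prompt_fingerprint(prompt: str) -> str:
    """SHA-256 of the prompt text, used to show each model saw the same input."""
    return hashlib.sha256(prompt.encode("utf-8")).hexdigest()


# Placeholder agent outputs; real ones come from R3, R4 and R5.
prompt = build_prompt("seasonal bias ...", "FedWatch ...", "EMA signals ...")

# Every model receives the same string, so every fingerprint matches.
sent = {model: prompt_fingerprint(prompt) for model in MODELS}
filenames = [f"synthesis_{model}_2026-W22.txt" for model in MODELS]
print(len(set(sent.values())), filenames)
```

Committing the fingerprint next to the raw responses gives a simple audit trail: if a single character differed between models, the hashes would not match.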
Multi-LLM Comparison Table — Fill This Every Sprint
After querying all four models, fill this table before writing your consensus:
| Dimension | Claude | ChatGPT | Gemini | DeepSeek |
|---|---|---|---|---|
| Weekly Regime | fill in | fill in | fill in | fill in |
| Confidence Score | Low/Med/High | fill in | fill in | fill in |
| SPX % estimate | e.g. +0.5% to +1.2% | fill in | fill in | fill in |
| NDX % estimate | e.g. +0.8% to +1.5% | fill in | fill in | fill in |
| IWM % estimate | e.g. −0.5% to +0.5% | fill in | fill in | fill in |
| Top supporting reason | key phrase | fill in | fill in | fill in |
| Top contradiction cited | key phrase | fill in | fill in | fill in |
| Invalidation condition | what would change it | fill in | fill in | fill in |
| Tone / caveat language | cautious/assertive | fill in | fill in | fill in |
Consensus protocol: Where ≥3 models agree → high-confidence core of your brief. Where models diverge → document as a contradiction and include in your watchlist. Your final prediction must state which view was chosen and why you weighted certain models more or less for that specific week.
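The "≥3 models agree" rule can be checked mechanically before the team writes its brief. This is a minimal sketch under the assumption that agreement means an exact match on the Weekly Regime label; the function name `regime_consensus` and the example regimes are hypothetical.

```python
from collections import Counter


def regime_consensus(regimes, threshold=3):
    """Return (consensus_regime, dissenting_models).

    consensus_regime is None when fewer than `threshold` models share a label,
    in which case every model is listed as part of the divergence.
    """
    counts = Counter(regimes.values())
    top_regime, top_count = counts.most_common(1)[0]
    if top_count >= threshold:
        dissenters = sorted(m for m, r in regimes.items() if r != top_regime)
        return top_regime, dissenters
    return None, sorted(regimes)


# Hypothetical week: three models agree, one diverges.
example = {
    "Claude": "Bullish",
    "ChatGPT": "Bullish",
    "Gemini": "Bullish",
    "DeepSeek": "Uncertain",
}
consensus, dissenters = regime_consensus(example)
print(consensus, dissenters)
```

The dissenting models are returned rather than discarded because the protocol asks the team to document divergence and carry it into the watchlist.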
Calibration Scoring — SPX, NDX & IWM
How You're Graded
The three scored instruments are SPX (S&P 500), NDX (Nasdaq 100), and IWM (Russell 2000). You make a prediction for each one every week: direction (Up/Down/Flat) and a % range. You are not scored only on whether you predicted correctly; you are scored on how well your stated confidence matched your outcome. This is called calibration. A cautious, evidenced "uncertain" is worth more than an overconfident wrong call.
Why SPX, NDX, and IWM — and What Comparing Them Tells You
SPX (S&P 500): The benchmark. 500 large US companies. The most widely followed index in the world. Your SPX prediction is the headline call.
ETF: SPY
NDX (Nasdaq 100): Tech-heavy. If NDX outperforms SPX, the market is in a growth/tech-led regime. If NDX lags, the rally is rotating away from tech.
ETF: QQQ
IWM (Russell 2000): Small caps. Most sensitive to US interest rates and domestic economic conditions. IWM lagging SPX is an early warning sign.
ETF: IWM
The relationship between the three is the insight. If SPX is up but IWM is down, the rally is narrow and rate-sensitive, which is a warning. If NDX leads SPX by 2 percentage points or more in a week, tech is driving everything, so check yields. If IWM leads the other two, risk appetite is broad and healthy. Predicting all three forces you to think about which kind of week it will be, not just which direction.
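The three comparisons in the paragraph above can be written as simple rules. The sketch below is one possible reading, not an official classifier: the function name `describe_week` is hypothetical, and "2%+" is interpreted as a gap of at least 2 percentage points. The example inputs are the Week 1 weekly changes reported in this document (SPX +1.0%, NDX +0.7%, IWM +2.7%).

```python
def describe_week(spx: float, ndx: float, iwm: float) -> list:
    """Label a week from the weekly % changes of SPX, NDX and IWM."""
    notes = []
    if spx > 0 and iwm < 0:
        notes.append("narrow, rate-sensitive rally (warning)")
    if ndx - spx >= 2.0:  # assumption: gap measured in percentage points
        notes.append("tech-led week (check yields)")
    if iwm > spx and iwm > ndx:
        notes.append("broad risk appetite (small caps lead)")
    return notes


week1 = describe_week(spx=1.0, ndx=0.7, iwm=2.7)
print(week1)
```

A week can match more than one rule or none at all, so the function returns a list rather than a single label.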
| Stated Confidence | Direction | Outcome | Score | Reason |
|---|---|---|---|---|
| High | Up or Down | ✅ Correct | +3 pts | Well-evidenced, committed, and right |
| Medium | Up or Down | ✅ Correct | +2 pts | Good reasoning, measured confidence |
| Low / Uncertain | Up or Down | ✅ Correct | +1 pt | Honest about limits, got lucky — acceptable |
| High | Up or Down | ❌ Wrong | −2 pts | Overconfidence penalty — worst outcome |
| Medium | Up or Down | ❌ Wrong | 0 pts | Tried, wrong, not overconfident — neutral |
| Low / Uncertain | Any | ❌ Wrong | +1 pt | Honest uncertainty — always rewarded |
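The scoring table translates directly into a lookup. The sketch below assumes the team has already decided whether each call was correct (the rule for that is not restated here) and maps "Low / Uncertain" to the key `low`; the names `SCORE_TABLE` and `calibration_score` and the example week are hypothetical.

```python
# Points per (stated confidence, was the call correct?), copied from the table.
SCORE_TABLE = {
    ("high", True): 3,
    ("medium", True): 2,
    ("low", True): 1,
    ("high", False): -2,
    ("medium", False): 0,
    ("low", False): 1,
}


def calibration_score(confidence: str, correct: bool) -> int:
    """Return the calibration points for one prediction."""
    key = (confidence.strip().lower(), bool(correct))
    if key not in SCORE_TABLE:
        raise ValueError(f"unknown confidence level: {confidence!r}")
    return SCORE_TABLE[key]


# Hypothetical week: SPX high and right, NDX medium and wrong, IWM low and wrong.
week = [("High", True), ("Medium", False), ("Low", False)]
total = sum(calibration_score(conf, ok) for conf, ok in week)
print(total)
```

Note the asymmetry the table builds in: a high-confidence miss costs 2 points, while a low-confidence call earns 1 point either way.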
LLM Horse Race — Tracked Across the Trimester
Each week you record which AI model was closest to the actual S&P 500 % move. Over 10 weeks, a leaderboard emerges. By Week 10, you will have real data to answer: which AI model is most calibrated for weekly market regime prediction? This is valuable AI literacy — based on evidence, not opinion.
Upset of the week recognition: When the actual move defied all four AI models, that week's richest learning is: why were the models all wrong? The team that best explains the miss scores highest for that week, regardless of their prediction accuracy.
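The weekly horse-race entry needs a rule for "closest". The sketch below assumes closeness is measured from the midpoint of each model's predicted range to the actual % move; that rule is an assumption, not something this document specifies. The ranges are taken from the worked Week 2 comparison table elsewhere in this document, the actual move of +0.2% is made up for illustration, and the helper names are hypothetical.

```python
def midpoint(low: float, high: float) -> float:
    """Centre of a predicted % range."""
    return (low + high) / 2


def closest_model(ranges: dict, actual: float) -> str:
    """Return the model whose range midpoint is nearest the actual % move."""
    return min(ranges, key=lambda m: abs(midpoint(*ranges[m]) - actual))


# S&P 500 ranges from the worked example: (low %, high %).
spx_ranges = {
    "Claude": (0.3, 0.9),
    "ChatGPT": (-0.5, 0.8),
    "Gemini": (0.2, 1.0),
    "DeepSeek": (-1.0, 1.0),
}

winner = closest_model(spx_ranges, actual=0.2)  # 0.2 is a made-up actual
print(winner)
```

A midpoint rule rewards narrow, well-centred ranges; a team could equally score on whether the actual move fell inside the range, so the chosen rule should be written down and kept fixed for all ten weeks.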
This Week's Setup
Week 2 · 26 May 2026
This section is your team's working sprint board. Use the worked example below as your reference standard for what a complete submission looks like. Your team owns this workflow: read it, follow it, and improve on it every sprint.
What Actually Happened: 19–23 May 2026
This is your evidence base for the Week 2 prediction. These are the Friday 23 May closing figures. Source: Finviz 1W view + Yahoo Finance.
| Asset | Fri 23 May Close | Weekly % Change | Signal Reading |
|---|---|---|---|
| S&P 500 | 7,473 | +1.0% | Recovered after Mon–Tue selloff. Closed near highs. Breadth narrow — only 57% of stocks above 200-day MA. |
| Nasdaq 100 | ~21,100 | +0.7% | Lagged S&P. Nvidia earnings beat helped Wed recovery but tech breadth flat all week. |
| Russell 2000 | 2,869 | +2.7% | Best performer this week. Small caps led — unusual when yields are elevated. Driven by Iran deal optimism and oil drop. |
| Gold | 4,523 | −0.4% | Slight pullback. Still elevated near all-time highs. Safe haven demand eased slightly as stocks recovered. |
| Crude Oil (WTI) | ~$61–62 | −5.7% | Big drop. Iran–US nuclear deal draft talk + Trump comments drove oil sharply lower. Key macro event of the week. |
| 10-Year Yield | 4.60% | +4.5% | Hit a ONE-YEAR HIGH of 4.60% on Monday. Eased slightly by Friday as oil dropped. Elevated yields = ongoing pressure on growth stocks. |
| US Bonds (TLT) | ~$83–84 | −2.6% | Fell as yields rose. Bonds and stocks sold off together Mon–Tue — unusual, signals fiscal/inflation concern. |
| VIX | 16.70 | +1.2% | Elevated but not panicked. Stayed below 20. Market cautious but not fearful. Dropped from intraweek highs as stocks recovered. |
| Bitcoin | 75,400 | −2.5% | Pulled back with risk-off sentiment early week. Did not recover fully even as stocks bounced. Risk appetite for crypto muted. |
Week 1 Sector Scorecard (Yahoo Finance 5D view, Fri 23 May)
Week 1 Story in one sentence: The 10-year yield hitting a 1-year high of 4.60% was the defining event — it crushed Real Estate, Utilities, and Discretionary; oil's sharp drop on Iran deal talk saved Russell 2000 and gave Energy an odd boost; stocks recovered Thursday–Friday but breadth remained narrow.
Fed, Rates & Key Events Heading into Week 2
Week 2 Calendar Watch (27–30 May): US markets are CLOSED on Monday 26 May (Memorial Day). Key events: Consumer Confidence (Tue); the GDP revision (Thu; GDP is Gross Domestic Product, the total value of goods and services produced); and PCE inflation, the Fed's preferred measure (Fri). PCE on Friday is the single most important data point of the week. A hot PCE = yields rise, stocks struggle. A cool PCE = yields fall, stocks rally.
Complete Model Submission — What a Strong Week 2 Prediction Looks Like
This is a full example of what Prof. Dr. Tan expects to see from every team on Monday 26 May. Read it carefully. Your submission does not need to reach the same conclusion — but it must show the same structure, evidence discipline, and honest confidence calibration.
Step 1 — Three Agent Outputs (built before querying any AI)
R3 Almanac Agent: Late May (Week 21–22) historically bullish. S&P up 65–80% of the time. Pre-Memorial Day weekend typically sees light-volume drift upward as sellers step aside. Confidence in pattern: Medium, because the macro environment (elevated yields) is unusual vs. historical baseline. Caveat: seasonal tailwind exists but is not strong enough to override a macro shock like a hot PCE print.
R4 Macro/News Agent: Fed: 96% hold probability at June 17 FOMC (Federal Open Market Committee, the Fed body that sets interest rates). Rate: 3.50–3.75% unchanged. 10-year yield at 4.60%, a 1-year high, eased Friday as oil fell. Oil: WTI ~$61, down ~6% on Iran deal hopes, no confirmation yet. Key event this week: PCE inflation Friday. If PCE ≥ 2.8% YoY: yields likely spike, stocks sell off. If PCE ≤ 2.5%: yields ease, stocks rally. Also: Consumer Confidence Tue, GDP revision Thu. Macro driver: Oil/Iran geopolitics. Bias: Cautiously neutral, since PCE is a binary risk event.
R5 Technical Agent: S&P 500: Price above 8 EMA ✓. 8 EMA above 21 EMA ✓, so momentum is intact. However, breadth is weak: only 57% of S&P stocks are above their 200-day MA, so the rally is narrow. Russell 2000 outperformed but needs follow-through above 2,900 to confirm. Key level to watch: S&P 7,500 resistance. Invalidation: Close below 7,350 = trend reversal warning. VIX at 16.7: calm but not complacent. Technical bias: Neutral-bullish while above 7,350.
Step 2 — Multi-LLM Comparison Table
All four models received the identical prompt with the three agent outputs above pasted in. Responses recorded Saturday 24 May 2026.
| Dimension | Claude | ChatGPT | Gemini | DeepSeek |
|---|---|---|---|---|
| Weekly Regime | Neutral-Bullish | Neutral | Neutral-Bullish | Uncertain |
| S&P % estimate | +0.3% to +0.9% | −0.5% to +0.8% | +0.2% to +1.0% | −1.0% to +1.0% |
| Confidence | Medium | Low–Medium | Medium | Low |
| Top supporting reason | Seasonal tailwind + oil drop reduces inflation pressure + technical momentum intact | Technical trend above EMAs, pre-holiday volume typically thin and upward-drifting | Oil decline = easing inflation expectations = yields may ease = supports equities | Breadth weakness and elevated yields make directional call unreliable this week |
| Top contradiction | PCE on Friday is binary — a hot print reverses everything | 10-year at 4.60% is gravity. Stocks climbing against this is unsustainable if yields rise further | Iran deal not confirmed. If talks collapse, oil spikes, yields follow, stocks sell off | Stocks and bonds both fell Mon–Tue — simultaneous selloff signals deeper concern than a normal correction |
| Invalidation condition | PCE ≥ 2.8% or Iran talks collapse causing oil spike above $68 | 10-year yield breaks above 4.70%; VIX closes above 22 | Any confirmed geopolitical escalation in Middle East beyond current Iran talks | Any single macro print that causes 10-year to spike above 4.75% |
| Tone | Measured, evidence-based, acknowledged PCE risk clearly | Cautious, emphasised wide uncertainty band | Slightly more optimistic, leaned on oil narrative | Most conservative — flagged structural concerns about narrow breadth |
Step 3 — Team Consensus Brief & Final Prediction
REGIME: Neutral-Bullish with elevated uncertainty. Three of four models lean cautiously positive; DeepSeek is the outlier flagging structural breadth concern.
PREDICTIONS (26–30 May 2026):
Nasdaq 100 ...... UP +0.2% to +0.7% Confidence: LOW–MEDIUM
Russell 2000 .... FLAT −0.5% to +0.8% Confidence: LOW
Gold ............ DOWN −0.5% to −1.5% Confidence: MEDIUM
Crude Oil (WTI) . FLAT −1% to +2% Confidence: LOW (Iran binary)
10-Year Yield ... FLAT 4.50% to 4.65% Confidence: MEDIUM
US Bonds (TLT) .. FLAT slight recovery Confidence: LOW
VIX ............. DOWN toward 15–16 Confidence: MEDIUM
Bitcoin ......... FLAT −2% to +2% Confidence: LOW
LEADING SECTOR: Energy — Iran deal uncertainty keeps energy stocks in focus; oil volatility = sector volatility but energy companies benefit from geopolitical premium even if oil price is falling.
LAGGING SECTOR: Real Estate — 10-year yield at 4.60% is direct headwind for REITs. No near-term catalyst for relief unless PCE comes in cold.
KEY EVIDENCE (3 points):
1. Oil dropped 6% on Iran deal hopes — reduces inflation pressure, gives Fed room to hold without hiking, mild tailwind for stocks
2. Seasonal bias: late May pre-holiday is historically positive (65–80% of years) with light selling pressure
3. Technical: S&P above 8 EMA and 21 EMA — momentum structure still intact despite narrow breadth
KEY CONTRADICTION (why we kept confidence MEDIUM not HIGH):
PCE inflation data releases Friday. This is the Fed's preferred inflation measure. Claude and ChatGPT both flagged it — a hot print above 2.8% YoY would likely push yields above 4.65%, invalidate our bullish lean, and cause a Friday selloff. DeepSeek's concern about narrow breadth is valid: when only 57% of S&P stocks are above their 200-day MA, the index can look healthy while the majority of stocks are struggling. This is a structural warning we are monitoring but not acting on this week.
INVALIDATION CONDITIONS:
Our bullish lean is WRONG if: (a) PCE Friday ≥ 2.8% YoY, (b) 10-year yield breaks and holds above 4.70%, (c) Iran talks collapse causing oil spike above $68, or (d) VIX closes above 22 on any day this week.
PLAIN ENGLISH: We think the market will drift slightly higher this week, mainly because oil falling is good news for inflation and the pre-holiday week tends to be quiet and positive. However, Friday's inflation data is a wildcard that could reverse everything, so we are not very confident. If you hear bad inflation news on Friday, our prediction is probably wrong.
⚠️ This is an educational exercise and decision-support prototype. Nothing in this brief constitutes financial advice. All AI model outputs were reviewed and verified by the team before publication.
Why this submission scores well: It shows all three agent outputs before touching any AI. It queried all four models with identical prompts and filled the comparison table. It did not just copy one model's answer — it identified where models agreed (3 of 4 lean bullish) and where one diverged (DeepSeek flagged breadth concern) and explained why. Confidence was kept at MEDIUM because of the known PCE risk on Friday — not HIGH, which would have been overconfident. Invalidation conditions are specific and testable. The plain-English summary is readable by a non-expert. The disclaimer is present.
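Because the invalidation conditions in the brief above are numeric, they can be checked mechanically at the end of the week. The sketch below simplifies two of them and says so: "breaks and holds above 4.70%" is reduced to a single reading above 4.70, and the Iran condition is represented only by the oil price. The function name `triggered_invalidations` and the input values are hypothetical.

```python
def triggered_invalidations(pce_yoy, ten_year_yield, wti_oil, vix_close):
    """Return the names of the invalidation conditions that were hit.

    Thresholds come from the worked brief; the inputs are whatever the team
    records for the week (all values here are in the units the brief uses).
    """
    checks = {
        "PCE at or above 2.8% YoY": pce_yoy >= 2.8,
        "10-year yield above 4.70%": ten_year_yield > 4.70,
        "WTI oil above $68": wti_oil > 68.0,
        "VIX close above 22": vix_close > 22.0,
    }
    return [name for name, hit in checks.items() if hit]


# Hypothetical end-of-week readings, not real data.
hits = triggered_invalidations(pce_yoy=2.9, ten_year_yield=4.62, wti_oil=61.5, vix_close=17.0)
print(hits)
```

An empty list means none of the stated conditions fired; a non-empty list tells the team which part of its reasoning to revisit in the learning log.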
📋 Prof. Dr. Tan's Instructions for Week 2
- US markets are closed Monday 26 May (Memorial Day). Your prediction covers the 4-day week: Tue 27 – Fri 30 May.
- Build your three agent outputs before opening any AI tool. Write them in your own words first.
- Query all four LLMs. Save the raw responses in GitHub as: evidence/synthesis_claude_2026-W22.txt, etc.
- Complete the comparison table. Identify where models agree and where they diverge.
- Special this week: Predict both a leading sector AND a lagging sector with reasons.
- Commit your prediction file to GitHub by the weekend before class. File name: prediction_2026-W22_[teamname].md
- On Friday 30 May after market close, record actuals and commit. Do not change your Monday prediction.
The Product — IRIS 2.0 Build Target
Sprint 5–10 Goal
From Week 5 onward, your team builds toward a live, deployable product. The weekly prediction workflow you practise in Weeks 2–4 becomes the engine inside this product. By Week 10, you demo a working application on Hugging Face that any user can open in a browser and use.
SPX, NDX, and IWM weekly predictions. 9 macro assets tracked. Multi-LLM synthesis panel. Human Score layer. Plain-English weekly brief with disclaimer.
Direction calls for all 11 XL sector ETFs (Exchange-Traded Funds), displayed as a colour-coded heatmap. Rotation signal chart. Almanac seasonal badges. Top and bottom sector picks with reasoning.
Stock Spotlight: User enters any ticker (NVDA, GOOGL, AAPL, etc.). Live 6-month chart via yfinance. 8 EMA (Exponential Moving Average) + 21 EMA auto-calculated. EMA zone detected. 4-LLM synthesis for that stock. Sector linkage shown.
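The 8 and 21 EMA calculation can be sketched in plain Python. This is an illustration under stated assumptions: it uses the common smoothing factor 2 / (span + 1) seeded with the first close, the function names `ema` and `ema_condition` are hypothetical, and the price series is synthetic so the example runs offline (in the product the closes would come from the market data source). The exact "EMA zone" definition is not given here, so only the two checks used in the worked technical output are computed.

```python
def ema(prices, span):
    """Exponential moving average with smoothing factor 2 / (span + 1),
    seeded with the first price."""
    alpha = 2 / (span + 1)
    values = [prices[0]]
    for price in prices[1:]:
        values.append(alpha * price + (1 - alpha) * values[-1])
    return values


def ema_condition(closes):
    """Report the two checks: price above 8 EMA, and 8 EMA above 21 EMA."""
    ema8 = ema(closes, 8)[-1]
    ema21 = ema(closes, 21)[-1]
    return {
        "price_above_ema8": closes[-1] > ema8,
        "ema8_above_ema21": ema8 > ema21,
    }


# Synthetic, steadily rising closes (not real market data).
closes = [100 + 0.5 * i for i in range(60)]
condition = ema_condition(closes)
print(condition)
```

In a steady uptrend both checks come out true because each average lags the price, and the slower 21 EMA lags more than the 8 EMA.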
| Tool | Purpose | Access |
|---|---|---|
| Python + Flask | Backend web framework — handles routes, data, LLM (Large Language Model) calls | pip install flask |
| yfinance | Free market data — any ticker, live OHLCV (Open, High, Low, Close, Volume) history, no API (Application Programming Interface) key needed | pip install yfinance |
| HTML/CSS/JS | Frontend dashboard — Flask serves Jinja templates | No install needed |
| Anthropic API | Live Claude calls for LLM synthesis | console.anthropic.com |
| OpenAI API | Live ChatGPT calls for LLM synthesis | platform.openai.com |
| Gemini API | Live Gemini calls for LLM synthesis | ai.google.dev |
| DeepSeek API | Live DeepSeek calls for LLM synthesis | platform.deepseek.com |
| Hugging Face Spaces | Free hosting — deploy as Docker Space, get a public URL | huggingface.co/spaces |
| GitHub | Source control — HF Space syncs automatically from repo | github.com |
The five things that separate a great product from a good one:
1. The 10-sprint track record — a running chart of prediction vs actual with calibration scores. Nobody has built this before.
2. The LLM horse race — which AI model was most accurate over 10 weeks? Real data, real answer.
3. Human Score as a visible layer — not hidden inside the brief. Scored, displayed, comparable to the AI consensus.
4. Plain-English brief quality — test it on someone who has never read a financial report. If they understand it, you succeeded.
5. Stock Spotlight sector linkage — when a user enters NVDA, show them it is in XLK and XLK's seasonal window is bullish. That connection is the value-add over a plain stock chart.
Live Prediction Board — All Groups
Competitive
This is where the competition happens. Every group files their prediction over the weekend: SPX, NDX, IWM directions and % ranges, plus all 11 sector ETF calls. All predictions are visible to all groups the moment they are filed. By Monday morning, everyone knows what everyone else predicted. That is the pressure that makes you think harder.
📋 How to use the Prediction Board
1. Select your group from the dropdown in the board header
2. Click "File Prediction" and enter all SPX/NDX/IWM predictions and all 11 sector ETF directions
3. Add your Human Score total and the one insight your team saw that no AI raised
4. Hit "Live Board" to see what all other groups predicted before Monday class
5. After Friday close: enter actuals in "Record Actuals" to unlock the leaderboard scores
6. Check the Leaderboard for calibration scores, which reflect confidence accuracy, not just direction
Predictions are shared — that is the point. Once you file a prediction, every other group can see it. This creates genuine competitive pressure and makes Monday's class discussion much richer. You will want to come in having thought carefully about where you agree and disagree with other groups — and why.