Finding out what happened on this day in C++

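This program accesses Wikipedia and the Japan Anniversary Association (一般社団法人日本記念日協会). It displays the events and anniversaries for the date specified on the calendar, or saves them to a file. It combines the PHP programs from "Finding out what happened on this day in PHP" (PHPで今日は何の日か調べる) and "Displaying anniversaries in PHP" (PHPで記念日を表示する) and ports them to C++. The results can also be sorted on screen.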

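(March 8, 2025) Updated the libraries used.
(November 23, 2024) Fixed a bug in the list display; updated the libraries used.
(July 27, 2024) Changed the extraction pattern in getDayToday() because Wikipedia's format changed; updated the libraries used.
(March 16, 2024) Updated the libraries used.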

Sample program

Download

Contents of the archive
daytodaywin.msi	Installer
bin/daytodaywin.exe	Executable
bin/libcurl-x64.dll	DLL required at run time
bin/etc/help.chm	Help file
sour/daytodaywin.cpp	Source program
sour/resource.h	Resource header
sour/resource.rc	Resource file
sour/application.ico	Application icon
sour/mystrings.cpp	General-purpose string functions, etc. (source)
sour/mystrings.h	General-purpose string functions, etc. (header)
sour/makefile	Build file

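daytodaywin.cpp revision history
Version	Date	Changes
1.4.4	2025/03/08	Updated the libraries used
1.4.3	2024/11/23	makeListView(): bug fix; updated the libraries used
1.4.2	2024/07/27	getDayToday(): changed the extraction pattern; updated the libraries used
1.4.1	2024/03/16	Updated the libraries used
1.4.0	2023/11/04	Removed footnote (annotation) numbers; updated the libraries used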

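Libraries used

The program accesses Wikipedia and the Japan Anniversary Association, so it needs three open-source libraries: the Boost C++ Libraries, cURL (pronounced "curl"), and OpenSSL. For installation instructions, see "Preparing a C++ development environment" (C++ 開発環境の準備).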

Preparing resources

Launch Eclipse and create a new project named daytodaywin.
Launch ResEdit and create resource.rc.
Return to Eclipse and add the source program "daytodaywin.cpp".
Set the linker flags to -mwindows -static -lstdc++ -lgcc -lwinpthread -lcurl -lssl -llzma -lz -lws2_32 "C:\pleiades\eclipse\mingw\x86_64-w64-mingw32\bin\libcurl-x64.dll".

If you build from the MSYS2 command line, use the included "makefile".

Explanation: header files, etc.

daytodaywin.cpp

  using namespace std;
  using namespace boost;
  using namespace boost::property_tree;

  #define MAKER       "pahoo.org"             // author
  #define APPNAME     "daytodaywin"           // application name
  #define APPNAMEJP   "今日は何の日"          // application name (Japanese)
  #define APPVERSION  "1.4.4"                 // version
  #define APPYEAR     "2020-25"               // years of creation and last update
  #define REFERENCE   "https://www.pahoo.org/e-soul/webtech/cpp01/cpp01-11-01.shtm"   // reference site

  // char* buffer size
  #define SIZE_BUFF2      5120

  // Maximum length of a ListView item (do not change)
  #define MAX_LISTVIEWITEM    255

  // Request URL, Japan Anniversary Association (do not change)
  #define KINENBI_URL     "https://www.kinenbi.gr.jp/"

  // Marker value that identifies an anniversary entry (do not change)
  #define KINENBI_YEAR    (-99999)

  // Current instance
  static HINSTANCE hInst;

  // Application window
  HWND hParent;

  // Application window position
  unsigned hParent_X, hParent_Y;

  // Holds error messages
  string ErrorMessage;

  // Help file
  #define HELPFILE    ".\\etc\\help.chm"

  // Default save file name
  #define SAVEFILE    "daytodaywin.csv"

  // UserAgent
  string UserAgent;

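The Windows ListView control limits the length of the text that a single column can hold. This limit is defined as the constant MAX_LISTVIEWITEM. When text is assigned to the ListView, the substr method trims it to that length.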

Unless a note says otherwise, the other constants can be changed freely.
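Below is a minimal sketch of that truncation. It assumes the text is held in a std::wstring, and the helper name clipForListView is hypothetical. Besides calling substr, it also drops a trailing UTF-16 high surrogate, so that on Windows (where wchar_t is 16 bits) a character outside the BMP is not cut in half. That extra check is an addition of this sketch; the article does not say the original code does it.

```cpp
#include <cstddef>
#include <string>

// Same value as the article's MAX_LISTVIEWITEM constant
constexpr std::size_t MAX_LISTVIEWITEM = 255;

// Hypothetical helper: clip text so that it fits into one ListView item
std::wstring clipForListView(const std::wstring& s,
                             std::size_t maxLen = MAX_LISTVIEWITEM) {
    if (s.size() <= maxLen) {
        return s;                            // already short enough
    }
    std::wstring out = s.substr(0, maxLen);
    // With 16-bit wchar_t (Windows), a high surrogate at the end is half of
    // a character; drop it rather than leave a broken surrogate pair.
    if (!out.empty()) {
        const wchar_t last = out.back();
        if (last >= 0xD800 && last <= 0xDBFF) {
            out.pop_back();
        }
    }
    return out;
}
```

A 300-character string comes back as 255 characters, and a short string is returned unchanged.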

Explanation: data structures

daytodaywin.cpp

  // Class that stores one event
  #define SIZE_EVENTS     500     // storage limit
  class _Events {
  public:
      wstring category = L"";     // category
      int year = 0;               // year (Common Era)
      wstring era = L"";          // Japanese era name
      wstring event = L"";        // event
      wstring plus = L"";         // additional information
  };
  unique_ptr<_Events> Events[SIZE_EVENTS] = {};

  // Structure used for sorting
  struct _stEvents {
      wchar_t* category = NULL;   // category
      int *year = NULL;           // year (Common Era)
      wchar_t* event = NULL;      // event
      size_t id = 0;              // index into Events
  };
  vector<_stEvents> Vevents;

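The C++ smart pointer unique_ptr is used so that the retrieved HTML content can be handled somewhat like a PHP associative array.
The needed parts of the content are stored in class _Events, and an array of unique_ptr holds pointers to the objects (instances of this class). The number of events is not known in advance, so each object is allocated dynamically only when it is needed. The array itself has the fixed length SIZE_EVENTS (the storage limit), and any slots beyond the last event simply stay empty.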

Members that hold Japanese text use the wide string type (wstring). The year is stored as an int.
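The following sketch illustrates this storage scheme under a few assumptions. The class is cut down to three members and renamed Event, and the helpers appendEvent and clearEvents are hypothetical. The array has a fixed length, only the slots actually used own an object, and appending refuses to write past the end of the array.

```cpp
#include <cstddef>
#include <memory>
#include <string>

constexpr std::size_t SIZE_EVENTS = 500;   // storage limit, as in the article

// Reduced stand-in for the article's _Events class
struct Event {
    std::wstring category;   // category
    int year = 0;            // year (negative for BC)
    std::wstring event;      // description
};

// Fixed-length array of owning pointers; unused slots hold nullptr
std::unique_ptr<Event> events[SIZE_EVENTS];

// Hypothetical helper: allocate an object in the next free slot.
// Returns false (and stores nothing) when the array is already full.
bool appendEvent(std::size_t& count, const std::wstring& category,
                 int year, const std::wstring& text) {
    if (count >= SIZE_EVENTS) {
        return false;
    }
    events[count] = std::make_unique<Event>();
    events[count]->category = category;
    events[count]->year = year;
    events[count]->event = text;
    ++count;
    return true;
}

// Hypothetical helper: reset() deletes each object and leaves nullptr
void clearEvents() {
    for (auto& p : events) {
        p.reset();
    }
}
```

If a fixed upper limit is not required, a std::vector<std::unique_ptr<Event>> would remove the limit altogether. The fixed array shown here mirrors the article's design.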

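Pointers to the sortable members of class _Events (category, year, and event) are copied into a vector. The reason for this design is explained in "Google News search in C++" (C++ で Googleニュース検索).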
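The idea of sorting a vector of pointers, while the stored objects stay where they are, can be sketched as follows. SortKey, makeKeys, and sortByYear are hypothetical names, and the events are held in a std::vector here for brevity. The sketch uses std::stable_sort so that items from the same year keep their original order; the article does not say which sort function the program itself uses.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

struct Event {
    std::wstring category;
    int year = 0;
    std::wstring event;
};

// Hypothetical sort record: points into an Event and keeps its index,
// in the same spirit as the article's _stEvents
struct SortKey {
    const std::wstring* category;
    const int* year;
    const std::wstring* event;
    std::size_t id;
};

std::vector<SortKey> makeKeys(const std::vector<Event>& evs) {
    std::vector<SortKey> keys;
    keys.reserve(evs.size());
    for (std::size_t i = 0; i < evs.size(); ++i) {
        keys.push_back({&evs[i].category, &evs[i].year, &evs[i].event, i});
    }
    return keys;
}

// Only the small key records move; the Event objects stay in place
void sortByYear(std::vector<SortKey>& keys, bool ascending) {
    std::stable_sort(keys.begin(), keys.end(),
        [ascending](const SortKey& a, const SortKey& b) {
            return ascending ? *a.year < *b.year : *a.year > *b.year;
        });
}
```

Each key keeps its id, so after sorting the display code can still reach the full object, including members such as era that are not part of the key.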

Explanation: getting the Wikipedia URL

daytodaywin.cpp

  /**
   * Get the Wikipedia URL
   * @param   int month, day  month, day
   * @param   char* url       buffer that receives the URL
   * @param   size_t sz       size of the url buffer
   * @return  none
  */
  void getURL_Wikipedia(int month, int day, char *url, size_t sz) {
      const string wiki = "https://ja.wikipedia.org/wiki/";
      static char buff1[SIZE_BUFF2 + 1];

      snprintf(buff1, SIZE_BUFF2, "%d月%d日", month, day);
      string s1 = sjis_utf8(buff1);

      CURL *curl = curl_easy_init();
      // curl_easy_escape() returns a newly allocated string (or NULL); release it with curl_free()
      char *escaped = curl_easy_escape(curl, s1.c_str(), (int)s1.length());
      string ss = wiki + (escaped != NULL ? string(escaped) : string(""));
      curl_free(escaped);
      curl_easy_cleanup(curl);

      // Respect the caller's buffer size sz; snprintf always NUL-terminates
      snprintf(url, sz, "%s", ss.c_str());
  }

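Wikipedia has an entry titled "M月D日" (for example "3月8日") for every day of the year. The "what happened on this day" information is taken from that page.
getURL_Wikipedia is the user function that builds the URL of that entry.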
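To see what URL this produces, the following sketch builds it without libcurl. percentEncode and wikipediaURL are hypothetical names. percentEncode imitates curl_easy_escape by leaving only the RFC 3986 unreserved characters (letters, digits, '-', '.', '_', '~') unchanged. The UTF-8 bytes of 月 and 日 are written as escapes, so the example does not depend on the encoding of the source file.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: percent-encode every byte except RFC 3986 unreserved characters
std::string percentEncode(const std::string& in) {
    std::string out;
    for (unsigned char c : in) {
        const bool unreserved = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
                                (c >= '0' && c <= '9') ||
                                c == '-' || c == '.' || c == '_' || c == '~';
        if (unreserved) {
            out += static_cast<char>(c);
        } else {
            char buf[4];                                  // "%XX" + NUL
            std::snprintf(buf, sizeof buf, "%%%02X", static_cast<unsigned>(c));
            out += buf;
        }
    }
    return out;
}

// Hypothetical helper: URL of the ja.wikipedia.org entry "<month>月<day>日"
std::string wikipediaURL(int month, int day) {
    const std::string title = std::to_string(month) + "\xE6\x9C\x88"   // 月 in UTF-8
                            + std::to_string(day) + "\xE6\x97\xA5";    // 日 in UTF-8
    return "https://ja.wikipedia.org/wiki/" + percentEncode(title);
}
```

For 3月8日, this yields https://ja.wikipedia.org/wiki/3%E6%9C%888%E6%97%A5.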

Explanation: getting "what happened on this day"

daytodaywin.cpp

  /**
   * Get "what happened on this day"
   * @param   int  month, day month, day
   * @param   bool lfn        keep footnotes (default = FALSE)
   * @return  int index of the last item stored in Events (item count - 1); -1 on error or when nothing was found
  */
  int getDayToday(int month, int day, bool lfn=FALSE) {
      // Initialize
      for (int i = 0; i < SIZE_EVENTS; i++) {
          Events[i].reset();
          Events[i] = NULL;
      }

      // Fetch the Wikipedia content
      char url[SIZE_BUFF2];
      string contents = "";
      getURL_Wikipedia(month, day, (char *)url, SIZE_BUFF2);
      bool res = readWebContents(url, UserAgent, &contents);
      if (res == FALSE) {
          ErrorMessage = "Wikipediaにアクセスできません．";
          return (-1);
      }

      // Parse the content
      setlocale(LC_ALL, "Japanese");
      int cnt = -1;
      stringstream ss;
      string ss0;
      wstring mode = L"";
      wstring ws, ws2;
      wstringstream wss;
      static wsmatch mt1, mt2, mt3;
      static smatch mt4;
  //  Patterns used before Wikipedia's format change (July 2024)
  //  wregex re1(_SW("<span[\\s\\S]+id=\"できごと\""));
  //  wregex re2(_SW("<span[\\s\\S]+id=\"誕生日\""));
  //  wregex re3(_SW("<span[\\s\\S]+id=\"忌日\""));
      wregex re1(_SW("<h2[\\s\\S]+id=\"できごと\""));
      wregex re2(_SW("<h2[\\s\\S]+id=\"誕生日\""));
      wregex re3(_SW("<h2[\\s\\S]+id=\"忌日\""));
      wregex re5(_SW("<li>[^\\>]+([0-9]+(年|世紀).+)<\\/li>"));
      wregex re6(_SW("([^0-9]*)([0-9]+)年(\\s*[\\(（].[^\\)）]+[）\\)].)?[ \\-]+(.+)"));   // year - event
      wregex re7(_SW("([^0-9]*)([0-9]+)年(\\s*[\\(（].[^\\)）]+[）\\)].)?[ \\-]+([^、]+)、?([^（\\(]+)*([^0-9]*)([0-9]*)"));      // year - name, occupation (birth/death year)
      wregex re8(_SW("<span[\\s\\S]+id=\"フィクションのできごと\">"));
      wregex re9(_SW("^\\（[\\*\\+]."));          // "（*" or "（+" marker before the other birth/death year
      wregex re11(_SW("紀元前"));
      wregex re99(_SW("<span[\\s\\S]+class=\"vector-menu-heading-label\">他言語版"));     // stop reading lines here

      ss << contents;
      while(ss && getline(ss, ss0)) {
          // Stop when the Events array is full (cnt is the last index used)
          if (cnt >= SIZE_EVENTS - 1) {
              break;
          }
          // Convert one line to wstring
          ws = _UW(ss0);
          // Skip lines longer than 4000 characters
          if (ws.length() > 4000) {
              continue;
          }
          // Pattern matching
          if (regex_search(ws, mt1, re99)) {
              break;
          } else if (regex_search(ws, mt1, re1)) {
              mode = _SW("出来事");
          } else if (regex_search(ws, mt1, re8)) {
              mode = _SW("架空");
          } else if (regex_search(ws, mt1, re2)) {
              mode = _SW("誕生");
          } else if (regex_search(ws, mt1, re3)) {
              mode = _SW("死亡");
          } else if ((mode.length() != 0) && (regex_search(ws, mt1, re5) == TRUE)) {
              // Strip HTML tags
              ws = wstrip_tags(ws);
              // Remove footnote (annotation) numbers
              ws = wstripAnnotations(ws);
              // Event
              if ((mode == _SW("出来事")) && (regex_search(ws, mt2, re6) == TRUE)) {
                  cnt++;
                  Events[cnt] = make_unique<_Events>();
                  Events[cnt]->category = mode;
                  if (mt2[1].str().length() == 0) {
                      Events[cnt]->year = atoi(_WS(mt2[2].str()).c_str());
                  } else {
                      Events[cnt]->year = 0 - atoi(_WS(mt2[2].str()).c_str());
                  }
                  Events[cnt]->era   = mt2[3].str();
                  Events[cnt]->event = mt2[4].str();

              // Birth / death
              } else if (regex_search(ws, mt2, re7)) {
                  cnt++;
                  Events[cnt] = make_unique<_Events>();
                  Events[cnt]->category = mode;
                  if (mt2[1].str().length() == 0) {
                      Events[cnt]->year = atoi(_WS(mt2[2].str()).c_str());
                  } else {
                      Events[cnt]->year = 0 - atoi(_WS(mt2[2].str()).c_str());
                  }
                  Events[cnt]->era = mt2[3].str();
                  ws2 = mt2[6].str();
                  if (regex_search(ws2, mt3, re11) == TRUE) {
                      Events[cnt]->plus = L"-" + mt2[7].str();
                  } else if (regex_search(ws2, mt3, re9) == TRUE) {
                      Events[cnt]->plus = mt2[7].str();
                  } else {
                      Events[cnt]->plus = L"";
                  }
                  wss << mt2[4].str() << _SW("（") << mt2[5].str();
                  if (mode == _SW("誕生")) {
                      wss << _SW("）誕生");
                      if (Events[cnt]->plus.length() != 0) {
                          wss << _SW("（〜") << Events[cnt]->plus << _SW("年）");
                      }
                  } else {
                      wss << _SW("）死去");
                      if (Events[cnt]->plus.length() != 0) {
                          wss << _SW("（") << Events[cnt]->plus << _SW("年〜")<< Events[cnt]->year << _SW("年）");
                      }
                  }
                  Events[cnt]->event = wss.str();
                  wss.str(L"");
              }
          }
      }
      return cnt;
  }

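Once the URL is determined, the user function readWebContents, which uses the cURL functions, reads the entire page into the variable contents.
The next step is to scrape this content, and this time I decided to use regular expressions on wide strings. Each line of the fetched content is converted to a wide string with the user macro _UW. The source file itself is written in Shift_JIS, so the patterns are converted to wide strings with the user macro _SW, and the pattern matching is done with these. Incidentally, regular expressions became an official part of C++ in C++11, and the standard regex library is based on the Boost C++ Libraries (Boost.Regex).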
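The core of that approach is matching a wide-string regular expression against one list item, and it can be tried on its own. The sketch below uses a simplified pattern of its own, not the article's re6. It assumes the HTML tags and footnote numbers have already been removed, and it accepts an optional 紀元前 (BC) prefix and an optional era in full-width parentheses. Like getDayToday, it stores BC years as negative numbers. It uses L"..." literals instead of the _SW macro, so it assumes the compiler reads the source as UTF-8 (the GCC default). ParsedEvent and parseEventLine are hypothetical names.

```cpp
#include <optional>
#include <regex>
#include <string>

// Hypothetical result type for one parsed list item
struct ParsedEvent {
    int year;            // negative for BC
    std::wstring era;    // e.g. （昭和20年）; empty when absent
    std::wstring event;  // text after " - "
};

// Hypothetical parser for "[紀元前]YYYY年[（元号）] - 出来事" (simplified, not re6)
std::optional<ParsedEvent> parseEventLine(const std::wstring& line) {
    static const std::wregex re(L"^(紀元前)?([0-9]+)年\\s*(（[^）]*）)?\\s*-\\s*(.+)$");
    std::wsmatch m;
    if (!std::regex_search(line, m, re)) {
        return std::nullopt;
    }
    ParsedEvent ev;
    ev.year = std::stoi(m[2].str());
    if (m[1].matched) {
        ev.year = -ev.year;              // 紀元前 prefix: store as a negative year
    }
    ev.era = m[3].str();                 // empty string when group 3 did not match
    ev.event = m[4].str();
    return ev;
}
```

For example, L"1945年（昭和20年） - 出来事A" gives year 1945, era （昭和20年）, and event 出来事A.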

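Scraping patterns
Pattern	Matches
re1	"できごと" (Events) section heading
re2	"誕生日" (Births) section heading
re3	"忌日" (Deaths) section heading
re5	list item containing a year (年) or century (世紀)
re6	year - event
re7	year - name, occupation (birth/death year)
re8	"フィクションのできごと" (Events in fiction) section heading
re9	"（*" or "（+" marker before the other birth/death year
re11	detects "紀元前" (BC)
re99	pattern at which line-by-line reading stops ("他言語版", other-language versions)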
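getDayToday walks through the page one line at a time and switches its current category whenever one of the heading patterns above matches. That loop can be sketched as follows. The function name tagLines and the TaggedLine struct are hypothetical. In this sketch the heading patterns use [^>]* instead of the original [\s\S]+, so that a match cannot run past the end of the tag. The list-item pattern is also simplified, and the fictional-events heading (re8) is left out.

```cpp
#include <regex>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record: category in effect plus the raw list line
struct TaggedLine {
    std::wstring section;  // 出来事 / 誕生 / 死亡, as in the article
    std::wstring line;     // raw <li> line, still containing HTML
};

std::vector<TaggedLine> tagLines(const std::wstring& html) {
    static const std::wregex reEvents(L"<h2[^>]*id=\"できごと\"");        // like re1
    static const std::wregex reBirths(L"<h2[^>]*id=\"誕生日\"");         // like re2
    static const std::wregex reDeaths(L"<h2[^>]*id=\"忌日\"");           // like re3
    static const std::wregex reItem(L"<li>.*[0-9]+(年|世紀).*</li>");    // like re5
    static const std::wregex reStop(L"他言語版");                        // like re99

    std::vector<TaggedLine> out;
    std::wistringstream in(html);
    std::wstring line, mode;          // mode stays empty until the first heading
    while (std::getline(in, line)) {
        if (std::regex_search(line, reStop)) {
            break;                                    // end of the article body
        } else if (std::regex_search(line, reEvents)) {
            mode = L"出来事";
        } else if (std::regex_search(line, reBirths)) {
            mode = L"誕生";
        } else if (std::regex_search(line, reDeaths)) {
            mode = L"死亡";
        } else if (!mode.empty() && std::regex_search(line, reItem)) {
            out.push_back({mode, line});
        }
    }
    return out;
}
```

List items that appear before the first recognized heading are ignored, because mode is still empty at that point.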

Explanation: getting anniversaries

daytodaywin.cpp

  /**
   * Get the anniversaries for the specified date
   * @param   int month, day month, day
   * @param   int cnt index in Events at which to start adding items
   * @return  int total number of items
  */
  int getAnniversary(int month, int day, int cnt=0) {
      // Read the content
      static char post[SIZE_BUFF2 + 1];
      snprintf(post, SIZE_BUFF2, "MD=%d&M=%d&D=%d", month, month, day);
      string contents = "";
      bool res = readWebContents(KINENBI_URL, UserAgent, &contents, post);

      if (res == FALSE) {
          ErrorMessage = "日本記念日協会のサイトにアクセスできません．";
          return (-1);
      }

      // Scraping
      wregex re1(_SW("<a\\s+class=\"winDetail\"\\s+href=\"([^\"]+)\"><font[^>]+>([^<]+)</font>"), wregex::icase);
      static wsmatch mt1;

      stringstream ss;
      wstring mode = _SW("記念日");
      string ss0;
      wstring ws;
      ss << contents;
      while(ss && getline(ss, ss0)) {
          // Stop when the Events array is full
          if (cnt >= SIZE_EVENTS) {
              break;
          }
          // Convert one line to wstring
          ws = _UW(ss0);
          if (regex_search(ws, mt1, re1)) {
              Events[cnt] = make_unique<_Events>();
              Events[cnt]->category = mode;
              Events[cnt]->event = mt1[2].str();
              Events[cnt]->year  = KINENBI_YEAR;
              cnt++;
          }
      }

      return cnt;
  }
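Two parts of getAnniversary do not need the network: building the POST body and pulling the anniversary names out of the returned HTML. Both can be checked separately. kinenbiPostBody and extractAnniversaries are hypothetical names, and the regular expression is the article's re1 written as an L"..." literal. There is one difference from the original. The original takes at most one match per line with regex_search, while this sketch uses std::wsregex_iterator to collect every match in the text.

```cpp
#include <cstdio>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: same form fields as the snprintf call in getAnniversary
std::string kinenbiPostBody(int month, int day) {
    char buf[64];
    std::snprintf(buf, sizeof buf, "MD=%d&M=%d&D=%d", month, month, day);
    return buf;
}

// Hypothetical helper: collect NAME from every
// <a class="winDetail" href="..."><font ...>NAME</font> link (case-insensitive)
std::vector<std::wstring> extractAnniversaries(const std::wstring& html) {
    static const std::wregex re(
        L"<a\\s+class=\"winDetail\"\\s+href=\"([^\"]+)\"><font[^>]+>([^<]+)</font>",
        std::regex_constants::icase);
    std::vector<std::wstring> names;
    for (std::wsregex_iterator it(html.begin(), html.end(), re), end; it != end; ++it) {
        names.push_back((*it)[2].str());   // group 2 = anniversary name
    }
    return names;
}
```

For March 8, kinenbiPostBody(3, 8) gives "MD=3&M=3&D=8", which is the same body the original builds with snprintf.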