A Study on Countering Abuse of the TANet News System




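Chang-Sheng Chen (陳昌盛), Chung-Liang Cheng (鄭中樑)

Computer Center, National Chiao Tung University; Campus Computer Communication Association, National Chiao Tung University
1001 Ta Hsueh Road, Hsinchu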
TEL:(03) 5712121 EXT.31721 52833
EMAIL:cschen@cc.NCTU.edu.tw clcheng@CCCA.NCTU.edu.tw




Abstract

The growth of the Internet has accelerated the flow of information. An ordinary user connects through a network service provider and can then use Internet News/BBS to communicate two-way with other users on the network with little effort. On the positive side, Internet News/BBS today offers most people a channel for two-way communication that is convenient, fast, economical, and simple to use. On the other hand, precisely because it is convenient, fast, and apparently cheap, and because it has few built-in safeguards, some users exploit these properties to post articles that have little to do with a newsgroup's topic, or to post the same article repeatedly across many groups. Such practices have gradually grown into a pattern of abuse of NetNews/BBS systems that troubles users and administrators alike, and countermeasures are needed.

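This paper examines how several common countermeasure systems work, compares their strengths and weaknesses, and reports on hands-on tests at several experimental sites set up on the Taiwan Academic Network (TANet). After more than 8 months of continuous experiments, the participating sites have accumulated a substantial body of data about local news sites. Besides supporting the analysis in this paper, the data have been placed on a related website (http://news-peer.nctu.edu.tw) for reference by interested readers at other organizations.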





1. Introduction to the USENET System

USENET [1] is a logical network for passing information, with users in every corner of the Internet. It has long been as widespread a network application service as E-mail, WWW, and FTP. In Taiwan, USENET News is also integrated with the discussion boards of Internet BBS systems, which has made it one of the most popular network applications; for this reason many people refer to it by a Chinese name meaning "network forum". People used to Internet terminology still tend to say USENET News, or simply News. Another name nearly as common on the Internet is NetNews. By convention USENET and NetNews differ slightly in scope, but we do not distinguish them here and use USENET for both. USENET grew out of the electronic mail (E-mail) system. Over the years it has kept the speed and convenience of E-mail while adding a culture of openness and discussion. It now has its own transfer protocol, NNTP (Network News Transfer Protocol) [2], and thousands of cooperating news server sites that run around the clock, continuously exchanging large volumes of messages for Internet users everywhere.

Broadly speaking, USENET has been a self-regulating network from the start. In other words, no explicit and enforceable rules constrain what users do on USENET; instead, a great many administrators and users jointly maintain its structure and operation on the basis of mutual trust.

As networks have become more widespread, users from more and more walks of life have come onto the Internet. In addition, because USENET is closely related to E-mail, some web browsers and E-mail applications have added the ability to read USENET articles. The growth in users and the spread of such software give some indication of how widely USENET is used and accepted, and the problems that follow from this deserve correspondingly more attention.





2. Countering Network Abuse (USENET SPAM)

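In its early days USENET was used mainly by academic and research organizations, which for the most part shared an understanding of, and norms for, its use. In recent years, however, commercial use of the network has brought large numbers of commercial users and activities onto USENET, and large amounts of junk (pure advertisements, empty articles with no content, and so on) have begun to circulate. This is not unique to Taiwan; it is a problem the whole Internet faces, and it is already being discussed abroad [3]. According to current observations, about 40% of all USENET traffic is Spam, another 40% is the corresponding control messages, and only the remaining 20% is the useful information we actually want.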

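In the past, administrators usually dealt with this by sending cancel messages to delete such articles. As network traffic has grown, this approach no longer suits the current situation. Adopting newer and more effective countermeasures is therefore the main direction of our research. We also use log analysis techniques to observe and analyze the traffic and structural characteristics of USENET; the results serve to verify how well the countermeasures work and provide data that USENET administrators can use as a technical reference in the future.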

According to available statistics, total USENET traffic last year (1997) was roughly 5 to 15 Tbytes, and a news server that carries every newsgroup (a full feed) handles about 10 Gbytes per day. The figure varies with the set of newsgroups carried and with network bandwidth.

According to the server host statistics at www.twnic.net (http://power2.nsysu.edu.tw/ipdomain/DNS/History.html), Taiwan had more than 570 news server hosts as of 1998/8/1. Records from April of last year (1997) show that, of all TANet traffic to the Internet, WWW ranked first at about 45% and News second at about 29%. (We understand that since about 1998 the WWW share of TANet's Internet traffic has climbed to about 60%.) TANet is currently constrained by insufficient bandwidth: over its two T1 links to the United States, the USENET News actually carried into Taiwan amounts to about 350 MB to 700 MB per day, still far below the volume circulating on the Internet as a whole [4]. We have also analyzed the traffic of the tw.* newsgroups [5], which are centered on Taiwan and mostly use Traditional Chinese. The tw.* hierarchy carries about 20 to 30 Mbytes and about 20,000 articles per day, which usually places it in the top five when compared with the other major hierarchies (alt.*, news.*, rec.*, etc.).

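These figures show that the traffic USENET generates each day plays a significant role among network applications, and that the Chinese-language tw.* newsgroups (arguably the largest Chinese-language discussion area in the world) cannot be ignored. Countering the abuse of USENET is therefore an urgent task.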

Figure 1. Article counts for all newsgroups
Figure 2. Article counts for the tw.bbs.* newsgroups



Figure 3. Total article traffic for all newsgroups
Figure 4. Total article traffic for the tw.bbs.* newsgroups

The statistics above were collected on spring.edu.tw on 1998/09/14.



2.1 Common Categories of USENET SPAM

In general, USENET articles regarded as junk are called Spam. More specifically, Spam means articles with almost identical content sent in large numbers, within a short time, by the same user through the same server host. In a broader sense, Spam as used here refers to articles that do any of the following:

  1. Offend public decency

  2. Violate the founding purpose of the academic network

  3. Waste network bandwidth

From a technical standpoint, Spam can be grouped into the following categories:

  1. binary post

  2. multipost (the same article posted repeatedly)

  3. crosspost (one article posted to many newsgroups)

  4. other inappropriate articles
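
As an illustration only, the first three technical categories can be approximated from an article's headers and body. The sketch below is written in Python; the function name `classify`, the crosspost limit of 10 groups, the use of an MD5 body digest, and the uuencode test are all assumptions made for this example, not rules taken from any actual filter.

```python
import hashlib
import re

# Assumed threshold for illustration; real sites tune this to local policy.
MAX_CROSSPOST = 10

# A uuencoded attachment starts with a line such as "begin 644 picture.jpg".
UUENCODE_BEGIN = re.compile(r"^begin [0-7]{3,4} \S+", re.MULTILINE)


def classify(headers, body, seen_bodies):
    """Return the list of technical Spam categories an article matches.

    headers     -- dict of header name -> value
    body        -- article body as a string
    seen_bodies -- set of body digests from earlier articles (updated here)
    """
    categories = []

    groups = [g for g in headers.get("Newsgroups", "").split(",") if g.strip()]

    # binary post: encoded binary data sent to a group that is not for binaries
    if UUENCODE_BEGIN.search(body) and not any("binaries" in g for g in groups):
        categories.append("binary post")

    # multipost: the same body arriving again as a separate article
    digest = hashlib.md5(body.encode("utf-8")).hexdigest()
    if digest in seen_bodies:
        categories.append("multipost")
    seen_bodies.add(digest)

    # crosspost: one article addressed to too many newsgroups at once
    if len(groups) > MAX_CROSSPOST:
        categories.append("crosspost")

    return categories
```

The caller keeps `seen_bodies` between calls, so the second arrival of an identical body is reported as a multipost while the first is not.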





3. How to Counter the Abuse of USENET

The methods used to handle these problems fall roughly into two types:

  1. canceling - sending a control message to delete an article

  2. filtering - active filtering

In the early days, the usual approach was for USENET administrators to issue control messages that delete specific articles, using programs to judge whether an article was Spam. This approach has the following problems:

  1. Wasted network bandwidth

    According to the RFCs and related documents, control messages and deleted articles correspond one to one: deleting an article requires one corresponding control message. A control message can also be issued only after the Spam has appeared, by which time most news servers on the network have usually already received the Spam. In terms of bandwidth, then, Spam and control message traffic are nearly equal, and canceling does little to save bandwidth.

  2. Consumed host resources

    As noted above, because of the one-to-one correspondence, even after the Spam has been deleted and no longer occupies host storage, the control messages needed to delete it do, so little space is saved. More importantly, on current news servers, receiving control messages to delete articles hurts host performance: it increases disk accesses and takes more processing time.
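
The bandwidth argument can be made concrete with a rough calculation. The Python sketch below takes the traffic mix quoted in Section 2 (about 40% Spam, 40% control messages, 20% useful articles) and the 10 Gbytes per day of a full feed. It assumes, purely for illustration, an ideal filter that stops all Spam before it is relayed, so that no cancel messages are needed either; the function and variable names are chosen for this example.

```python
def daily_traffic_gb(total_gb, spam_share, control_share):
    """Split a day's feed into spam, control and useful traffic (in Gbytes)."""
    spam = total_gb * spam_share
    control = total_gb * control_share
    useful = total_gb - spam - control
    return spam, control, useful


spam, control, useful = daily_traffic_gb(10.0, 0.40, 0.40)

# Canceling: every Spam article has already been relayed, and each one is
# followed by its own cancel message, so all three parts cross the network.
with_canceling = spam + control + useful

# Idealized filtering (assumption): Spam is dropped at the first server, so
# neither the Spam nor the matching cancel messages are relayed further.
with_filtering = useful

print(f"canceling: {with_canceling:.1f} GB/day, filtering: {with_filtering:.1f} GB/day")
```

Under these assumptions canceling leaves the relayed volume unchanged, while filtering would leave only the useful fifth. A real filter misses some Spam, so the actual saving is smaller.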

Most Spam prevention mechanisms to date have relied on this method, and it has had some effect. With the growth in traffic and increasingly varied usage habits and behavior, however, the traditional method is now barely adequate, and the trend is toward newer and more efficient ways of preventing Spam.





4. USENET Antispam - News filtering


4.1 How News Filtering Works

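The main method used in this study is to install an active filtering program on the news server that examines information about each article and decides how to handle it. The advantages of this approach are best discussed in terms of how news servers are connected to one another.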


Figure 5. Workflow of innd with a filter installed

USENET messages reach their destinations by being relayed among interconnected news servers. A news server has two main functions: relaying all USENET messages to other news servers, and letting users read and retrieve the messages they want. A filter installed on a news server can therefore have a decisive effect: it screens out articles regarded as Spam in advance, so that they are neither passed on to other news servers nor stored. Put simply:

The garbage never touches the ground!

Most common news server software [6] has begun to provide this capability. Administrators can plan the rule set a filter needs according to their management requirements, and the news server uses the rule set to decide whether to filter out an article. Judging by current trends in news server development, the filter is usually written in a simple programming language such as perl or tcl: administrators define the filter's decision logic within a set format, and the news server calls the filter directly to judge each article. Perl is used mainly because it is easy to write and has strong text-processing capabilities; after all, most article content on USENET is still text.

Below we use cleanfeed [7], a filter program now widely used on the network, together with the news server software INN [8], to introduce the internal design and operation of a typical news filtering program.


Figure 6. cleanfeed operation flow

After innd receives an article, it passes the article's header and body data to the filter. The filter first builds keys according to the kinds of rules mentioned above and stores them in a hash table to keep counts. It then compares these keys with the data in the hash table; if the number of matching articles has exceeded the limit, it returns at once to innd, which rejects the article. Otherwise it goes on to further checks. If any check fails before the article has passed the whole filter, the article is rejected; if the article passes every check, it is handed to innd and handled exactly like any ordinary article. Broadly, the filter judges articles in two main ways:

  1. By count
  2. By specific strings
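
The two kinds of check can be sketched as follows. This is a minimal Python illustration of the idea, not cleanfeed's code: the class name `SimpleFilter`, the default limit of 3 copies, the choice of an MD5 body digest as the key, and the sample pattern in the usage note are all assumptions made for the example. The sketch returns an empty string to accept an article and a reason string to reject it.

```python
import hashlib
import re


class SimpleFilter:
    """Toy news filter: a count check followed by a string check."""

    def __init__(self, max_copies=3, bad_patterns=()):
        self.max_copies = max_copies          # assumed limit, for illustration
        self.counts = {}                      # hash table: key -> articles seen
        self.bad_patterns = [re.compile(p, re.IGNORECASE) for p in bad_patterns]

    def check(self, headers, body):
        """Return "" to accept the article, or the reason for rejecting it."""
        # 1. By count: build a key from the body and count how often it recurs.
        key = hashlib.md5(body.encode("utf-8")).hexdigest()
        self.counts[key] = self.counts.get(key, 0) + 1
        if self.counts[key] > self.max_copies:
            return "too many copies of the same body"

        # 2. By specific strings: look for known patterns in subject or body.
        text = headers.get("Subject", "") + "\n" + body
        for pattern in self.bad_patterns:
            if pattern.search(text):
                return "matched pattern: " + pattern.pattern

        return ""
```

For example, `SimpleFilter(max_copies=2, bad_patterns=[r"make money fast"])` accepts the first two copies of a body and rejects the third. A production filter would also expire old keys so that the hash table does not grow without bound.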



4.2 News Filtering System Architecture

This study uses INN as its main test subject, chiefly because most news server administrators currently run INN and because INN's filter support has been under development for a long time. We therefore use INN as a concrete example to explain how a filter actually operates. INN currently provides a perl hook and a tcl hook; administrators can write filters in either language, and perl is the more common choice. The filter is in fact just a script (perl or tcl): when innd receives an article, it passes the header, body, and related data to the filter, which uses them to decide whether the article is Spam and should be refused.

Every article that passes innd's initial checks and is accepted is handed to the filter. innd extracts the relevant data (header, body) and passes it to the filter, the filter makes its decision from the data and replies to innd, and innd acts on the reply. If the article passes the filter, innd handles it as usual: it writes the article to disk and sends it to the downstream news servers. If the article fails the filter, innd neither writes it to disk nor sends it downstream; it simply frees the memory the article occupies and leaves a log record. In other words, filtered articles take up no storage space and generate no downstream network traffic. As Figure 5 shows, even with a filter installed a small fraction of Spam still passes the checks; this small fraction can be called the theoretical tolerance.

The Message-ID of an article rejected by the filter is also entered into innd's history database, so when another news server offers the same article it can be refused directly without going through the filter again. The filter can likewise use the data in its hash table to reject the corresponding control messages directly.
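
The handling described in this section can be summarized in a short sketch. The Python below is a simplified model, not INN's implementation: the class `NewsServer`, its in-memory `history` set, and the `filter_check` callback are hypothetical names, and disk writes and outbound feeds are represented by plain lists.

```python
class NewsServer:
    """Simplified model of how innd treats an offered article."""

    def __init__(self, filter_check):
        self.filter_check = filter_check  # callable(headers, body) -> reason or ""
        self.history = set()              # Message-IDs already seen (kept or refused)
        self.spool = []                   # stands in for articles written to disk
        self.outgoing = []                # stands in for articles queued for peers
        self.log = []                     # record of rejections

    def offer(self, headers, body):
        """Process one article; return True if it was accepted."""
        msgid = headers["Message-ID"]

        # Anything already in history is refused without running the filter.
        if msgid in self.history:
            return False

        # The Message-ID is remembered whether or not the article is kept.
        self.history.add(msgid)

        reason = self.filter_check(headers, body)
        if reason:
            # Rejected: nothing is stored or relayed, only a log entry remains.
            self.log.append((msgid, reason))
            return False

        # Accepted: store locally and pass on to downstream servers.
        self.spool.append(msgid)
        self.outgoing.append(msgid)
        return True
```

Because the Message-ID is recorded before the filter runs, a rejected article offered again by a second peer is turned away at the history lookup and costs no further filter time.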



4.3 Other Methods

Besides active filtering there are other methods. One of them, NoCeM [9], improves on the existing practice of deleting articles with control messages, but in essence it still relies on deletion after the fact. (See http://www.cm.org/)





5. Performance Analysis of the Implemented System

In cooperation with several of the main news relay sites in Taiwan, we installed this news filtering system on the Usenet news servers described below and have recorded and collected, since January 1998, each system's daily receipt and relaying of news articles. Interested readers can consult the records of each system below.


Figure 7. Peering relationships among news servers in Taiwan

Of these, spring.edu.tw is one of the main USENET News backbone servers on TANet and is TANet's gateway for news feeds to and from abroad; it also has links to several other news relay systems such as HiNet (serv.hinet.net) and SEEDNet (feeder.seed.net.tw). news.edu.tw and news-peer.nctu.edu.tw are the news relay systems of the Ministry of Education Computer Center and the National Chiao Tung University Computer Center, respectively. Below is a sample of part of the log records (recording period 1998/09/12 - 1998/09/13).

Host            Period (1998)                  Articles received   Volume received
spring.edu.tw   Sep 12 02:49 - Sep 13 02:49    225199              307.4 Mb

Reason                                     Count
EMP rejected (ph/l)                        29407
EMP rejected (md5)                         12166
Scoring filter                              3343
Too many newsgroups                         2256
Binary in non-binary group                  1482
New EMP detected (md5)                       382
Org/MID bot pattern                          360
Excessive Supersedes - 209.180.249.235       304
UUencoded htm                                149
UUencoded html                               148
New EMP detected (ph/l)                      121
Poison newsgroup                              97
Angle-bracket bot                             87
AtomicPost                                    43
EMP rejected (f/s/l)                          41
Email Platinum                                30
Spam - pictureview.com                        30
Unwanted ihave message                        30
Spam - www.xxx-young.com                      25
2.0.x Bot                                     22
TOTAL: 48                                  50691

Table 1. Statistics of articles filtered by cleanfeed

Table 1 shows that most of the filtered articles fall into two categories, EMP rejected (ph/l) and EMP rejected (md5). These usually indicate that the same user, through the same server host, sent a large number of almost identical articles within a short time.
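
For a sense of scale, the share of each reason can be computed from the counts in Table 1. The Python sketch below uses only the two EMP rejection counts and the table total; the variable names are chosen for this example.

```python
# Counts copied from Table 1 (spring.edu.tw, Sep 12-13, 1998).
total_rejected = 50691
emp_counts = {
    "EMP rejected (ph/l)": 29407,
    "EMP rejected (md5)": 12166,
}

for reason, count in emp_counts.items():
    print(f"{reason}: {100 * count / total_rejected:.1f}%")

emp_share = 100 * sum(emp_counts.values()) / total_rejected
print(f"both EMP categories together: {emp_share:.1f}%")
```

On these figures the two categories together account for roughly 82% of all rejections in the sample day.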

Because the downstream systems connected to these servers are often connected to one another as well, precise figures for the effect of news filtering are hard to obtain. We therefore modified parts of the filtering program to log the main header fields of the Spam separately for analysis. For our local tw.* hierarchy, fortunately, serv.hinet.net and netnews.hinet.net form a single point of connection. HiNet is currently the largest commercial ISP in Taiwan, with more than 500,000 dial-up subscribers, and is a heavy participant in tw.bbs.*. On average netnews.hinet.net sends out 2500-4000 articles per day, mainly Chinese-language articles in tw.*. Our tracking shows that roughly one third or more of them, about 800-1300 articles, are blocked. Since the ISP is unable to control advertising abuse by its subscribers, this is a clear illustration of some users indiscriminately posting advertisements to many different newsgroups.

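On the other hand, cross-comparing and analyzing the results recorded at each experimental site also shows that, because a news server today usually receives articles from several sources, countering the spread of these many kinds of SPAM requires close cooperation among all the news servers involved to be effective in practice.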





6. Conclusion - Future Development and Promotion

The main test subjects chosen for this study were news servers that carry the major domestic and foreign newsgroups or that peer with news servers abroad. Such servers usually carry heavy traffic and have many peer hosts. A high-traffic test environment makes the effect of Spam on the whole system clearly visible and yields more, and more objective, data and samples for analysis.

Overall, using news filtering to prevent USENET Spam has the following advantages:

  1. Saves network bandwidth
  2. Improves host efficiency
  3. Impact of installing the filter on host efficiency

Code region       Time           Pct     Invoked   Min(ms)   Avg(ms)   Max(ms)
article cancel    00:00:00.011    0.0%      3241     0.000     0.003     0.111
article control   00:00:00.990    0.0%      4382     0.000     0.226     2.769
article link      00:00:00.000    0.0%         0     0.000     0.000     0.000
article write     00:00:58.849    0.1%    141454     0.272     0.416     1.159
history grep      00:00:00.000    0.0%         0     0.000     0.000     0.000
history lookup    00:06:31.850    0.5%    793760     0.013     0.494    16.475
history sync      00:00:45.115    0.1%     28487     0.040     1.584    10.132
history write     00:22:41.331    1.6%    152458     1.592     8.929    75.924
idle              22:43:03.917   95.8%    764791    53.791   106.936   213.576
perl filter       00:15:35.884    1.1%    152426     4.234     6.140     6.845
site send         00:03:28.102    0.2%    425369     0.340     0.489     0.614
TOTAL: 23:42:36.907   23:33:06.049   99.3%     -         -         -         -

Table 2. Comparison of time spent in each major INND processing routine
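
The columns of Table 2 can be cross-checked against each other: the average time per call should equal the total time divided by the number of invocations. The short Python check below does this for the perl filter row; `to_seconds` is a helper written for this example.

```python
def to_seconds(hms):
    """Convert an HH:MM:SS.mmm string to seconds."""
    hours, minutes, seconds = hms.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


# "perl filter" row of Table 2: total time and number of invocations.
filter_time = to_seconds("00:15:35.884")
filter_calls = 152426

avg_ms = 1000 * filter_time / filter_calls
print(f"perl filter: {avg_ms:.3f} ms per article")
```

The result is about 6.14 ms per article, in line with the Avg(ms) column, and the Pct column puts the filter at only 1.1% of the server's time over the day.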

Starting around the fourth quarter of this year, once TANet's T3 link abroad is in service, the volume of USENET traffic is bound to climb sharply, and everyone will need to respond carefully to the impact of USENET SPAM. If a news filtering system like this one can be deployed on the news servers of every organization, the impact of USENET SPAM on TANet as a whole, and even on all of Taiwan, should be reduced to a minimum, keeping overall network performance within a reasonable range of use.



References

  1. M. Horton; R. Adams; "Standard for Interchange of USENET Messages"; RFC 1036, December 1987

  2. Brian Kantor; Phil Lapsley; "Network News Transfer Protocol - A Proposed Standard for the Stream-Based Transmission of News"; RFC 977, February 1986

  3. On countering network abuse, http://spam.abuse.net

  4. Daily TANet USENET News traffic statistics, http://spring.edu.tw:11180/~news/

  5. Daily traffic statistics for each newsgroup, http://news-peer.nctu.edu.tw

  6. Commonly used news server software

  7. Original site of the cleanfeed filter program, http://www.exit109.com/~jeremy/news/antispam.html

  8. About the INN news server software, http://www.isc.org/inn.html

  9. About the NoCeM software, http://www.cm.org