From: "Zac Medico" <zmedico@gentoo.org>
To: gentoo-commits@lists.gentoo.org
Subject: [gentoo-commits] proj/portage:master commit in: lib/portage/tests/ebuild/, lib/portage/package/ebuild/
Date: Mon, 22 Feb 2021 12:18:41 +0000 (UTC)
Message-ID: <1613994521.b9ef191c74982b0e8d837aa7dd256dc3c52f7d2c.zmedico@gentoo>

commit:     b9ef191c74982b0e8d837aa7dd256dc3c52f7d2c
Author:     Zac Medico <zmedico <AT> gentoo <DOT> org>
AuthorDate: Sat Feb 20 23:11:46 2021 +0000
Commit:     Zac Medico <zmedico <AT> gentoo <DOT> org>
CommitDate: Mon Feb 22 11:48:41 2021 +0000
URL:        https://gitweb.gentoo.org/proj/portage.git/commit/?id=b9ef191c

MirrorLayoutConfig: content digest support (bug 756778)

In order to support mirror layouts that use content
digests, extend MirrorLayoutConfig validate_structure and
get_best_supported_layout methods to support an optional
filename parameter of type DistfileName which includes a digests
attribute. Use the new parameter to account for availability
of specific distfile content digests when validating and selecting
mirror layouts which require those digests.
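
As an illustration of digest-constrained selection, the following
minimal sketch picks the first layout.conf entry whose requirements
are met. It is not the portage implementation: the entry_supported
and pick_layout helpers are hypothetical, the argument checks are
simplified, and the 'content-hash' entry is an assumed example of a
layout that requires a content digest.

```python
def entry_supported(entry, digests=None):
    """Return True if a layout.conf entry is usable with the given digests."""
    kind = entry[0]
    if kind == "flat":
        return len(entry) == 1
    if kind == "filename-hash":
        # Needs only the distfile name, so no content digest is required.
        return len(entry) == 3
    if kind == "content-hash":
        # Assumed rule: usable only when a digest for its algorithm is known.
        return len(entry) == 3 and digests is not None and entry[1] in digests
    return False


def pick_layout(structure, digests=None):
    """Return the first supported entry, falling back to the flat layout."""
    for entry in structure:
        if entry_supported(entry, digests):
            return entry
    return ("flat",)


# Entries are listed in order of preference, as in layout.conf.
structure = [
    ("content-hash", "SHA512", "8:8:8"),
    ("filename-hash", "BLAKE2B", "8"),
    ("flat",),
]
# The digest value is a placeholder, not a real SHA512 digest.
with_digest = pick_layout(structure, digests={"SHA512": "abc123"})
without_digest = pick_layout(structure)
```

With a SHA512 digest available the sketch selects the content-hash
entry; without one it skips to the filename-hash entry.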

The DistfileName type represents a distfile name and associated
content digests, used by MirrorLayoutConfig and related layout
implementations.
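
The following self-contained sketch mirrors the DistfileName class
added by this commit and shows how digests_equal behaves. The digest
values are short placeholders chosen for readability, not real
digests.

```python
class DistfileName(str):
    """A distfile name (the str value) plus a dict of content digests."""

    def __new__(cls, s, digests=None):
        return str.__new__(cls, s)

    def __init__(self, s, digests=None):
        super().__init__()
        self.digests = {} if digests is None else digests

    def digests_equal(self, other):
        """True if at least one shared algorithm matches and none conflict."""
        if not isinstance(other, DistfileName):
            return False
        matched = False
        for algo, digest in self.digests.items():
            other_digest = other.digests.get(algo)
            if other_digest is not None:
                if other_digest != digest:
                    return False
                matched = True
        return matched


a = DistfileName("foo-1.tar.gz", digests={"SHA512": "aa", "BLAKE2B": "bb"})
b = DistfileName("renamed.tar.gz", digests={"SHA512": "aa"})
c = DistfileName("foo-1.tar.gz", digests={"SHA512": "cc"})

same_content = a.digests_equal(b)   # shared SHA512 digest matches
conflicting = a.digests_equal(c)    # shared SHA512 digest differs
```

Note that string equality and digest equality are independent:
a and c compare equal as strings while their digests conflict, and
a and b differ as strings while their digests agree.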

The path of a distfile within a layout must be dependent on
nothing more than the distfile name and its associated content
digests. For filename-hash layout, path is dependent on distfile
name alone, and the get_filenames implementation yields strings
corresponding to distfile names. For content-hash layout, path is
dependent on content digest alone, and the get_filenames
implementation yields DistfileName instances whose names are equal
to content digest values. The content-hash layout simply lacks
the filename-hash layout's innate ability to translate a distfile
path to a distfile name, and instead carries an innate ability
to translate a distfile path to a content digest.
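
The difference between the two layouts can be sketched as follows.
This is a simplified illustration using hashlib directly, not the
portage code: filename_hash_path and content_hash_path are
hypothetical helpers, and the default algorithm and cutoff values are
assumptions made for the example.

```python
import hashlib


def _split(hexdigest, cutoffs):
    """Consume leading hex characters of a digest, one group per cutoff."""
    parts = []
    for bits in cutoffs:
        chars = bits // 4  # each hex character encodes 4 bits
        parts.append(hexdigest[:chars])
        hexdigest = hexdigest[chars:]
    return parts


def filename_hash_path(name, algo="sha1", cutoffs=(8,)):
    """Path depends on the distfile name alone; the name is the last part."""
    digest = hashlib.new(algo, name.encode("utf8")).hexdigest()
    return "/".join(_split(digest, cutoffs) + [name])


def content_hash_path(content_digest, cutoffs=(8,)):
    """Path depends on the content digest alone; no name appears in it."""
    return "/".join(_split(content_digest, cutoffs) + [content_digest])


name_path = filename_hash_path("foo-1.tar.gz")
content_digest = hashlib.sha512(b"example content").hexdigest()
digest_path = content_hash_path(content_digest)
```

The last component of name_path is the distfile name, so a directory
walk can recover names. The last component of digest_path is the
digest itself, so a walk can recover only digests.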

In order to prepare for a migration from filename-hash to
content-hash layout, all consumers of the layout get_filenames
method need to be updated to work with content digests as a
substitute for distfile names. For example, in order to prepare
emirrordist for content-hash, a key-value store needs to be
added as a means to associate distfile names with content
digest values yielded by the content-hash get_filenames
implementation.
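
A minimal sketch of such a key-value store is shown below. It is an
in-memory stand-in only: the DigestNameStore class and its methods are
hypothetical, and a real consumer such as emirrordist would presumably
need a persistent store.

```python
import hashlib


class DigestNameStore:
    """In-memory map from content digest to the distfile names seen for it."""

    def __init__(self, algo="sha512"):
        self.algo = algo
        self._names = {}

    def record(self, name, content):
        """Associate a distfile name with the digest of its content."""
        digest = hashlib.new(self.algo, content).hexdigest()
        self._names.setdefault(digest, set()).add(name)
        return digest

    def names_for(self, digest):
        """Return the sorted distfile names known for a content digest."""
        return sorted(self._names.get(digest, ()))


store = DigestNameStore()
digest = store.record("foo-1.tar.gz", b"example content")
store.record("foo-1-copy.tar.gz", b"example content")
known_names = store.names_for(digest)
```

Several names can map to one digest because identical content may be
distributed under more than one distfile name.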

Bug: https://bugs.gentoo.org/756778
Signed-off-by: Zac Medico <zmedico <AT> gentoo.org>

 lib/portage/package/ebuild/fetch.py    | 98 ++++++++++++++++++++++++++++++----
 lib/portage/tests/ebuild/test_fetch.py | 33 +++++++++---
 2 files changed, 114 insertions(+), 17 deletions(-)

diff --git a/lib/portage/package/ebuild/fetch.py b/lib/portage/package/ebuild/fetch.py
index e0fecaf23..af9edd91e 100644
--- a/lib/portage/package/ebuild/fetch.py
+++ b/lib/portage/package/ebuild/fetch.py
@@ -1,4 +1,4 @@
-# Copyright 2010-2020 Gentoo Authors
+# Copyright 2010-2021 Gentoo Authors
 # Distributed under the terms of the GNU General Public License v2
 
 __all__ = ['fetch']
@@ -344,6 +344,57 @@ _size_suffix_map = {
 }
 
 
+class DistfileName(str):
+	"""
+	The DistfileName type represents a distfile name and associated
+	content digests, used by MirrorLayoutConfig and related layout
+	implementations.
+
+	The path of a distfile within a layout must be dependent on
+	nothing more than the distfile name and its associated content
+	digests. For filename-hash layout, path is dependent on distfile
+	name alone, and the get_filenames implementation yields strings
+	corresponding to distfile names. For content-hash layout, path is
+	dependent on content digest alone, and the get_filenames
+	implementation yields DistfileName instances whose names are equal
+	to content digest values. The content-hash layout simply lacks
+	the filename-hash layout's innate ability to translate a distfile
+	path to a distfile name, and instead carries an innate ability
+	to translate a distfile path to a content digest.
+
+	In order to prepare for a migration from filename-hash to
+	content-hash layout, all consumers of the layout get_filenames
+	method need to be updated to work with content digests as a
+	substitute for distfile names. For example, in order to prepare
+	emirrordist for content-hash, a key-value store needs to be
+	added as a means to associate distfile names with content
+	digest values yielded by the content-hash get_filenames
+	implementation.
+	"""
+	def __new__(cls, s, digests=None):
+		return str.__new__(cls, s)
+
+	def __init__(self, s, digests=None):
+		super().__init__()
+		self.digests = {} if digests is None else digests
+
+	def digests_equal(self, other):
+		"""
+		Test if digests compare equal to those of another instance.
+		"""
+		if not isinstance(other, DistfileName):
+			return False
+		matches = []
+		for algo, digest in self.digests.items():
+			other_digest = other.digests.get(algo)
+			if other_digest is not None:
+				if other_digest == digest:
+					matches.append(algo)
+				else:
+					return False
+		return bool(matches)
+
+
 class FlatLayout:
 	def get_path(self, filename):
 		return filename
@@ -439,19 +490,36 @@ class MirrorLayoutConfig:
 		self.structure = data
 
 	@staticmethod
-	def validate_structure(val):
+	def validate_structure(val, filename=None):
+		"""
+		If the filename argument is given, then supported hash
+		algorithms are constrained by digests available in the filename
+		digests attribute.
+
+		@param val: layout.conf entry args
+		@param filename: filename with digests attribute
+		@return: True if args are valid for available digest algorithms,
+			and False otherwise
+		"""
 		if val[0] == 'flat':
 			return FlatLayout.verify_args(val)
-		if val[0] == 'filename-hash':
+		elif val[0] == 'filename-hash':
 			return FilenameHashLayout.verify_args(val)
 		return False
 
-	def get_best_supported_layout(self):
+	def get_best_supported_layout(self, filename=None):
+		"""
+		If the filename argument is given, then acceptable hash
+		algorithms are constrained by digests available in the filename
+		digests attribute.
+
+		@param filename: filename with digests attribute
+		"""
 		for val in self.structure:
-			if self.validate_structure(val):
+			if self.validate_structure(val, filename=filename):
 				if val[0] == 'flat':
 					return FlatLayout(*val[1:])
-				if val[0] == 'filename-hash':
+				elif val[0] == 'filename-hash':
 					return FilenameHashLayout(*val[1:])
 		# fallback
 		return FlatLayout()
@@ -515,7 +583,7 @@ def get_mirror_url(mirror_url, filename, mysettings, cache_path=None):
 
 	# For some protocols, urlquote is required for correct behavior,
 	# and it must not be used for other protocols like rsync and sftp.
-	path = mirror_conf.get_best_supported_layout().get_path(filename)
+	path = mirror_conf.get_best_supported_layout(filename=filename).get_path(filename)
 	if urlparse(mirror_url).scheme in ('ftp', 'http', 'https'):
 		path = urlquote(path)
 	return mirror_url + "/distfiles/" + path
@@ -722,15 +790,23 @@ def fetch(myuris, mysettings, listonly=0, fetchonly=0,
 	if hasattr(myuris, 'items'):
 		for myfile, uri_set in myuris.items():
 			for myuri in uri_set:
-				file_uri_tuples.append((myfile, myuri))
+				file_uri_tuples.append(
+					(DistfileName(myfile, digests=mydigests.get(myfile)), myuri)
+				)
 			if not uri_set:
-				file_uri_tuples.append((myfile, None))
+				file_uri_tuples.append(
+					(DistfileName(myfile, digests=mydigests.get(myfile)), None)
+				)
 	else:
 		for myuri in myuris:
 			if urlparse(myuri).scheme:
-				file_uri_tuples.append((os.path.basename(myuri), myuri))
+				file_uri_tuples.append(
+					(DistfileName(os.path.basename(myuri), digests=mydigests.get(os.path.basename(myuri))), myuri)
+				)
 			else:
-				file_uri_tuples.append((os.path.basename(myuri), None))
+				file_uri_tuples.append(
+					(DistfileName(os.path.basename(myuri), digests=mydigests.get(os.path.basename(myuri))), None)
+				)
 
 	filedict = OrderedDict()
 	primaryuri_dict = {}

diff --git a/lib/portage/tests/ebuild/test_fetch.py b/lib/portage/tests/ebuild/test_fetch.py
index c5ea8253b..b88ae3efb 100644
--- a/lib/portage/tests/ebuild/test_fetch.py
+++ b/lib/portage/tests/ebuild/test_fetch.py
@@ -7,7 +7,8 @@ import tempfile
 
 import portage
 from portage import shutil, os
-from portage.const import BASH_BINARY, PORTAGE_PYM_PATH
+from portage.checksum import checksum_str
+from portage.const import BASH_BINARY, MANIFEST2_HASH_DEFAULTS, PORTAGE_PYM_PATH
 from portage.tests import TestCase
 from portage.tests.resolver.ResolverPlayground import ResolverPlayground
 from portage.tests.util.test_socks5 import AsyncHTTPServer
@@ -18,8 +19,14 @@ from portage.util._async.SchedulerInterface import SchedulerInterface
 from portage.util._eventloop.global_event_loop import global_event_loop
 from portage.package.ebuild.config import config
 from portage.package.ebuild.digestgen import digestgen
-from portage.package.ebuild.fetch import (_download_suffix, fetch, FlatLayout,
-		FilenameHashLayout, MirrorLayoutConfig)
+from portage.package.ebuild.fetch import (
+	DistfileName,
+	_download_suffix,
+	fetch,
+	FilenameHashLayout,
+	FlatLayout,
+	MirrorLayoutConfig,
+)
 from _emerge.EbuildFetcher import EbuildFetcher
 from _emerge.Package import Package
 
@@ -142,9 +149,14 @@ class EbuildFetchTestCase(TestCase):
 				content["/distfiles/layout.conf"] = layout_data.encode("utf8")
 
 				for k, v in distfiles.items():
+					filename = DistfileName(
+						k,
+						digests=dict((algo, checksum_str(v, hashname=algo)) for algo in MANIFEST2_HASH_DEFAULTS),
+					)
+
 					# mirror path
 					for layout in layouts:
-						content["/distfiles/" + layout.get_path(k)] = v
+						content["/distfiles/" + layout.get_path(filename)] = v
 					# upstream path
 					content["/distfiles/{}.txt".format(k)] = v
 
@@ -499,6 +511,10 @@ class EbuildFetchTestCase(TestCase):
 				io.StringIO(conf))
 
 	def test_filename_hash_layout_get_filenames(self):
+		filename = DistfileName(
+			'foo-1.tar.gz',
+			digests=dict((algo, checksum_str(b'', hashname=algo)) for algo in MANIFEST2_HASH_DEFAULTS),
+		)
 		layouts = (
 			FlatLayout(),
 			FilenameHashLayout('SHA1', '4'),
@@ -506,7 +522,6 @@ class EbuildFetchTestCase(TestCase):
 			FilenameHashLayout('SHA1', '8:16'),
 			FilenameHashLayout('SHA1', '8:16:24'),
 		)
-		filename = 'foo-1.tar.gz'
 
 		for layout in layouts:
 			distdir = tempfile.mkdtemp()
@@ -520,6 +535,12 @@ class EbuildFetchTestCase(TestCase):
 				with open(path, 'wb') as f:
 					pass
 
-				self.assertEqual([filename], list(layout.get_filenames(distdir)))
+				file_list = list(layout.get_filenames(distdir))
+				self.assertTrue(len(file_list) > 0)
+				for filename_result in file_list:
+					if isinstance(filename_result, DistfileName):
+						self.assertTrue(filename_result.digests_equal(filename))
+					else:
+						self.assertEqual(filename_result, str(filename))
 			finally:
 				shutil.rmtree(distdir)

