From: Yinghai Lu on
Eric W. Biederman wrote:
> Yinghai Lu <yinghai(a)kernel.org> writes:
>
>> Eric W. Biederman wrote:
>>> Yinghai Lu <yinghai(a)kernel.org> writes:
>>>
>>>> Kenji Kaneshige wrote:
>>>>> Yinghai Lu wrote:
>>>>>> Yinghai Lu wrote:
>>>>>>> Kenji Kaneshige wrote:
>>>>>>>> I understand you need to touch I/O base/limit and Mem base/limit. But
>>>>>>>> I don't understand why you also need to update bridge's BARs. Could
>>>>>>>> you please explain a little more about it?
>>>>>>>>
>>>>>>>> Just in case, my terminology "bridge's BARs" is Base Address Register
>>>>>>>> 0 (offset 0x10) and Base Address Register 1 (offset 0x14) in the
>>>>>>>> (type 1) configuration space header of the bridge.
>>>>>>> i mean 0x1c, 0x20, 0x28
>>>>>>>
>>>>>>> did not notice that bridge device's 0x10, 0x14 are used...
>>>>>>> if port service need to use 0x10, 0x14, and the device is enabled, we
>>>>>>> should touch 0x10, and 0x14.
>>>>>> after check the code, if
>>>>>> pci_bridge_assign_resources ==> pdev_assign_resources_sorted ==>
>>>>>> pdev_sort_resources
>>>>>>
>>>>>> will not touch 0x10 and 0x14, if those resource is claimed by port
>>>>>> service.
>>>>>>
>>>>>> /* Sort resources by alignment */
>>>>>> void pdev_sort_resources(struct pci_dev *dev, struct resource_list *head)
>>>>>> { int i;
>>>>>> for (i = 0; i < PCI_NUM_RESOURCES; i++) {
>>>>>> struct resource *r;
>>>>>> struct resource_list *list, *tmp;
>>>>>> resource_size_t r_align;
>>>>>> r = &dev->resource[i];
>>>>>> if (r->flags &
>>>>>> IORESOURCE_PCI_FIXED)
>>>>>> continue;
>>>>>> if (!(r->flags) || r->parent)
>>>>>> continue;
>>>>>>
>>>>>> r->parent != NULL, will make it skip those two.
>>>>>>
>>>>>> So -v3 should be safe.
>>>>>>
>>>>> Thank you for the clarification.
>>>>>
>>>>> But I still don't understand the whole picture of your set of
>>>>> changes. Let me ask some questions.
>>>>>
>>>>> In my understanding of your set of changes, if there is a PCIe
>>>>> switch with some hot-plug slots and all of those slots are empty,
>>>>> I/O and Memory resources assigned by BIOS are all released at
>>>>> the boot time. For example, suppose the following case.
>>>>>
>>>>> bridge(A)
>>>>> |
>>>>> -----------------------
>>>>> | |
>>>>> bridge(B) bridge(C)
>>>>> | |
>>>>> slot(1) slot(2)
>>>>> (empty) (empty)
>>>>>
>>>>> bridge(A): P2P bridge for switch upstream port
>>>>> bridge(B): P2P bridge for switch downstream port
>>>>> bridge(C): P2P bridge for switch downstream port
>>>>>
>>>>> In the above example, I/O and Mem resource assigned to bridge(A),
>>>>> bridge(B) and bridge(C) are all released at the boot time. Correct?
>>>>>
>>>>> Then, when a adapter card is hot-added to slot(1), I/O and Mem
>>>>> resources enough for enabling the hot-added adapter card is assigned
>>>>> to bridge(A), bridge(B) and the adapter card. Correct?
>>>>>
>>>>> Then, when an another adpater card is hot-added to slot(2), we
>>>>> need to assign enough resource to bridge(C) and the new card.
>>>>> But bridge(A) doesn't have enough resource for bridge(C) and
>>>>> the new card. In addition, all bridge(A) and bridge(B) and the
>>>>> adapter card on slot(1) are already working. How do you assign
>>>>> resource to bridge(C) and the card on slot(2)?
>>>>>
>>>> thanks, will update the patches to only handle leaf bridge, and don't touch min_size etc.
>>> Tell me what is your expected behavior if I plug a bridge with hotplug
>>> slots into a leaf hotplug slot? Will you assign me enough resources so
>>> that I can plug in additional devices?
>> no.
>>
>> you need to plug device in those slots and then insert it into a leaf hotplug slot.
>
> Scenario.
>
> I insert a bridge with pci hotplug slots into a leaf hotplug slot.
> Which adds more leave hotplug slots.
>
> Since the bridge itself is no longer a leaf slot it's resources will not
> get reassigned.
>
> Then I will have no resources to assign to the leaves?

So we still have your min_size code there.

In your case:
you need to plug all the cards into the slots on that daughter card first, and then insert the daughter card into the leaf slot on the motherboard.

My setup is:

The system has 4 IO chains, which gives these slots:
00:03.0 00:05.0 00:07.0 00:09.0
40:03.0 40:05.0 40:07.0 40:09.0
80:03.0 80:05.0 80:07.0 80:09.0
c0:03.0 c0:05.0 c0:07.0 c0:09.0

Those hang directly off the peer root buses. The BIOS assigns each of them 8M, so if the user plugs in a card that needs 256M, it will not work.

With those two patches, we can clear the resources assigned by the BIOS and allocate resources as needed (with 64-bit MMIO).
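
To make the sizing problem above concrete, here is a minimal sketch in plain userspace C. The helper names card_fits_window and total_reservation are hypothetical and are not kernel functions; the numbers are the 16 slots, the 8M per-slot BIOS window, and the 256M card from the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MiB (1024ULL * 1024ULL)

/*
 * Hypothetical helper (not a kernel API): can a card that needs `need`
 * bytes of MMIO fit into a bridge window of `window` bytes?
 */
static bool card_fits_window(uint64_t window, uint64_t need)
{
	assert(window > 0);
	return need <= window;
}

/*
 * Hypothetical helper: total MMIO consumed if each of `slots` ports is
 * given a window of `per_slot` bytes.
 */
static uint64_t total_reservation(unsigned int slots, uint64_t per_slot)
{
	return (uint64_t)slots * per_slot;
}
```

With an 8M window, card_fits_window(8 * MiB, 256 * MiB) is false, which is the failure described above. Giving all 16 slots a 256M window up front would take total_reservation(16, 256 * MiB), that is 4G of MMIO, which is why sizing on demand looks more attractive here than one large fixed minimum.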


YH
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Yinghai Lu on
Eric W. Biederman wrote:
> Yinghai Lu <yinghai(a)kernel.org> writes:
>
>> Eric W. Biederman wrote:
>>> Yinghai Lu <yinghai(a)kernel.org> writes:
>>>
>>>> Eric W. Biederman wrote:
>>>>> Yinghai Lu <yinghai(a)kernel.org> writes:
>>>>>
>>>>>> Kenji Kaneshige wrote:
>>>>>>> Yinghai Lu wrote:
>>>>>>>> Yinghai Lu wrote:
>>>>>>>>> Kenji Kaneshige wrote:
>>>>>>>>>> I understand you need to touch I/O base/limit and Mem base/limit. But
>>>>>>>>>> I don't understand why you also need to update bridge's BARs. Could
>>>>>>>>>> you please explain a little more about it?
>>>>>>>>>>
>>>>>>>>>> Just in case, my terminology "bridge's BARs" is Base Address Register
>>>>>>>>>> 0 (offset 0x10) and Base Address Register 1 (offset 0x14) in the
>>>>>>>>>> (type 1) configuration space header of the bridge.
>>>>>>>>> i mean 0x1c, 0x20, 0x28
>>>>>>>>>
>>>>>>>>> did not notice that bridge device's 0x10, 0x14 are used...
>>>>>>>>> if port service need to use 0x10, 0x14, and the device is enabled, we
>>>>>>>>> should touch 0x10, and 0x14.
>>>>>>>> after check the code, if
>>>>>>>> pci_bridge_assign_resources ==> pdev_assign_resources_sorted ==>
>>>>>>>> pdev_sort_resources
>>>>>>>>
>>>>>>>> will not touch 0x10 and 0x14, if those resource is claimed by port
>>>>>>>> service.
>>>>>>>>
>>>>>>>> /* Sort resources by alignment */
>>>>>>>> void pdev_sort_resources(struct pci_dev *dev, struct resource_list *head)
>>>>>>>> { int i;
>>>>>>>> for (i = 0; i < PCI_NUM_RESOURCES; i++) {
>>>>>>>> struct resource *r;
>>>>>>>> struct resource_list *list, *tmp;
>>>>>>>> resource_size_t r_align;
>>>>>>>> r = &dev->resource[i];
>>>>>>>> if (r->flags &
>>>>>>>> IORESOURCE_PCI_FIXED)
>>>>>>>> continue;
>>>>>>>> if (!(r->flags) || r->parent)
>>>>>>>> continue;
>>>>>>>>
>>>>>>>> r->parent != NULL, will make it skip those two.
>>>>>>>>
>>>>>>>> So -v3 should be safe.
>>>>>>>>
>>>>>>> Thank you for the clarification.
>>>>>>>
>>>>>>> But I still don't understand the whole picture of your set of
>>>>>>> changes. Let me ask some questions.
>>>>>>>
>>>>>>> In my understanding of your set of changes, if there is a PCIe
>>>>>>> switch with some hot-plug slots and all of those slots are empty,
>>>>>>> I/O and Memory resources assigned by BIOS are all released at
>>>>>>> the boot time. For example, suppose the following case.
>>>>>>>
>>>>>>> bridge(A)
>>>>>>> |
>>>>>>> -----------------------
>>>>>>> | |
>>>>>>> bridge(B) bridge(C)
>>>>>>> | |
>>>>>>> slot(1) slot(2)
>>>>>>> (empty) (empty)
>>>>>>>
>>>>>>> bridge(A): P2P bridge for switch upstream port
>>>>>>> bridge(B): P2P bridge for switch downstream port
>>>>>>> bridge(C): P2P bridge for switch downstream port
>>>>>>>
>>>>>>> In the above example, I/O and Mem resource assigned to bridge(A),
>>>>>>> bridge(B) and bridge(C) are all released at the boot time. Correct?
>>>>>>>
>>>>>>> Then, when a adapter card is hot-added to slot(1), I/O and Mem
>>>>>>> resources enough for enabling the hot-added adapter card is assigned
>>>>>>> to bridge(A), bridge(B) and the adapter card. Correct?
>>>>>>>
>>>>>>> Then, when an another adpater card is hot-added to slot(2), we
>>>>>>> need to assign enough resource to bridge(C) and the new card.
>>>>>>> But bridge(A) doesn't have enough resource for bridge(C) and
>>>>>>> the new card. In addition, all bridge(A) and bridge(B) and the
>>>>>>> adapter card on slot(1) are already working. How do you assign
>>>>>>> resource to bridge(C) and the card on slot(2)?
>>>>>>>
>>>>>> thanks, will update the patches to only handle leaf bridge, and don't touch min_size etc.
>>>>> Tell me what is your expected behavior if I plug a bridge with hotplug
>>>>> slots into a leaf hotplug slot? Will you assign me enough resources so
>>>>> that I can plug in additional devices?
>>>> no.
>>>>
>>>> you need to plug device in those slots and then insert it into a leaf hotplug slot.
>>> Scenario.
>>>
>>> I insert a bridge with pci hotplug slots into a leaf hotplug slot.
>>> Which adds more leave hotplug slots.
>>>
>>> Since the bridge itself is no longer a leaf slot it's resources will not
>>> get reassigned.
>>>
>>> Then I will have no resources to assign to the leaves?
>> so we still have your min_size code there.
>>
>> in your case: you need plug all card in your slots on that daughter
>> card at first, and then insert the daughter card to leaf slot in the
>> MB.
>
> Operationally that is an impossibility. I would not have multiple
> layers of hotplug if I only needed a single layer.
>
> Which means your patch would cause a regression in my setup.

OK, we may need to compare the new range size with the old range size before clearing it.
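
A rough sketch of that comparison, in plain userspace C. This is only an illustration of the idea and not the code in the patches: struct window is a simplified stand-in for the kernel's struct resource (same inclusive-end convention), and should_release_window is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for a bridge window; `end` is inclusive. */
struct window {
	uint64_t start;
	uint64_t end;
};

/* Size of an inclusive [start, end] range. */
static uint64_t window_size(const struct window *w)
{
	assert(w->end >= w->start);
	return w->end - w->start + 1;
}

/*
 * Hypothetical policy: only release the firmware-assigned window when
 * the newly required size does not fit in it. A window that is already
 * big enough is left alone, so existing setups keep their ranges.
 */
static bool should_release_window(const struct window *old, uint64_t needed)
{
	return needed > window_size(old);
}
```

Under this policy an 8M window is released for a 256M request but kept for a 4M one.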

>
>> my setup is :
>>
>> system got 4 io chains. and will get slot:
>> 00:03.0 00:05.0 00:07.0 00:09.0
>> 40:03.0 40:05.0 40:07.0 40:09.0
>> 80:03.0 80:05.0 80:07.0 80:09.0
>> c0:03.0 c0:05.0 c0:07.0 c0:09.0
>>
>> those are hanged on peer root buses directly. but bios assign to
>> them every one get 8M, if user plug one card need 256M, then it will
>> not work.
>>
>> with those two patches, could clear the resource assigned by BIOS,
>> and get resource as needed. ( with mmio 64 bit )
>
> Hmm.
>
> Could you avoid reallocating resources until a pci device is plugged in
> that has problems?
>
> A lot of root bridges have important configuration registers that are
> not in standard locations. Which means in general we can not reprogram
> root bridges successfully from linux. At least not without code that
> knows the root bridge magic.
No one is changing that.
>
> You can almost solve your problem by simply saying: pci=hpmemsize=256M.
> Which works except that allocating 4G of pci memory isn't very likely
> to work.
>
> One of the suggestions when I made my patch was to have a per port option
> instead of a global minimum. That is an option for your case. But it
> is not as elegant.
>
> The truly elegant approach is to make certain the hibernate in the
> drivers can handle bars being changed under them, hibernate everything
> that needs renumbering and then bring them back.
>
> Personally I think you should walk over to whomever did your firmware
> and tell them they goofed.

They said it IS a Linux problem, because other OSes are OK.

YH
From: Yinghai Lu on
Yinghai Lu wrote:

>>>
>>> Which means your patch would cause a regression in my setup.
>> ok, may need to compare new range size and old range size before clear it.
>
> After a closer look at the code, it looks like it will not break your setup.
>
> 1. before the patches:
> a. when the master card is inserted, every bridge on that card gets assigned min_size
> b. when new cards are inserted into the slots on the master card, they get assigned within the bridge size.
>
> 2. after the patches: v5
> a. at boot, the MMIO of all leaf bridges gets cleared.
> b. when the master card is inserted, every bridge on that card gets assigned min_size, and the master bridge gets the sum of them
> c. when new cards are inserted into the slots on the master card, they get assigned within the bridge size.
>
> can you check those two patches in your setup to verify it?
> http://patchwork.kernel.org/patch/56344/
> http://patchwork.kernel.org/patch/56343/

On top of Jesse's PCI tree as of today.
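
As an illustration of step 2b, here is a minimal userspace C sketch of how the master bridge size follows from the child bridges. It assumes each child window is simply the larger of its request and min_size, and it ignores alignment, which the real sizing code has to handle. The names child_window and master_window are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Each child bridge gets at least min_size, even when its slot is empty. */
static uint64_t child_window(uint64_t requested, uint64_t min_size)
{
	return requested > min_size ? requested : min_size;
}

/*
 * The master bridge must cover all of its children, so in this
 * simplified model its window is the sum of the child windows.
 */
static uint64_t master_window(const uint64_t *requested, size_t n,
			      uint64_t min_size)
{
	uint64_t total = 0;

	assert(requested != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		total += child_window(requested[i], min_size);
	return total;
}
```

For a master card with three empty slots and a min_size of 2M (an example value), the master bridge would get 6M, and a card hot-added later has to fit inside the 2M of its own slot.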

YH
From: Yinghai Lu on
Eric W. Biederman wrote:
> Yinghai Lu <yinghai(a)kernel.org> writes:
>> After a closer look at the code, it looks like it will not break your setup.
>>
>> 1. before the patches:
>> a. when the master card is inserted, every bridge on that card gets assigned min_size
>> b. when new cards are inserted into the slots on the master card, they get assigned within the bridge size.
>>
>> 2. after the patches: v5
>> a. at boot, the MMIO of all leaf bridges gets cleared.
>> b. when the master card is inserted, every bridge on that card gets assigned min_size, and the master bridge gets the sum of them
>> c. when new cards are inserted into the slots on the master card, they get assigned within the bridge size.
>>
>> can you check those two patches in your setup to verify it?
>
> I have a much simpler case I will break, as I tried something similar by accident.
which kernel version?
>
> AMD cpu MCP55 with one pcie port setup as hotplug.
> The system only has 2GB of RAM. So plenty of space for pcie devices.

One or two HT chains?

Do you still have the lspci -tv output for it?

>
> If the firmware assigns nothing and linux at boot time assigns the pci mmio space:
> Reads from the bar of the hotplugged device work
> Writes to the bar of the hotplugged device, cause further writes to go to lala land.
>
> So I had to have the firmware make the assignment, because only it knows the
> details of the hidden AMD bar registers for each hypertransport chain etc.

That means the kernel does not probe the peer root bus resources properly.


YH
From: Yinghai Lu on
Eric W. Biederman wrote:
> Yinghai Lu <yinghai(a)kernel.org> writes:
>
>> Eric W. Biederman wrote:
>>> Yinghai Lu <yinghai(a)kernel.org> writes:
>>>> After a closer look at the code, it looks like it will not break your setup.
>>>>
>>>> 1. before the patches:
>>>> a. when the master card is inserted, every bridge on that card gets assigned min_size
>>>> b. when new cards are inserted into the slots on the master card, they get assigned within the bridge size.
>>>>
>>>> 2. after the patches: v5
>>>> a. at boot, the MMIO of all leaf bridges gets cleared.
>>>> b. when the master card is inserted, every bridge on that card gets assigned min_size, and the master bridge gets the sum of them
>>>> c. when new cards are inserted into the slots on the master card, they get assigned within the bridge size.
>>>>
>>>> can you check those two patches in your setup to verify it?
>>> I have a much simpler case I will break, as I tried something similar by accident.
>> which kernel version?
>>> AMD cpu MCP55 with one pcie port setup as hotplug.
>>> The system only has 2GB of RAM. So plenty of space for pcie devices.
>> one or two ht chains?
>
> One chain.
>
>> do you still have lspci -tv with it?
>>
>>> If the firmware assigns nothing and linux at boot time assigns the pci mmio space:
>>> Reads from the bar of the hotplugged device work
>>> Writes to the bar of the hotplugged device, cause further writes to go to lala land.
>>>
>>> So I had to have the firmware make the assignment, because only it knows the
>>> details of the hidden AMD bar registers for each hypertransport chain etc.
>> that mean kernel doesn't get peer root bus res probed properly
>
> How do you do that without having drivers for the peer root bus?

We have amd_bus.c to handle AMD K8 systems with two chains, but the one-chain case is skipped.
(I wonder if we need to re-enable it for one-chain K8 systems.)

Another one, intel_bus.c, is on the way for 2.6.33.

When use_crs is used, the info from PCI config space is not used; it is only printed so you can check whether _CRS is right or not.

YH